This paper introduces a VLIW architecture and its optimizing compiler which are now under development. Based on the URPR software pipelining approach, the architecture integrates nine PEs with the same structure on a single-chip. In addition, a pipeline register file is used to reduce the inter-body dependent distance to enhance the overlapping of the adjacent loop iterations, furthermore to shorten the length of the optimized loop body. The pipeline register file also increases the bandwidth between PEs. The optimizing compiler is also based on the URPR software pipelining approach. It uses a two-level software pipelining method to implement phase-coupled resource allocation and code optimization, and obtains good time and space optimal results. A compilation example of an FFT innermost loop is discussed. The simulation results indicate that the architecture could reach high performance with the aid of the optimizing compiler.
展开▼