AbstractA parallel compiler exploits temporal recursion to reduce the cost of control code generated in transforming a sequential nested loop program into a set of parallel processes mapped to an array of processors. A parallel compiler process transforms a nested loop program into a set of single loops, where each single loop is assigned to execute on a processor element in a parallel processor array. The parallel compiler obtains a mapping of iterations of the nested loop to processor elements in the array and a schedule of start times for initiating execution of the iterations on corresponding processor elements in the array. Based on this mapping and iteration schedule, the parallel compiler generates code to compute iteration coordinates on a processor element for an iteration of the single loop from iteration coordinates computed on the same processor element for a previous iteration of the single loop. The parallel compiler uses this method to generate code to compute loop indices, memory addresses, and tests of loop bounds efficiently based on values from a previous iteration. Other References- Allen et al. Automatic Decomposition of Scientific Programs for Parallel Executions. IEEE. 1987. pp. 63-76.
- Tang et al. Impact of Self-Scheduling Order on Performance Of Multiprocessor Systems. ACM. 1988. pp. 593-603.
- Calinescu. A BSP Approach to the Scheduling of Tightly-Nested Loops. IEEE. 1997. pp. 549-553.
- Leonardi et al. Nested Loops Optimization for Multiprocessor Architecture Design. IEEE. 1998. pp. 415-418.
- Yang et al. On Symbolic Scheduling and Parallel Complexity of Loops. IEEE. 1995. pp. 360-367.
- IBM Technical Disclosure Bulletin. Automatic Parallelization of Loops in Sequential Code. Jul., 1987. pp. 731-735.
- Rainer Leupers, Peter Marwedel, "Retargetable Generation of Code Selectors from HDL Processor Models," IEEE, 1997, pp. 140-144
- George Hadjiyiannis, Silvina Hanono, Srinivas Devadas, "ISDL: An Instruction Set Description Language for Retargetability," ACM, 1997, pp. 299-302
- Gyllenhaal et al., "HMDES Version 2.0 Specification," Hewlett Packard Laboratories Technical Report Impact-96-3, (Published before Aug. 20, 1999)
- Hadjiyiannis et al., "A Methodology for Accurate Performance Evaluation in Architecture Exploration." (Published before Aug. 20, 1999)
- Hoogerbrugge et al., "Automatic Synthesis of Transport Triggered Processors." (Published before Aug. 20, 1999)
- Corporaal et al., "MOVE: A Framework for High-Performance Processor Design," ACM, 1991, pp. 692-701
- Corporaal et al., "Cosynthesis with the MOVE Framework." (Published before Aug. 20, 1999)
- Lanneer et al, "Chapter 5--Chess: Retargetable Code Generation for Embedded DSP Processors," Code Generation for Embedded Processors, Kluwer Academic Publications, pp. 85-102. (Published before Aug. 20, 1999)
- Fauth, "Chapter 8--Beyond Tool-Specific Machine Descriptions," Code Generation for Embedded Processors, Kluwer Academic Publications, pp. 138-152
- Quinton et al., Systolic Algorithms & Architectures, Prentice Hall, 1991, pp. 283-339
- Park et al., "Sehwa: A Software Package for Synthesis of Pipelines from Behavioral Specifications," IEEE, Mar. 1988, vol. 7, No. 3, pp. 356-370
- Megson et al., "A Synthesis Method of LSGP Partitioning of Given-Shape Regular Arrays," IEEE, 1995, pp. 234-238
- Chen et al., "A General Methodology of Partitioning and Mapping for Given Regular Arrays," IEEE Transactions on Parallel and Distributed Systems, vol. 6, No. 10, 1995, pp. 1100-1107
- Bagg et al., "Parallelizing Applications into Silicon," MIT (Published before Aug. 20, 1999)
- Rim et al., "Optimal and Heuristic Algorithms for Solving the Binding Problem," Madison, WI, Sep. 10, 1993
- Weinhardt et al., "Memory Access Optimization and RAM Inference for Pipeline Vectorization," Proceedings of the 9th International Workshop, FPL '99, pp. 61-70
- Shackleford et al., "Satsuki: An Integrated Processor Synthesis and Compiler Generation System," IEICE Special Issue on Synthesis and Verification of Hardware Design, 10-96
- Devadas et al., "Algorithms for Hardware Allocation in Data Path Synthesis," IEEE, Jul. 1989, vol. 8, No. 7, pp. 768-781
- Cloutier et al., "The Combination of Scheduling, Allocation, and Mapping in a Single Algorithm," 27th ACM/IEEE Design Automation Conference, 1990, pp. 71-76
- Wilson et al., "An ILP Solution for Simultaneous Scheduling, Allocation, and Binding in Multiple Block Synthesis," IEEE, 1994, pp. 581-586
- Wilson et al., "An ILP Solution for Optimum Scheduling, Module and Register Allocation, and Operation Binding in Datapath Synthesis," OPA, 1995, pp. 21-36
- Paulin et al., "Force-Directed Scheduling for the Behavioral Synthesis of ASIC's," IEEE, Jun. 1989, vol. 8, No. 6, pp. 661-679
- Chang et al., "Using Integer Linear Programming for Instruction Scheduling and Register Allocation in Multi-Issue Processors," Computers Math. Applic., vol. 34, No. 9, 1997, pp. 1-14
- Aditya et al., "Elcor's Machine Description System: Version 3.0," HPL-98-128, Oct. 1998, pp. 1-75
- Rau et al., "Machine-Description Driven Compilers for EPIC Processors," HP Laboratories Technical Report, HPL-98-40, Sep. 1998, pp. 1-82
- Kathail et al., "HPL PlayDoh Architecture Specification: Version 1.0," HP Laboratories Technical Report, HPL-93-80, Feb. 1994, pp. 1-48
- Darte et al., "Partitioning for Array Processors," Tech. Rep. 90-23, LIP, ENS Lyon, 1990
- A. Darte, "Regular Partitioning for Synthesizing Fixed-Size Systolic Arrays," Integration, the VLSI Journal, vol. 12, 1991, pp. 293-304
- X. Chen and G.M. Megson, "A general Methodology of Partitioning and Mapping for Given Regular Arrays," IEEE Transactions on Parallel and Distributed Systems, vol. 6, No. 10, 1995, pp. 1100-110
| InventorsApplication No. 378397 filed on 08/20/1999 US Classes:717/161, Including scheduling instructions717/150, Loop compiling717/151OptimizationField of Search717/150, Loop compiling717/160, Including loop717/161, Including scheduling instructions717/159, Code restructuring717/149, For a parallel or multiprocessor system717/151, Optimization712/241Loop executionExaminers Primary: Dam, Tuan Q. Assistant: Zhen, WeiUS Patent References5230053, Processor scheduling method for iterative loops Issued on: 07/20/1993 Inventor: Zaiki5442790, Optimizing compiler for computers Issued on: 08/15/1995 Inventor: Nosenchuck5579494, Apparatus for detecting possibility of parallel processing and method thereof and a program translation apparatus utilized therein Issued on: 11/26/1996 Inventor: Zaiki5787272, Method and apparatus for improving synchronization time in a parallel processing system Issued on: 07/28/1998 Inventor: Gupta, et al.5802375, Outer loop vectorization Issued on: 09/01/1998 Inventor: Ngo, et al.5832272, Apparatus and method for parallel computation Issued on: 11/03/1998 Inventor: Kalantery5852734, Method and compiler for parallel execution of a program Issued on: 12/22/1998 Inventor: Komatsu, et al.6041181, Method of, system for, and computer program product for providing quick fusion in WHERE constructs Issued on: 03/21/2000 Inventor: Ju, et al.6058266, Method of, system for, and computer program product for performing weighted loop fusion by an optimizing compiler Issued on: 05/02/2000 Inventor: Megiddo, et al.6059841, Updating data dependencies for loop strip mining Issued on: 05/09/2000 Inventor: Caracuzzo6070011Compiler for performing a loop fusion, dependent upon loop peeling and/or loop reversal Issued on: 05/30/2000 Inventor: Liu, et al.International Class G06F 009/45
|