Programmatic method for reducing cost of control in parallel processes
Patent 6374403 Issued on April 16, 2002. Estimated Expiration Date: August 20, 2019. Estimated Expiration Date is calculated based on simple USPTO term provisions. It does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors which might affect the term of a patent.
A parallel compiler exploits temporal recursion to reduce the cost of control code generated in transforming a sequential nested loop program into a set of parallel processes mapped to an array of processors. A parallel compiler process transforms a nested loop program into a set of single loops, where each single loop is assigned to execute on a processor element in a parallel processor array. The parallel compiler obtains a mapping of iterations of the nested loop to processor elements in the array and a schedule of start times for initiating execution of the iterations on corresponding processor elements in the array. Based on this mapping and iteration schedule, the parallel compiler generates code to compute iteration coordinates on a processor element for an iteration of the single loop from iteration coordinates computed on the same processor element for a previous iteration of the single loop. The parallel compiler uses this method to generate code to compute loop indices, memory addresses, and tests of loop bounds efficiently based on values from a previous iteration.
Other References
Allen et al. Automatic Decomposition of Scientific Programs for Parallel Executions. IEEE. 1987. pp. 63-76.
Tang et al. Impact of Self-Scheduling Order on Performance Of Multiprocessor Systems. ACM. 1988. pp. 593-603.
Calinescu. A BSP Approach to the Scheduling of Tightly-Nested Loops. IEEE. 1997. pp. 549-553.
Leonardi et al. Nested Loops Optimization for Multiprocessor Architecture Design. IEEE. 1998. pp. 415-418.
Yang et al. On Symbolic Scheduling and Parallel Complexity of Loops. IEEE. 1995. pp. 360-367.
IBM Technical Disclosure Bulletin. Automatic Parallelization of Loops in Sequential Code. Jul., 1987. pp. 731-735.
Rainer Leupers, Peter Marwedel, "Retargetable Generation of Code Selectors from HDL Processor Models," IEEE, 1997, pp. 140-144
George Hadjiyiannis, Silvina Hanono, Srinivas Devadas, "ISDL: An Instruction Set Description Language for Retargetability," ACM, 1997, pp. 299-302
Gyllenhaal et al., "HMDES Version 2.0 Specification," Hewlett Packard Laboratories Technical Report Impact-96-3, (Published before Aug. 20, 1999)
Hadjiyiannis et al., "A Methodology for Accurate Performance Evaluation in Architecture Exploration." (Published before Aug. 20, 1999)
Hoogerbrugge et al., "Automatic Synthesis of Transport Triggered Processors." (Published before Aug. 20, 1999)
Corporaal et al., "MOVE: A Framework for High-Performance Processor Design," ACM, 1991, pp. 692-701
Corporaal et al., "Cosynthesis with the MOVE Framework." (Published before Aug. 20, 1999)
Lanneer et al, "Chapter 5--Chess: Retargetable Code Generation for Embedded DSP Processors," Code Generation for Embedded Processors, Kluwer Academic Publications, pp. 85-102. (Published before Aug. 20, 1999)
Fauth, "Chapter 8--Beyond Tool-Specific Machine Descriptions," Code Generation for Embedded Processors, Kluwer Academic Publications, pp. 138-152
Quinton et al., Systolic Algorithms & Architectures, Prentice Hall, 1991, pp. 283-339
Park et al., "Sehwa: A Software Package for Synthesis of Pipelines from Behavioral Specifications," IEEE, Mar. 1988, vol. 7, No. 3, pp. 356-370
Megson et al., "A Synthesis Method of LSGP Partitioning of Given-Shape Regular Arrays," IEEE, 1995, pp. 234-238
Chen et al., "A General Methodology of Partitioning and Mapping for Given Regular Arrays," IEEE Transactions on Parallel and Distributed Systems, vol. 6, No. 10, 1995, pp. 1100-1107
Bagg et al., "Parallelizing Applications into Silicon," MIT (Published before Aug. 20, 1999)
Rim et al., "Optimal and Heuristic Algorithms for Solving the Binding Problem," Madison, WI, Sep. 10, 1993
Weinhardt et al., "Memory Access Optimization and RAM Inference for Pipeline Vectorization," Proceedings of the 9th International Workshop, FPL '99, pp. 61-70
Shackleford et al., "Satsuki: An Integrated Processor Synthesis and Compiler Generation System," IEICE Special Issue on Synthesis and Verification of Hardware Design, 10-96
Devadas et al., "Algorithms for Hardware Allocation in Data Path Synthesis," IEEE, Jul. 1989, vol. 8, No. 7, pp. 768-781
Cloutier et al., "The Combination of Scheduling, Allocation, and Mapping in a Single Algorithm," 27th ACM/IEEE Design Automation Conference, 1990, pp. 71-76
Wilson et al., "An ILP Solution for Simultaneous Scheduling, Allocation, and Binding in Multiple Block Synthesis," IEEE, 1994, pp. 581-586
Wilson et al., "An ILP Solution for Optimum Scheduling, Module and Register Allocation, and Operation Binding in Datapath Synthesis," OPA, 1995, pp. 21-36
Paulin et al., "Force-Directed Scheduling for the Behavioral Synthesis of ASIC's," IEEE, Jun. 1989, vol. 8, No. 6, pp. 661-679
Chang et al., "Using Integer Linear Programming for Instruction Scheduling and Register Allocation in Multi-Issue Processors," Computers Math. Applic., vol. 34, No. 9, 1997, pp. 1-14
Aditya et al., "Elcor's Machine Description System: Version 3.0," HPL-98-128, Oct. 1998, pp. 1-75
Rau et al., "Machine-Description Driven Compilers for EPIC Processors," HP Laboratories Technical Report, HPL-98-40, Sep. 1998, pp. 1-82
Kathail et al., "HPL PlayDoh Architecture Specification: Version 1.0," HP Laboratories Technical Report, HPL-93-80, Feb. 1994, pp. 1-48
Darte et al., "Partitioning for Array Processors," Tech. Rep. 90-23, LIP, ENS Lyon, 1990
A. Darte, "Regular Partitioning for Synthesizing Fixed-Size Systolic Arrays," Integration, the VLSI Journal, vol. 12, 1991, pp. 293-304
X. Chen and G.M. Megson, "A general Methodology of Partitioning and Mapping for Given Regular Arrays," IEEE Transactions on Parallel and Distributed Systems, vol. 6, No. 10, 1995, pp. 1100-110