Patent References
3876987
Microcomputer based distributed control network
System for detecting a program execution fault
Task scheduler for a fault tolerant multiple node processing system
Distributed multiprocess transaction processing system and method
Error recovery system of a multiprocessor system for recovering an error in a processor by making the processor into a checking condition after completion of microprogram restart from a checkpoint
Fault tolerant hypercube computer system architecture
Operations controller for a fault tolerant multiple node processing system
Operations controller for a fault tolerant multiple node processing system
Method and apparatus for automatic recovery from excessive spin loops in an N-way multiprocessing system

Application No. 643274 filed on 05/08/1996

US Classes:
714/47, Performance monitoring for fault avoidance
709/224, Computer network monitoring
714/15, State recovery (i.e., process or data file)

Examiners
Primary: Beausoliel, Robert W. Jr.
Assistant: Le, Dieu-Minh

International Classes
G06F 011/00
G06F 011/08

200.12

Abstract
Techniques for fault-tolerant computing which do not require fault-tolerant hardware or a fault-tolerant operating system. The techniques employ a monitor daemon which is implemented as one or more user processes and a fault-tolerant library which can be bound into application programs. A user process which is executing on ordinary hardware under an ordinary operating system is made fault tolerant by registering it with the monitor daemon. The degree of fault tolerance can be controlled by means of the fault-tolerant library. Included in the fault-tolerant library is a function which defines portions of a user process's memory as critical memory, a function which copies the critical memory to persistent storage, and a function which restores the critical memory from persistent storage.
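The three library functions described in the abstract (declare critical memory, copy it to persistent storage, restore it) can be illustrated with a minimal sketch. This is not the patent's library: the class name `CriticalState`, its methods `critical`, `checkpoint`, and `recover`, and the use of Python's `pickle` with a write-then-rename step are all assumptions chosen for illustration.

```python
import os
import pickle
import tempfile


class CriticalState:
    """Hypothetical stand-in for the library's critical-memory functions."""

    def __init__(self, path):
        self.path = path    # location of the persistent copy
        self.values = {}    # the "critical memory": named values to preserve

    def critical(self, name, value):
        """Declare (or update) a value as part of critical memory."""
        self.values[name] = value

    def checkpoint(self):
        """Copy critical memory to persistent storage.

        The data is written to a temporary file and then renamed over the
        old checkpoint, so a crash mid-write leaves the old copy intact.
        """
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.values, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def recover(self):
        """Restore critical memory from persistent storage, if a copy exists."""
        if not os.path.exists(self.path):
            return False
        with open(self.path, "rb") as f:
            self.values = pickle.load(f)
        return True
```

A restarted process would construct a new `CriticalState` on the same path and call `recover()` before resuming work. Anything changed after the last `checkpoint()` is lost, which is one way an application could trade checkpoint cost against the degree of fault tolerance.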
The monitor daemon monitors fault-tolerant processes, and when such a process hangs or crashes, the daemon restarts it. When the techniques are employed in a multi-node system, the monitor daemon on each node monitors one other node in addition to the processes in its own node. In addition, the monitor daemon may maintain copies of the state of fault-tolerant processes running at least on the monitored node. When the monitored node fails, the monitor daemon starts, on its own node, those processes from the monitored node for which it has state. When a node leaves or rejoins the multi-node system, the node that each monitor daemon monitors is automatically redetermined for the new configuration of the multi-node system.
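The node-monitoring arrangement described in the abstract can be sketched as follows. The abstract does not say how the monitored node is chosen, so the sorted ring order used here is an assumption, as are the function names `monitor_assignments` and `processes_to_restart` and the shape of the `replicas` mapping.

```python
def monitor_assignments(nodes):
    """Map each live node to the one other node its daemon monitors.

    Assumption: nodes are ordered by name and each monitors its successor,
    with the last node wrapping around to the first.
    """
    ring = sorted(set(nodes))
    if len(ring) < 2:
        # A lone node has no other node to monitor.
        return {n: None for n in ring}
    return {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}


def processes_to_restart(failed, assignments, replicas):
    """Return (watcher, processes) for a failed node.

    replicas[watcher][node] lists the processes of `node` whose state
    copies are held on `watcher` (a hypothetical bookkeeping structure).
    """
    watcher = next((w for w, m in assignments.items() if m == failed), None)
    if watcher is None:
        return None, []
    return watcher, list(replicas.get(watcher, {}).get(failed, []))
```

With nodes `a`, `b`, and `c`, the assignment is `a` monitors `b`, `b` monitors `c`, and `c` monitors `a`. If `b` fails, `a` restarts the processes of `b` for which it holds state, and calling `monitor_assignments(["a", "c"])` yields the redetermined configuration in which `a` and `c` monitor each other.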