Patent References:
  Parallel processor
  Text search system
  Method and system for generating lexicon of cooccurrence relations in natural language (Patent #: 4942526)

Application No.: 739111, filed on 07/31/1991
US Classes: 704/9 (Natural language)
Examiners: Primary: Envall, Roy N. Jr.; Assistant: Poinvil, Frantzy
International Classes: G06F 015/38, G01L 001/06

Abstract

Classification of natural language data wherein the natural language data has an open-ended range of possible values or the data values do not have a relative order. A training database stores training records, wherein each training record includes predictor data fields, each containing a feature that is a natural language term, and a target data field containing a target value representing a classification of the record. Features may also include conjunctions of natural language terms, and each feature may also be a member of a category subset of features. The training database stores, for each feature, a probability weight value representing the probability that a record will have the target value contained in the target data field if the feature contained in a corresponding predictor data field occurs in the record. Features are extracted from a new record, and each feature from the new record is used to query the training records to determine the probability weights from the training records having matching features. The probability weights are accumulated for each training record to determine a comparison score representing the probability that the training record matches the new record, and an output is provided indicating the training records most probably matching the new record.
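
The matching procedure described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The function names (extract_features, build_training_db, match), the whitespace tokenizer, the use of adjacent word pairs as "conjunctions", and the estimate of each probability weight as a simple relative frequency are all assumptions made for illustration. Category subsets of features are not modeled.

```python
from collections import Counter, defaultdict

def extract_features(text):
    """Assumed feature extractor: lowercased terms plus adjacent-pair conjunctions."""
    terms = text.lower().split()
    conjunctions = [a + "&" + b for a, b in zip(terms, terms[1:])]
    return set(terms) | set(conjunctions)

def build_training_db(records):
    """records: list of (text, target) pairs.

    Returns an index mapping feature -> {record id: probability weight}, where
    the weight is the (assumed) relative-frequency estimate of
    P(record's target value | feature occurs in a record).
    """
    feature_sets = [extract_features(text) for text, _ in records]
    feature_total = Counter()    # number of records containing each feature
    feature_target = Counter()   # number of records with (feature, target)
    for feats, (_, target) in zip(feature_sets, records):
        for f in feats:
            feature_total[f] += 1
            feature_target[f, target] += 1

    index = defaultdict(dict)
    for rec_id, (feats, (_, target)) in enumerate(zip(feature_sets, records)):
        for f in feats:
            index[f][rec_id] = feature_target[f, target] / feature_total[f]
    return index

def match(index, new_text, top_k=3):
    """Accumulate weights per training record; return the best (id, score) pairs."""
    scores = defaultdict(float)
    for f in extract_features(new_text):
        for rec_id, weight in index.get(f, {}).items():
            scores[rec_id] += weight
    # Highest score first; ties broken by record id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Hypothetical training data, invented for this example only.
records = [
    ("software engineer", "computing"),
    ("civil engineer", "engineering"),
    ("software developer", "computing"),
]
index = build_training_db(records)
print(match(index, "software engineer"))
```

With this made-up data, the new record "software engineer" ranks training record 0 first, because it shares both terms and their conjunction. Record 2 shares only "software" and record 1 only "engineer", whose weight is lower because that term occurs under two different target values.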