Patent References
Method for extracting multi-word technical terms from text
System for generating a search formula by accessing search terms on the basis of a training set of pertinent and non-pertinent objects
Data base retrieval system utilizing stored vicinity feature values
Apparatus and method for retrieving and grouping images representing text files based on the relevance of key words extracted from a selected file to the text files
System and method for mining generalized association rules in databases
Patent #: 5615341

Inventors

Application
No. 541665 filed on 10/10/1995

US Classes
707/6, Pattern matching access
707/5, Query augmenting and refining (e.g., inexact access)

Examiners
Primary: Amsbury, Wayne

Attorney, Agent or Firm

International Class
G06F 017/30

Abstract
A method and apparatus are disclosed for mining generalized sequential patterns from a large database of data sequences, taking into account user-specified constraints on the time gap between adjacent elements of the patterns, a sliding time window, and taxonomies over data items. The invention first identifies the items with at least a minimum support, i.e., those contained in more than a minimum number of data sequences. These items are used as a seed set to generate candidate sequences. Next, the support of each candidate sequence is counted. The invention then identifies those candidate sequences that are frequent, i.e., those with a support above the minimum support. The frequent candidate sequences are entered into the set of sequential patterns and are used to generate the next group of candidate sequences. Preferably, the candidate sequences are generated by joining previously found frequent candidate sequences, and candidate sequences having a contiguous subsequence without minimum support are discarded. In addition, the invention includes a hash-tree data structure for storing the candidate sequences and memory management techniques for performance improvement.
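
The level-wise procedure described in the abstract (seed with frequent items, join frequent sequences into longer candidates, discard candidates with an infrequent subsequence, count support, repeat) can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the patented implementation: every element of a sequence is a single item, there are no time-gap, sliding-window, or taxonomy constraints, support is counted by a plain scan rather than a hash tree, and "frequent" is taken to mean a support count of at least `min_support`. Because single-item elements make the contiguous-subsequence check trivial after the join, the sketch prunes on every subsequence obtained by deleting one item, which is only valid because no maximum gap is imposed here. All function names are hypothetical.

```python
from itertools import product


def is_subsequence(candidate, sequence):
    """True if the items of candidate occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # "item in remaining" advances the iterator, so order is enforced.
    return all(item in remaining for item in candidate)


def count_support(candidate, database):
    """Number of data sequences that contain the candidate."""
    return sum(is_subsequence(candidate, seq) for seq in database)


def generate_candidates(frequent):
    """Join frequent k-sequences into (k+1)-candidates, then prune."""
    candidates = set()
    for s1, s2 in product(frequent, repeat=2):
        # Join when s1 without its first item equals s2 without its last item.
        if s1[1:] == s2[:-1]:
            candidates.add(s1 + s2[-1:])
    # Prune: drop a candidate if deleting any one item gives an infrequent
    # sequence (assumption: valid only because no max-gap constraint is used).
    return {
        c for c in candidates
        if all(c[:i] + c[i + 1:] in frequent for i in range(len(c)))
    }


def mine_sequential_patterns(database, min_support):
    """Return a dict mapping each frequent sequence (a tuple) to its support."""
    items = {item for seq in database for item in seq}
    candidates = {(item,) for item in items}  # seed set: single items
    patterns = {}
    while candidates:
        frequent = {}
        for c in candidates:
            support = count_support(c, database)
            if support >= min_support:
                frequent[c] = support
        patterns.update(frequent)
        candidates = generate_candidates(set(frequent))
    return patterns


if __name__ == "__main__":
    # Hypothetical toy database of four data sequences.
    db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "d")]
    for seq, support in sorted(mine_sequential_patterns(db, 2).items()):
        print(seq, support)
```

On this toy database with `min_support=2`, the candidate `("a", "b", "d")` is pruned before counting because `("a", "d")` is not frequent, while `("a", "b", "c")` survives and is counted in two data sequences.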