
Method and system for mining generalized sequential patterns in a large database

Patent 5742811 Issued on April 21, 1998. Estimated Expiration Date: October 10, 2015. Estimated Expiration Date is calculated based on simple USPTO term provisions. It does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors which might affect the term of a patent.

Patent References

Method for extracting multi-word technical terms from text
Patent #: 5423032
Issued on: 06/06/1995
Inventor: Byrd, et al.

System for generating a search formula by accessing search terms on the basis of a training set of pertinent and non-pertinent objects
Patent #: 5442781
Issued on: 08/15/1995
Inventor: Yamagata

Data base retrieval system utilizing stored vicinity feature values
Patent #: 5546578
Issued on: 08/13/1996
Inventor: Takada

Apparatus and method for retrieving and grouping images representing text files based on the relevance of key words extracted from a selected file to the text files
Patent #: 5598557
Issued on: 01/28/1997
Inventor: Doner, et al.

System and method for mining generalized association rules in databases
Patent #: 5615341
Issued on: 03/25/1997
Inventor: Agrawal, et al.

Inventors

Application

No. 541665 filed on 10/10/1995

US Classes:

707/6, Pattern matching access
707/5, Query augmenting and refining (e.g., inexact access)

Examiners

Primary: Amsbury, Wayne

Attorney, Agent or Firm

International Class

G06F 017/30

Abstract

A method and apparatus are disclosed for mining generalized sequential patterns from a large database of data sequences, taking into account user-specified constraints on the time-gap between adjacent elements of the patterns, sliding time-window, and taxonomies over data items. The invention first identifies the items with at least a minimum support, i.e., those contained in more than a minimum number of data sequences. The items are used as a seed set to generate candidate sequences. Next, the support of the candidate sequences is counted. The invention then identifies those candidate sequences that are frequent, i.e., those with a support above the minimum support. The frequent candidate sequences are entered into the set of sequential patterns, and are used to generate the next group of candidate sequences. Preferably, the candidate sequences are generated by joining previously found frequent candidate sequences, and candidate sequences having a contiguous subsequence without minimum support are discarded. In addition, the invention includes a hash-tree data structure for storing the candidate sequences and memory management techniques for performance improvement.
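
The level-wise loop described in the abstract (seed with frequent items, count support, keep the frequent candidates, join them to form the next candidates, prune) can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes every pattern element is a single item, and it omits the time-gap constraints, sliding time-window, taxonomies, hash-tree, and memory management. Because there is no time-gap constraint here, the prune step checks every subsequence obtained by dropping one item, not only the contiguous ones. The names `contains`, `support`, and `mine_sequences` and the sample data are hypothetical.

```python
def contains(data_seq, pattern):
    """True if the pattern's items occur in order, each in a distinct element."""
    pos = 0
    for element in data_seq:
        if pos < len(pattern) and pattern[pos] in element:
            pos += 1  # this element is consumed by the matched item
    return pos == len(pattern)


def support(database, pattern):
    """Number of data sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database)


def mine_sequences(database, min_support):
    """Simplified level-wise mining (assumption: one item per pattern element)."""
    items = sorted({item for seq in database for element in seq for item in element})
    candidates = [(item,) for item in items]  # seed set: all single items
    patterns = {}
    while candidates:
        # Count support and keep candidates meeting the minimum support.
        counted = {c: support(database, c) for c in candidates}
        frequent = {c for c, n in counted.items() if n >= min_support}
        patterns.update({c: counted[c] for c in frequent})
        # Join: a and b combine when a without its first item equals
        # b without its last item.
        joined = {a + (b[-1],) for a in frequent for b in frequent
                  if a[1:] == b[:-1]}
        # Prune: discard a candidate if dropping any one item yields a
        # sequence that is not frequent.
        candidates = sorted(
            c for c in joined
            if all(c[:i] + c[i + 1:] in frequent for i in range(len(c)))
        )
    return patterns


# Hypothetical data: each data sequence is a list of item sets ordered in time.
db = [
    [{"a"}, {"b"}, {"c"}],
    [{"a"}, {"c"}],
    [{"a", "b"}, {"c"}],
    [{"b"}, {"a"}],
]
result = mine_sequences(db, min_support=2)
```

With this sample data and a minimum support of 2, the sketch keeps the three single items plus the two-item patterns `("a", "c")` and `("b", "c")`, and stops because no three-item candidate can be joined from them.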

Other References

  • Agrawal et al., An Interval Classifier for Database Mining Applications, Proc of the 18th VLDB Conference, Vancouver, British Columbia, Aug. 31, 1992, pp. 560-573
  • Dietterich et al., Discovering Patterns in Sequences of Events, Artificial Intelligence, Elsevier Science Publishers B.V. (North Holland), pp. 187-232, 1985
  • R. Agrawal et al., Mining Sequential Patterns, Int'l Conference on Data Engineering, pp. 3-14, Mar. 1995
  • H. Mannila et al., Discovering Frequent Episodes in Sequences (Extended Abstract) Int'l Conference on Knowledge Discovery in Databases and Data Mining, Dec. 1993, KDD-95, pp. 210-215
  • R. Agrawal et al., Fast Algorithms for Mining Association Rules, Proceedings of the 20th VLDB Conf., Santiago, Chile, pp. 487-499, 1994
  • J Wang et al., Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results, Proc. of the ACM SIGMOD Conf. on Management of Data, Minneapolis, pp. 115-125, May 1994
  • R. Agrawal et al., Mining Association Rules Between Sets of Items in Large Databases, Proc. of the ACM SIGMOD Conf. on Management of Data, pp. 207-216, Washington, D.C. May 1993
  • R. Agrawal et al., Database Mining: A Performance Perspective, IEEE Transactions on Knowledge and Data Eng. Special Issue on Learning and Discovery in Knowledge-Based Databases, pp. 914-925, Dec. 1993
  • R. Srikant et al., Mining Generalized Association Rules, IBM Research Report 9963 (87922), Jun. 27, 1995
  • M. Houtsma et al., Set-Oriented Mining for Association Rules, Research Report 9567 (83573) Oct. 22, 199