U.S. patents available from 1976 to present.
U.S. patent applications available from 2005 to present.

Scalable system for expectation maximization clustering of large databases

Patent 6263337 Issued on July 17, 2001. Estimated Expiration Date: Icon_subject May 22, 2018. Estimated Expiration Date is calculated based on simple USPTO term provisions. It does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors which might affect the term of a patent.

Patent References

Automatic clustering method
Patent #: 5329596
Issued on: 07/12/1994
Inventor: Sakou, et al.

Method and system for data clustering for very large databases
Patent #: 5832182
Issued on: 11/03/1998
Inventor: Zhang, et al.

Method for mining causality rules with applications to electronic commerce
Patent #: 5832482
Issued on: 11/03/1998
Inventor: Yu, et al.

Object-oriented data mining and decision making system
Patent #: 5875285
Issued on: 02/23/1999
Inventor: Chang

System and method for data mining from relational data by sieving through iterated relational reinforcement Patent #: 5884305
Issued on: 03/16/1999
Inventor: Kleinberg, et al.

Inventors

Assignee

Application

No. 083906 filed on 05/22/1998

US Classes:

707/6, Pattern matching access707/3, Query processing (i.e., searching)707/4Query formulation, input preparation, or translation

Examiners

Primary: Amsbury, Wayne
Assistant: Havan, Thu-Thao

Attorney, Agent or Firm

International Class

G06F 017/00

Abstract

In one exemplary embodiment the invention provides a data mining system for use in finding clusters of data items in a database or any other data storage medium. Before the data evaluation begins a choice is made of the number M of models to be explored, and the number of clusters (K) of clusters within each of the M models. The clusters are used in categorizing the data in the database into K different clusters within each model. An initial set of estimates for a data distribution of each model to be explored is provided. Then a portion of the data in the database is read from a storage medium and brought into a rapid access memory buffer whose size is determined by the user or operating system depending on available memory resources. Data contained in the data buffer is used to update the original model data distributions in each of the K clusters over all M models. Some of the data belonging to a cluster is summarized or compressed and stored as a reduced form of the data representing sufficient statistics of the data. More data is accessed from the database and the models are updated. An updated set of parameters for the clusters is determined from the summarized data (sufficient statistics) and the newly acquired data. Stopping criteria are evaluated to determine if further data should be accessed from the database.

Other References

  • J. Banfield and A. Raftery, "Model-based Gaussian and non-Gaussian Clustering" Biometrics, vol. 49:803-821, pp. 15-34, (1993) C. Bishop, Neural Networks for Pattern Recognition. Oxford University Press. (1995)
  • P. Cheeseman and J. Stutz, "Bayesian Classification (AutoClass) Theory and Results", in Advances in Knowledge Discovery and Data Mining, Fayyad, U., G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusaymy (Eds.), pp. 153-180. MIT Press, (1996)
  • A.P. Demster, N.M. Laird, and D.B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm". Journal of the Royal Statistical Society, Series B, 39(1): 1-38, (1977)
  • D. Fisher. "Knowledge Acquisition via Incremental Conceptual Clustering". Machine Learning, 2:139-172, (1987)
  • R.M. Neal and G.E. Hinton, "A View of the EM Algorithm that Justifies Incremental,Sparse, and Other Variants", to appear in M.I. Jordan(Ed.), Learning in Graphical Models, Kluwer: (1998)
  • S.Z. Selim and M.A. Ismail, K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality.: IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. PAMI-6, No. 1, (1984)
  • T. Zhang, R. Ramakrishnan, and M. Livny. "BIRCH: A New Data Clustering Algorithm and Its Applications". Data Mining and Knowledge Discovery 1(2). (1997)
  • C. M. Bishop. "Neural Networks for Pattern Recognition." Bayes Theorem. Clarendon Press.Oxford pp. 17-23 (1995)
  • C.M. Bishop. "Neural Networks For Pattern Recognition." The Normal distribution. Clarendon Press.Oxford. pp. 34-38 (1995)
  • C.M. Bishop. "Neural Networks For Pattern Recognition." Maximum Likelihood. Clarendon Press. Oxford pp. 39-42 (1995)
  • C.M. Bishop. "Neural Networks For Pattern Recognition." Density Estimation in General. Clarendon Press. Oxford pp. 51-55, (1995)
  • C. M. Bishop. "Neural Networks for Pattern Recognition." Mixture Models/Maximum Likelihood/EM Algorithm. Clarendon Press. Oxford pp. 59-72 (1995)
  • R. Duda and P. Hart. "Pattern Classification and Scene Analysis." Bayes Decision Theory. John Wiley & Sons pp. 10-13
  • R. Duda and P. Hart. "Pattern Classification and Scene Analysis." The Normal Density. John Wiley & Sons. pp. 22-24 (1973)
  • R. Duda and P. Hart. "Pattern Classification and Scene Analysis." Maximum Likelihood Estimation: John Wiley & Sons pp. 45-49 (1973)
  • R. Duda and P. Hart. "Pattern Classificationa nd Scene Analysis." Sufficient Statistics and the Exponential Family. John Wiley & Sons pp. 62-66 (1973)
  • R. Duda and P. Hart. "Pattern Classification and Scene Analysis." Density Estimation. John Wiley & Sons Chap. 4, pp. 85-88 (1973)
  • R. Duda and P. Hart. "Pattern Classification and Scene Analysis." Unsupervised Learning and Clustering. John Wiley & Sons. Chap. 6 pp. 189-200 (1973)
  • R. Duda and P. Hart. "Pattern Classification and Scene Analysis." Clustering Criteria (K-Mean): John Wiley & Sons Chap. 6 pp. 217-219 (1973)
  • R. Duda and P. Hart. "Pattern Classificationa nd Scene Analysis." Iterative Optimization. (relates to K-Mean/EM) John Wiley & Sons Chap. 6 pp. 225-228 (1973)
  • K. Fukunaga. "Statistical Pattern Recognition". Bayes Theorem Academic Press Chap. 1 pp. 12-13 (1990)
  • K. Fukanaga. "Statistical Pattern Recognition." Normal Distributions. Academic Press. Chap. 2 pp. 16-24 (1990)
  • K. Fukanaga. "Statistical Pattern Recognition." Clustering Academic Press. Chap. 11 pp. 508-512 (1990)
  • R. Duda and P. Hart. "Pattern Classificationa nd Scene Analysis." Nearest Mean Reclassification Algorithm (k-Mean): Chap. 11 pp. 515-523. Academic Press. (1990)
  • K. Fukunaga. "Statistical Pattern Recognition". Maximum Likelihood. Academic Press Chap. 11 pp. 527-532 (1990
PatentsPlus Images
Enhanced PDF formats
loading...
PatentsPlus: add to cart
PatentsPlus: add to cartSearch-enhanced full patent PDF image
$9.95more info
PatentsPlus: add to cart
PatentsPlus: add to cartIntelligent turbocharged patent PDFs with marked up images
$16.95more info
 
Sign InRegister
Username  
Password   
forgot password?