Document information retrieval using global word co-occurrence patterns
Patent 5675819 Issued on October 7, 1997. Estimated Expiration Date: October 7, 2014. Estimated Expiration Date is calculated based on simple USPTO term provisions. It does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors which might affect the term of a patent.
A method and apparatus accesses relevant documents based on a query. A thesaurus of word vectors is formed for the words in the corpus of documents. The word vectors represent global lexical co-occurrence patterns and relationships between word neighbors. Document vectors, which are formed from the combination of word vectors, are in the same multi-dimensional space as the word vectors. A singular value decomposition is used to reduce the dimensionality of the document vectors. A query vector is formed from the combination of word vectors associated with the words in the query. The query vector and document vectors are compared to determine the relevant documents. The query vector can be divided into several factor clusters to form factor vectors. The factor vectors are then compared to the document vectors to determine the ranking of the documents within the factor cluster.
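The retrieval pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the toy corpus, the window size, the number of retained dimensions, and all function names (`cooccurrence_matrix`, `word_vectors`, `combine`, `rank`) are assumptions made for illustration. The sketch also assumes that the singular value decomposition is applied to the word co-occurrence matrix, so that word, document, and query vectors all share the same reduced space, and it omits the factor-cluster step.

```python
import numpy as np

def cooccurrence_matrix(docs, window=2):
    """Count how often each pair of words appears within `window` tokens."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        for i, w in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[doc[j]]] += 1
    return counts, index

def word_vectors(counts, dims):
    """Reduce each co-occurrence row to `dims` dimensions via truncated SVD."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :dims] * s[:dims]

def combine(words, vectors, index):
    """Sum the vectors of known words and scale the result to unit length."""
    total = np.zeros(vectors.shape[1])
    for w in words:
        if w in index:  # words outside the thesaurus are ignored
            total += vectors[index[w]]
    norm = np.linalg.norm(total)
    return total / norm if norm > 0 else total

def rank(query, docs, vectors, index):
    """Return document indices sorted by cosine similarity, plus the scores."""
    q = combine(query, vectors, index)
    scores = [float(q @ combine(doc, vectors, index)) for doc in docs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Hypothetical three-document corpus, already tokenized.
docs = [
    "the rocket launched into space orbit".split(),
    "astronauts fly the shuttle into orbit".split(),
    "the chef cooked pasta with tomato sauce".split(),
]
counts, index = cooccurrence_matrix(docs, window=2)
vectors = word_vectors(counts, dims=3)

order, scores = rank(["pasta", "sauce"], docs, vectors, index)
print(order, scores)
```

Because documents and queries are built from the same word vectors, a query can score well against a document that shares none of its exact words, as long as the words involved have similar co-occurrence neighbors.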
Other References
"Cluster Algorithm for Vector Libraries Having Multiple Dimensions", IBM Technical Disclosure Bulletin, vol. 37, No. 02A, Feb. 1994, pp. 79-82
"LSI meets TREC: A Status Report", Susan T. Dumais, NIST Special Publication 500-207, The First Text Retrieval Conference (TREC-1), Mar., 1993, pp. 137-152
"Full Text Indexing Based on Lexical Relations An Application: Software Libraries", Yoelle S. Maarek et al., Proceedings of the Twelfth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Jun. 25-28, 1989, pp. 198-206
"Dimensions of Meaning", Hinrich Schuetze, Proceedings Supercomputing '92, Nov. 16-20, 1992, pp. 787-796
Douglas R. Cutting et al., Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections, pp. 1-12, 15th Ann Int'l SIGIR '92 (1992)
Gerard Salton et al., Introduction to Modern Information Retrieval, McGraw-Hill Book Company, pp. 118-155
Crouch, C.J., "An Approach to the Automatic Construction of Global Thesauri", Information Processing & Management, vol. 26, No. 5, pp. 629-640, 1990
Deerwester et al., "Indexing by Latent Semantic Analysis", Journal of the American Society for Information Science 41(6), pp. 391-407, 1990
Evans et al., "Automatic Indexing Using Selective NLP And First-Order Thesauri", Departments of Philosophy and Computer Science Laboratory for Computational Linguistics, Carnegie Mellon University, Pittsburgh, PA, pp. 624-639
Gallant, Stephen I., "A Practical Approach for Representing Context and for Performing Word Sense Disambiguation Using Neural Networks", Neural Computation 3, pp. 293-309, 1991
Grefenstette, Gregory, "Use of Syntactic Context to Produce Term Association Lists for Text Retrieval", Computer Science Department, University of Pittsburgh, Pittsburgh, PA, pp. 89-97, 1992
Liddy et al., "Statistically-Guided Word Sense Disambiguation", School of Information Studies, Syracuse University, Syracuse, New York, pp. 98-107
McCune et al., "Rubric: A System for Rule-Based Information Retrieval", IEEE Transactions on Software Engineering, vol. SE-11, No. 9, 1985, pp. 939-945
Peat et al., "The Limitations of Term Co-Occurrence Data for Query Expansion in Document Retrieval Systems", Journal of the American Society for Information Science 42(5), pp. 378-383, 1991
Qiu et al., "Concept Based Query Expansion", Department of Computer Science, Swiss Federal Institute of Technology, Zurich, Switzerland, pp. 160-169
Ruge, Gerda, "Experiments on Linguistically-Based Term Associations", Information Processing & Management, vol. 28, No. 3, pp. 317-332, 1992
Voorhees et al., "Vector Expansion in a Large Collection", Siemens Corporate Research, Inc., Princeton, New Jersey
Wilks et al., "Providing Machine Tractable Dictionary Tools", Computer Research Laboratory, New Mexico State University, Las Cruces, New Mexico, pp. 98-15