Document information retrieval using global word co-occurrence patterns

Patent 5675819 Issued on October 7, 1997. Estimated Expiration Date: October 7, 2014. Estimated Expiration Date is calculated based on simple USPTO term provisions. It does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors which might affect the term of a patent.

Patent References

Method and system for generating lexicon of cooccurrence relations in natural language
Patent #: 4942526
Issued on: 07/17/1990
Inventor: Okajima, et al.

Method and apparatus for generating and/or updating cooccurrence relation dictionary
Patent #: 5181163
Issued on: 01/19/1993
Inventor: Nakajima, et al.

Methods for generating or revising context vectors for a plurality of word stems
Patent #: 5325298
Issued on: 06/28/1994
Inventor: Gallant

Inventor

Assignee

Application

No. 260575 filed on 06/16/1994

US Classes:

704/10, Dictionary building, modification, or prioritization
704/9, Natural language
707/3, Query processing (i.e., searching)
715/500, PRESENTATION PROCESSING OF DOCUMENT

Examiners

Primary: Hayes, Gail O.
Assistant: Kalidindi, Krishna

Attorney, Agent or Firm

International Classes

G06F 015/38
G06F 015/21

Abstract

A method and apparatus accesses relevant documents based on a query. A thesaurus of word vectors is formed for the words in the corpus of documents. The word vectors represent global lexical co-occurrence patterns and relationships between word neighbors. Document vectors, which are formed from the combination of word vectors, are in the same multi-dimensional space as the word vectors. A singular value decomposition is used to reduce the dimensionality of the document vectors. A query vector is formed from the combination of word vectors associated with the words in the query. The query vector and document vectors are compared to determine the relevant documents. The query vector can be divided into several factor clusters to form factor vectors. The factor vectors are then compared to the document vectors to determine the ranking of the documents within the factor cluster.
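
The retrieval flow described in the abstract can be illustrated with a minimal sketch. It covers only three of the steps: building word vectors from corpus-wide co-occurrence counts, summing word vectors into document and query vectors, and comparing them. The singular value decomposition and the factor clustering are omitted. The function names, the whitespace tokenizer, the window of two neighbors, and the use of cosine similarity as the comparison measure are all assumptions made for illustration; they are not taken from the patent.

```python
from collections import Counter
import math


def build_word_vectors(docs, window=2):
    """Map each word to counts of the words seen within `window` positions of it.

    Counts are accumulated over the whole corpus, so each vector reflects
    global co-occurrence rather than a single document. (Window size and
    whitespace tokenization are illustrative assumptions.)
    """
    vectors = {}
    for doc in docs:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            neighbors = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    neighbors[tokens[j]] += 1
    return vectors


def combine(text, word_vectors):
    """Sum the word vectors of every known word in a document or query."""
    total = Counter()
    for word in text.lower().split():
        total.update(word_vectors.get(word, {}))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse vectors (assumed comparison measure)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, docs, word_vectors):
    """Return document indices ordered from most to least similar to the query."""
    q = combine(query, word_vectors)
    scores = [(cosine(q, combine(d, word_vectors)), i) for i, d in enumerate(docs)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


docs = [
    "the car engine needs oil",
    "the automobile engine needs fuel",
    "the cat chased the mouse",
]
word_vectors = build_word_vectors(docs)
ranking = rank("automobile", docs, word_vectors)
```

In this toy corpus "car" and "automobile" share the same neighbors, so the first document ranks above the third for the query "automobile" even though it never contains that word. This is the effect the word-vector thesaurus is meant to produce.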

Other References

  • "Cluster Algorithm for Vector Libraries Having Multiple Dimensions", IBM Technical Disclosure Bulletin, vol. 37, No. 02A, Feb. 1994, pp. 79-82
  • "LSI meets TREC: A Status Report", Susan T. Dumais, NIST Special Publication 500-207, The First Text Retrieval Conference (TREC-1), Mar., 1993, pp. 137-152
  • "Full Text Indexing Based on Lexical Relations An Application: Software Libraries", Yoelle S. Maarek et al., Proceedings of the Twelfth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Jun. 25-28, 1989, pp. 198-206
  • "Dimensions of Meaning", Hinrich Schuetze, Proceedings Supercomputing '92, Nov. 16-20, 1992, pp. 787-796
  • Douglas R. Cutting et al., Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections, pp. 1-12, 15th Ann Int'l SIGIR '92 (1992)
  • Gerard Salton et al., Introduction to Modern Information Retrieval, McGraw-Hill Book Company, pp. 118-155
  • Crouch, C.J., "An Approach to the Automatic Construction of Global Thesauri", Information Processing & Management, vol. 26, No. 5, pp. 629-640, 1990
  • Deerwester et al., "Indexing by Latent Semantic Analysis", Journal of the American Society for Information Science 41(6), pp. 391-407, 1990
  • Evans et al., "Automatic Indexing Using Selective NLP And First-Order Thesauri", Departments of Philosophy and Computer Science Laboratory for Computational Linguistics, Carnegie Mellon University, Pittsburgh, PA, pp. 624-639
  • Gallant, Stephen I., "A Practical Approach for Representing Context and for Performing Word Sense Disambiguation Using Neural Networks", Neural Computation 3, pp. 293-309, 1991
  • Grefenstette, Gregory, "Use of Syntactic Context to Produce Term Association Lists for Text Retrieval", Computer Science Department, University of Pittsburgh, Pittsburgh, PA, pp. 89-97, 1992
  • Liddy et al., "Statistically-Guided Word Sense Disambiguation", School of Information Studies, Syracuse University, Syracuse, New York, pp. 98-107
  • McCune et al., "Rubric: A System for Rule-Based Information Retrieval", IEEE Transactions on Software Engineering, vol. SE-11, No. 9, 1985, pp. 939-945
  • Peat et al., "The Limitations of Term Co-Occurrence Data for Query Expansion in Document Retrieval Systems", Journal of the American Society for Information Science 42(5), pp. 378-383, 1991
  • Qiu et al., "Concept Based Query Expansion", Department of Computer Science, Swiss Federal Institute of Technology, Zurich, Switzerland, pp. 160-169
  • Ruge, Gerda, "Experiments on Linguistically-Based Term Associations", Information Processing & Management, vol. 28, No. 3, pp. 317-332, 1992
  • Voorhees et al., "Vector Expansion in a Large Collection", Siemens Corporate Research, Inc., Princeton, New Jersey
  • Wilks et al., "Providing Machine Tractable Dictionary Tools", Computer Research Laboratory, New Mexico State University, Las Cruces, New Mexico, pp. 98-15