
Method and apparatus for incremental computation of the accuracy of a categorization-by-example system

Patent 7089238 Issued on August 8, 2006. Estimated Expiration Date: June 27, 2021. The Estimated Expiration Date is calculated based on simple USPTO term provisions. It does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors which might affect the term of a patent.

Patent References

Multilingual document retrieval system and method using semantic vector matching
Patent #: 6006221
Issued on: 12/21/1999
Inventor: Liddy, et al.

Multidimensional data clustering and dimension reduction for indexing and searching
Patent #: 6122628
Issued on: 09/19/2000
Inventor: Castelli, et al.

Inventors

Assignee

Application

No. 09893301 filed on 06/27/2001

US Classes:

707/5 Query augmenting and refining (e.g., inexact access)

Examiners

Primary: Choules, Jack M.

Attorney, Agent or Firm

International Class

G06F 17/30

Abstract

Disclosed are methods and apparatus for incrementally updating the accuracy provided by documents in a training set used for automatic categorization. A k-nearest neighbor database includes the documents in the training set, categories, category assignments of the documents and category scores for the documents. A list of the nearest neighbors of the documents, with corresponding similarity scores, is maintained by the method. On adding or deleting documents or category assignments, the documents influenced by the changed documents or category assignments are identified. The category scores of the identified documents are updated to be consistent with the updated training set, and new precision and recall curves are computed for the categories that include updated category scores. The precision and recall curves may be used to determine an optimal number of documents to maximize the return of relevant documents while minimizing the total number of documents.
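
The incremental update described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the names `KnnTrainingSet`, `add_assignment`, `remove_assignment`, `precision_recall_curve` and `best_cutoff` are hypothetical, neighbor lists are assumed to be precomputed, a category score is assumed to be the sum of similarities to neighbors assigned to that category, and the F1 measure is assumed as the criterion for choosing a cutoff. Only changes to category assignments are covered; adding or deleting whole documents would also require updating the neighbor lists.

```python
from collections import defaultdict


class KnnTrainingSet:
    """Toy k-nearest-neighbor store with incrementally maintained category scores."""

    def __init__(self, neighbors):
        # neighbors: doc -> list of (neighbor_doc, similarity), assumed precomputed.
        # A document is assumed not to list itself as a neighbor.
        self.neighbors = neighbors
        # Reverse index: doc -> [(document that lists it as a neighbor, similarity)].
        self.influences = defaultdict(list)
        for doc, nbrs in neighbors.items():
            for nbr, sim in nbrs:
                self.influences[nbr].append((doc, sim))
        self.assigned = defaultdict(set)  # category -> documents assigned to it
        self.scores = defaultdict(lambda: defaultdict(float))  # category -> doc -> score

    def add_assignment(self, doc, category):
        """Assign doc to category; return the set of documents whose score changed."""
        if doc in self.assigned[category]:
            return set()
        self.assigned[category].add(doc)
        return self._shift(doc, category, +1.0)

    def remove_assignment(self, doc, category):
        """Unassign doc from category; return the set of documents whose score changed."""
        if doc not in self.assigned[category]:
            return set()
        self.assigned[category].discard(doc)
        return self._shift(doc, category, -1.0)

    def _shift(self, doc, category, sign):
        # Only documents that list `doc` as a neighbor are influenced by the change.
        touched = set()
        for other, sim in self.influences[doc]:
            self.scores[category][other] += sign * sim
            touched.add(other)
        return touched

    def precision_recall_curve(self, category):
        """Return [(n, precision, recall)] for the top-n documents ranked by score."""
        relevant = self.assigned[category]
        ranked = sorted(self.neighbors,
                        key=lambda d: self.scores[category][d], reverse=True)
        curve, hits = [], 0
        for n, doc in enumerate(ranked, start=1):
            hits += 1 if doc in relevant else 0
            recall = hits / len(relevant) if relevant else 0.0
            curve.append((n, hits / n, recall))
        return curve


def best_cutoff(curve):
    """Pick the curve point with the highest F1 (an assumed criterion)."""
    def f1(point):
        _, p, r = point
        return 2 * p * r / (p + r) if p + r else 0.0
    return max(curve, key=f1)


# Small usage example with made-up documents and similarities.
neighbors = {
    "a": [("b", 0.9), ("c", 0.2)],
    "b": [("a", 0.9), ("c", 0.5)],
    "c": [("b", 0.5), ("d", 0.4)],
    "d": [("c", 0.4), ("a", 0.1)],
}
training_set = KnnTrainingSet(neighbors)
training_set.add_assignment("a", "sports")
influenced = training_set.add_assignment("b", "sports")
print(sorted(influenced))
print(best_cutoff(training_set.precision_recall_curve("sports")))
```

The reverse index is the design choice that keeps the update incremental: a changed assignment touches only the documents that list the changed document among their nearest neighbors, so only the curves for the affected category need to be recomputed.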

Other References

  • Arya, S., et al., “An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions,” Nov. 1998, Journal of the ACM vol. 45, No. 6, pp. 891-923.
  • Cutting, D. R., et al. “Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections,” Jun. 1992, Ann Int'l SIGIR '92 Denmark, pp. 318-329.
  • Daniel P. Lopresti, “A Comparison of Text-Based Methods for Detecting Duplication in Document Image Databases,” Document Recognition and Retrieval VII (IS&T/SPIE Electronic Imaging 2000), Jan. 2000, San Jose, CA.
  • Narayanan Shivakumar and Hector Garcia-Molina, “The SCAM Approach to Copy Detection in Digital Libraries,” Department of Computer Science, Stanford University, Stanford, CA 94305, USA.
  • William B. Frakes and Ricardo Baeza-Yates, “Information Retrieval: Data Structures & Algorithms,” pp. 19-71.
  • Ian H. Witten et al., “Managing Gigabytes: Compressing and Indexing Documents and Images,” pp. 181-188.
  • Fazli Can et al., “A Dynamic Cluster Maintenance System for Information Retrieval,” SIGIR 1987, pp. 123-131.
  • Fazli Can et al., “Concepts of the Cover Coefficient-Based Clustering Methodology,” SIGIR 1985, pp. 204-211.