Image processing apparatus
Method and apparatus for automatic character type classification of European script documents
Methods and apparatus for automatic modification of semantically significant portions of a document without document image decoding Patent #: 5384863
Application No. 556436 filed on 11/09/1995
US Classes: 382/229, Context analysis or word recognition (e.g., character string); 382/173, Image segmentation; 382/224, Classification; 704/9, Natural language
Examiners
Primary: Couso, Jose L.
Assistant: Do, Anh Hong
Attorney, Agent or Firm
International Class: G06K 009/72
Abstract: Highlighting and categorization of documents is carried out by using word tokens which represent words appearing in a document. Elimination of certain unimportant word tokens is first completed, after which the remaining words of the document are ranked according to their word token appearance rates. These rates are then used to highlight frequently appearing words in the document which indicate the document's topic. The document can also be categorized using document profiles developed from the word tokens.
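
The ranking and highlighting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes word tokens are already available as opaque strings, and the names `UNIMPORTANT`, `rank_tokens`, and `highlight`, the contents of the stop list, the `top_n` cutoff, and the bracket markup are all hypothetical choices made for illustration. Categorization by document profile is not shown.

```python
from collections import Counter

# Hypothetical list of "unimportant" tokens; the abstract does not say
# which tokens are eliminated or how they are identified.
UNIMPORTANT = {"the", "a", "of", "and", "is", "in", "to"}


def rank_tokens(tokens, unimportant=UNIMPORTANT):
    """Return (token, appearance rate) pairs, most frequent first.

    The rate is the token's count divided by the number of tokens that
    remain after the unimportant ones are eliminated (an assumed definition).
    """
    kept = [t for t in tokens if t not in unimportant]
    if not kept:
        return []
    counts = Counter(kept)
    total = len(kept)
    rates = [(tok, n / total) for tok, n in counts.items()]
    # Sort by descending rate; break ties alphabetically for stable output.
    return sorted(rates, key=lambda pair: (-pair[1], pair[0]))


def highlight(tokens, top_n=1, unimportant=UNIMPORTANT):
    """Wrap every occurrence of the top_n highest-rate tokens in brackets."""
    top = {tok for tok, _ in rank_tokens(tokens, unimportant)[:top_n]}
    return [f"[{t}]" if t in top else t for t in tokens]


if __name__ == "__main__":
    doc = "the cat sat in the hat and the cat ran".split()
    print(rank_tokens(doc))
    print(" ".join(highlight(doc, top_n=1)))
```

In this sketch the most frequent remaining token stands in for the document's topic indicator; a real system would need its own rule for how many tokens to highlight.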