Patent References
Image processing apparatus
Method and apparatus for automatic character type classification of European script documents
Methods and apparatus for automatic modification of semantically significant portions of a document without document image decoding
Patent #: 5384863

Application No. 556436 filed on 11/09/1995

US Classes
382/229, Context analysis or word recognition (e.g., character string)
382/173, Image segmentation
382/224, Classification
704/9, Natural language

Examiners
Primary: Couso, Jose L.
Assistant: Do, Anh Hong

International Class
G06K 009/72

Abstract
Highlighting and categorization of documents is carried out by using word tokens which represent words appearing in a document. Elimination of certain unimportant word tokens is first completed, after which the remaining words of the document are ranked according to their word token appearance rates. These rates are then used to highlight frequently appearing words in the document which indicate the document's topic. The document can also be categorized using document profiles developed from the word tokens.
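
The steps named in the abstract (token elimination, ranking by appearance rate, highlighting, and profile-based categorization) can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how word tokens are formed, which tokens count as unimportant, or how profiles are compared. The sketch assumes that a token is simply the lowercase word, that unimportant tokens come from a small hypothetical stop list (`STOP_TOKENS`), and that profiles are compared by cosine similarity. All function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical stop list: the abstract does not say which tokens are eliminated.
STOP_TOKENS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for"}


def tokenize(text):
    """Assumption: a word token is just the lowercase form of the word."""
    return re.findall(r"[a-z]+", text.lower())


def appearance_rates(text):
    """Drop unimportant tokens, then map each remaining token to its share."""
    tokens = [t for t in tokenize(text) if t not in STOP_TOKENS]
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()} if total else {}


def top_tokens(rates, k=3):
    """Rank tokens by appearance rate, breaking ties alphabetically."""
    return sorted(rates, key=lambda t: (-rates[t], t))[:k]


def highlight(text, tokens):
    """Wrap every occurrence of the chosen tokens in asterisks."""
    chosen = set(tokens)

    def mark(match):
        word = match.group(0)
        return f"*{word}*" if word.lower() in chosen else word

    return re.sub(r"[A-Za-z]+", mark, text)


def cosine(p, q):
    """Cosine similarity between two token-rate profiles (an assumed measure)."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def categorize(rates, profiles):
    """Return the name of the category profile most similar to the document."""
    return max(profiles, key=lambda name: cosine(rates, profiles[name]))
```

A caller would compute `rates = appearance_rates(text)`, pass `top_tokens(rates)` to `highlight`, and pass `rates` with a dictionary of category profiles to `categorize`. Building each category profile with the same `appearance_rates` function keeps the document and the profiles directly comparable.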