Acronym extraction system and method of identifying acronyms and extracting corresponding expansions from text
Patent 7236923, issued on June 26, 2007. Estimated Expiration Date: August 7, 2022. The Estimated Expiration Date is calculated based on simple USPTO term provisions. It does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors which might affect the term of a patent.
704/9, Natural language
704/7, Storage or retrieval of data
704/8, Multilingual or national language support
707/5, Query augmenting and refining (e.g., inexact access)
715/532, Dictionary
704/10, Dictionary building, modification, or prioritization
704/201, For storage or transmission
An acronym expansion system of the present invention receives electronic documents and extracts acronyms and their corresponding expansions. A part-of-speech tagger decomposes text into string tokens or words and tags them with their part-of-speech, while an acronym identifier determines whether a word is a potential acronym based on various conditions. An expansion identifier retrieves lists of words preceding and following a potential acronym to search for the expansion. The resulting word lists are examined sequentially to identify and retrieve an expansion for the potential acronym. An expansion extractor receives the potential acronym and a processed word list to retrieve the expansion of the potential acronym from that list. The extractor may utilize information from prior search iterations, and verifies an extracted expansion against a set of rules to remove spurious expansions.
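The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the part-of-speech tagger is omitted, the acronym condition (an all-uppercase word of 2 to 10 letters), the stopword list, the window size, and the verification rule are all assumptions chosen for illustration, and every function name here is hypothetical. It only shows the general shape: identify a potential acronym, collect the word lists preceding and following it, examine them in sequence, and check any extracted expansion against a rule.

```python
import re

# Assumed stopword list; words that may appear inside an expansion without
# contributing a letter to the acronym.
STOPWORDS = {"of", "the", "and", "for", "in", "on", "to", "a", "an"}


def tokenize(text):
    # Simplified stand-in for the tagger: split into alphanumeric tokens only.
    return re.findall(r"[A-Za-z0-9]+", text)


def is_potential_acronym(word, min_len=2, max_len=10):
    # Assumed condition: alphabetic, all upper-case, bounded length.
    return min_len <= len(word) <= max_len and word.isalpha() and word.isupper()


def match_backward(acronym, words):
    """Match acronym letters right-to-left against word initials.

    Stopwords that do not match are skipped; any other mismatch aborts.
    Returns the matched span of words, or None.
    """
    letters = acronym.lower()
    i = len(letters) - 1
    j = len(words) - 1
    start = end = None
    while i >= 0 and j >= 0:
        w = words[j].lower()
        if w[0] == letters[i]:
            if end is None:
                end = j
            start = j
            i -= 1
        elif w not in STOPWORDS:
            return None
        j -= 1
    if i >= 0:
        return None
    return words[start:end + 1]


def is_valid_expansion(acronym, expansion):
    # Assumed rule: reject expansions that contain the acronym itself
    # or that have fewer than two words.
    return len(expansion) >= 2 and acronym not in expansion


def extract_acronyms(text, window_factor=2):
    words = tokenize(text)
    found = {}
    for k, word in enumerate(words):
        if not is_potential_acronym(word) or word in found:
            continue
        n = window_factor * len(word)
        before = words[max(0, k - n):k]
        after = words[k + 1:k + 1 + n]
        # Examine the word lists sequentially: preceding first, then following.
        expansion = match_backward(word, before)
        if expansion is None:
            # Reuse the same matcher on the following words by reversing both.
            rev = match_backward(word[::-1], after[::-1])
            expansion = rev[::-1] if rev else None
        if expansion and is_valid_expansion(word, expansion):
            found[word] = " ".join(expansion)
    return found


if __name__ == "__main__":
    sample = ("The team uses natural language processing (NLP) daily. "
              "PDF (portable document format) files are common.")
    print(extract_acronyms(sample))
```

Matching one initial letter per word is the simplest possible strategy and misses expansions such as "hypertext markup language" for HTML, where one word supplies two letters. The reuse of information from prior search iterations mentioned in the abstract is not modeled here.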
Other References
Larkey et al., Acrophile: an automated acronym extractor and server, 2000, International Conference on Digital Libraries, Proceedings of the fifth ACM conference on Digital libraries, ACM Press, pp. 205-214.
Brill, E., “A Corpus-Based Approach to Language Learning”, Doctoral dissertation: University of Pennsylvania, 1993.
Larkey, L.S., Ogilvie, P., Price, M.A., & Tamilio, B., “Acrophile: An Automated Acronym Extractor and Server”, Proceedings of the ACM Digital Libraries Conference, pp. 205-214, 2000.
Taghva, K., & Gilbreth, J., “Recognizing acronyms and their definitions”, Proceedings of the Fourth International Conference on Document Analysis and Recognition, pp. 191-198, IEEE Computer Society. 1999.
Yeates, S., “Automatic Extraction of Acronyms from Text”, Proceedings of the Third New Zealand Computer Science Research Student's Conference, pp. 117-124, 1999.
Yeates, S., Bainbridge, D., & Witten, I.H., “Using compression to identify acronyms in text”, Proceedings of the IEEE Data Compression Conference, pp. 582, IEEE Computer Society, 2000.
Yi, J., & Sundaresan, N., “Mining the Web for Acronyms Using the Duality of Patterns and Relations”, Proceedings of the ACM CIKM'99 Second Workshop on Web Information and Data Management, pp. 48-52, 1999.
Cason, Lee, “AcroWizard”, Softlookup Downloads. www.softlookup.com/30 Day Trial Software, May 7, 2000.
Dunn, Carol, “Anvil Logic Finds Technical Solution to the Growing Acronym Problem”, www.anvillogic.com/fosel.asp., Alexandria, VA.