Informatik 5
Information Systems
Prof. Dr. M. Jarke

Entity Recognition in Information Extraction

Year 2014

Detecting and resolving entities is an important step in information retrieval from unstructured documents. Humans recognize entities from context, but information extraction systems need sophisticated algorithms to do the same. This paper describes the development and implementation of an entity recognition algorithm. The implemented system is integrated with an information extraction system that derives triples from unstructured text. As a result, the triples are more valuable for query answering because they refer to identified entities. The system is trained to learn patterns for the occurrence of an entity. A dictionary of entities and their contexts is built by extracting information from Wikipedia. The entity recognition step computes a score based on context similarity, measured as cosine similarity with a tf-idf weighting scheme, and on string similarity. The implemented system shows good accuracy on Wikipedia articles. It is not domain dependent and can be applied to recognize entities of arbitrary types.
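
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how context similarity and string similarity are combined, so the linear weight `alpha` is an assumption, `difflib.SequenceMatcher` is only a stand-in for whatever string similarity measure the authors use, and all function names and the toy dictionary are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalize more carefully.
    return text.lower().split()


def idf_table(contexts):
    # Smoothed inverse document frequency over the dictionary contexts.
    n = len(contexts)
    df = Counter()
    for ctx in contexts:
        df.update(set(tokenize(ctx)))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf_vector(text, idf):
    # Sparse tf-idf vector; terms absent from the dictionary are dropped.
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def string_similarity(a, b):
    # Assumption: character-level ratio as a placeholder string measure.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_candidates(mention, mention_context, dictionary, alpha=0.5):
    # dictionary maps an entity name to its context text.
    # alpha (assumed) weights context similarity against string similarity.
    idf = idf_table(list(dictionary.values()))
    query = tfidf_vector(mention_context, idf)
    scored = []
    for name, context in dictionary.items():
        ctx_score = cosine(query, tfidf_vector(context, idf))
        str_score = string_similarity(mention, name)
        scored.append((alpha * ctx_score + (1 - alpha) * str_score, name))
    return sorted(scored, reverse=True)


# Toy dictionary standing in for entity contexts harvested from Wikipedia.
DICTIONARY = {
    "Paris (city)": "capital city france seine river eiffel tower",
    "Paris (mythology)": "trojan prince troy helen greek mythology",
}

ranking = rank_candidates(
    "Paris",
    "the eiffel tower stands near the seine river in france",
    DICTIONARY,
)
```

Here `ranking` is a list of `(score, entity_name)` pairs sorted from best to worst. Because the mention's context shares terms only with the city entry, that entry ranks first; with an ambiguous surface form like this, the context term does most of the disambiguation and the string term mainly filters out unrelated names.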

Details

Proc. 6th Asian Conference on Intelligent Information and Database Systems (ACIIDS), Bangkok, Thailand, Springer, LNCS, Vol. 8397, pp. 113-122, 2014.
