


Prof. Dr. S. Decker
RWTH Aachen
Informatik 5
Ahornstr. 55
D-52056 Aachen
Tel +49/241/8021501
Fax +49/241/8022321






Entity Recognition in Information Extraction

Year 2014

Detecting and resolving entities is an important step in information retrieval from unstructured documents. Humans can recognize entities from context, but information extraction systems need sophisticated algorithms to do the same. This paper describes the development and implementation of an entity recognition algorithm. The implemented system is integrated with an information extraction system that derives triples from unstructured text; this makes the triples more valuable for query answering because they refer to identified entities. The system is trained to learn patterns for the occurrence of an entity. A dictionary of entities and their contexts is built by extracting information from the Wikipedia encyclopedia. To recognize an entity, the system computes a score that combines context similarity, measured as cosine similarity under a tf-idf weighting scheme, with string similarity. The implemented system achieves good accuracy on Wikipedia articles. It is domain-independent and can be applied to recognize entities of arbitrary types.
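The scoring step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy entity dictionary, the smoothed idf formula, the use of `difflib.SequenceMatcher` as the string similarity measure, and the linear weight `alpha` are all assumptions made for the example. A mention is resolved to the dictionary entity with the highest combined score.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical toy dictionary: entity name -> context text. In the paper the
# dictionary is built from Wikipedia; these strings are invented for illustration.
ENTITY_CONTEXTS = {
    "Paris": "capital city of france on the seine river",
    "Paris Hilton": "american media personality heiress hotel family",
    "Aachen": "city in germany near the belgian and dutch borders",
}


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def idf_table(docs):
    """Smoothed inverse document frequency over the dictionary contexts (assumed variant)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term frequency times idf; terms unseen in the dictionary get weight 0."""
    tf = Counter(tokens)
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def string_similarity(a, b):
    """Character-level ratio in [0, 1] from difflib (one possible string measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(mention, context, alpha=0.7):
    """Return (best_entity, score) using a hypothetical weighted combination:
    alpha * context_similarity + (1 - alpha) * string_similarity."""
    docs = {name: tokenize(ctx) for name, ctx in ENTITY_CONTEXTS.items()}
    idf = idf_table(list(docs.values()))
    query = tfidf_vector(tokenize(context), idf)
    scored = []
    for name, doc in docs.items():
        ctx_sim = cosine(query, tfidf_vector(doc, idf))
        score = alpha * ctx_sim + (1 - alpha) * string_similarity(mention, name)
        scored.append((score, name))
    best_score, best_name = max(scored)
    return best_name, best_score
```

For example, `resolve("Paris", "the river seine flows through the capital of france")` should select `Paris`, while a context about a hotel heiress and media personality should favor `Paris Hilton`, because the context term overlap outweighs the weaker string match. A linear combination is only one way to merge the two signals; the abstract does not state how they are weighted.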


Proc. 6th Asian Conference on Intelligent Information and Database Systems (ACIIDS), Bangkok, Thailand, Springer, LNCS, Vol. 8397, pp. 113-122, 2014.


