Entity Recognition in Information Extraction
| Thesis type | |
|---|---|
| Status | Running |
| Proposal on | 05. Feb 2013 16:45 |
| Proposal room | Seminarraum I5 |
| Supervisor(s) | |
| Advisor(s) | |
Dataspaces are composed of heterogeneous data sources: structured, unstructured, and partially structured. This heterogeneity increases the complexity of user interaction with a dataspace, and users may find it hard to fulfill their information needs in such an environment. The quality of information coming from different sources plays an important role in the context of dataspaces. A prevalent problem is the inability to easily reconcile the information contained in the heterogeneous data sources that compose a dataspace.
The goal of this thesis is to address the entity recognition problem in the context of an information extraction system. The system extracts structured triples of the form (subject, predicate, object) from unstructured text documents. To make the triples more useful, the subjects and/or objects of such triples need to be linked to identified entities.
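To make the linking step concrete, the following is a minimal sketch in Java, not the method the thesis will develop. It assumes a hand-built alias dictionary and exact matching after simple normalization. The class `EntityLinkingSketch`, its methods, and the entity identifiers `E1` and `E2` are all hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: links triple arguments to known entities by dictionary lookup. */
class EntityLinkingSketch {

    /** An extracted (subject, predicate, object) triple; all fields are surface strings. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Maps normalized surface forms to entity identifiers (identifiers are made up). */
    private final Map<String, String> dictionary = new HashMap<>();

    /** Registers a surface form (alias) for an entity identifier. */
    void addAlias(String surfaceForm, String entityId) {
        dictionary.put(normalize(surfaceForm), entityId);
    }

    /** Returns the entity identifier for a mention, or empty if the mention is unknown. */
    Optional<String> link(String mention) {
        return Optional.ofNullable(dictionary.get(normalize(mention)));
    }

    /** Trims, lower-cases, and collapses whitespace so that trivial variants match. */
    static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        EntityLinkingSketch linker = new EntityLinkingSketch();
        linker.addAlias("Berlin", "E1");
        linker.addAlias("Germany", "E2");
        linker.addAlias("Federal Republic of Germany", "E2");

        Triple triple = new Triple("berlin", "is the capital of", "Federal  Republic of Germany");
        System.out.println(linker.link(triple.subject).orElse("unlinked"));
        System.out.println(linker.link(triple.object).orElse("unlinked"));
    }
}
```

Exact dictionary lookup cannot resolve ambiguous mentions or unseen spelling variants, which is part of what makes entity recognition over extracted triples a non-trivial problem.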
Prerequisites
Good programming skills in Java
Background in information retrieval, relational databases, statistics and probability theory
Knowledge of information extraction or natural language processing is a plus

