Best Effort Schemaless Reference Reconciliation
Information overload is a common symptom of the Internet age. Search engines help users find a "needle in a haystack". However, data-intensive applications increasingly demand not just an isolated piece of information but a collection of interlinked data elements. Furthermore, to be machine-understandable, the information needs to be structured.
Our project is a first step towards the goal of extracting useful structured information from unstructured data. We aim to consolidate a collection of triples extracted from the web, so that duplicates, whether explicit or implicit, are identified and merged. The thesis candidate is responsible for developing a scalable approach to reconciling natural-language triples using well-established algorithms from databases and information retrieval.
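To make the task concrete, the following is a minimal sketch of what reconciling natural-language triples could look like. It is not the project's actual method: the `TripleReconciler` and `Triple` names, the token-based Jaccard similarity, the greedy clustering, and the threshold value are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TripleReconciler {

    /** A natural-language triple (hypothetical representation, not from the project). */
    public static final class Triple {
        public final String subject;
        public final String predicate;
        public final String object;

        public Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return "(" + subject + ", " + predicate + ", " + object + ")";
        }
    }

    /** Lowercases the triple and splits it into a set of alphanumeric tokens. */
    public static Set<String> tokens(Triple t) {
        String text = (t.subject + " " + t.predicate + " " + t.object).toLowerCase(Locale.ROOT);
        Set<String> result = new HashSet<>();
        for (String tok : text.split("[^a-z0-9]+")) {
            if (!tok.isEmpty()) {
                result.add(tok);
            }
        }
        return result;
    }

    /** Jaccard similarity: size of the intersection divided by size of the union. */
    public static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    /**
     * Greedy clustering: each triple joins the first cluster whose first
     * member is at least `threshold` similar; otherwise it starts a new cluster.
     */
    public static List<List<Triple>> reconcile(List<Triple> triples, double threshold) {
        List<List<Triple>> clusters = new ArrayList<>();
        for (Triple t : triples) {
            List<Triple> home = null;
            for (List<Triple> cluster : clusters) {
                if (jaccard(tokens(cluster.get(0)), tokens(t)) >= threshold) {
                    home = cluster;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(t);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<Triple> input = Arrays.asList(
            new Triple("Barack Obama", "was born in", "Hawaii"),
            new Triple("barack obama", "was born in", "hawaii"),   // explicit duplicate
            new Triple("Obama", "was born in", "Hawaii, USA"),     // implicit duplicate
            new Triple("Paris", "is the capital of", "France"));
        // The threshold 0.7 is an arbitrary illustrative value.
        for (List<Triple> cluster : reconcile(input, 0.7)) {
            System.out.println(cluster);
        }
    }
}
```

This sketch compares every triple against every existing cluster, so it does not scale to web-sized collections. A scalable approach would presumably rely on indexing or blocking techniques from databases and information retrieval to limit the number of comparisons.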
For more information, see the following attachment:
ba-ref-reconciliation.pdf (PDF document, 742 KB)
Requirements for the thesis candidate:
# Experience in Java programming
# Good command of the English language
# Knowledge of algorithms and data structures