Design and Implementation of an Index Structure to Support Semantic Search
The goal of this thesis is the design and implementation of a composite index structure that supports efficient search over heterogeneous data collections. The index should be oriented towards search on semantic classes, which are extracted from a knowledge base.
Dataspaces are composed of heterogeneous data sources: structured, unstructured and partially structured. This heterogeneity increases the complexity of user interaction with a dataspace, and users may find it difficult to fulfil their information needs in such an environment. The quality of information coming from different sources also plays an important role in dataspaces and poses difficult challenges for the query processing infrastructure.
We aim to build an efficient index structure that supports indexing of very large collections of documents. Such a disk-resident index structure should support efficient block-oriented data organization.
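To illustrate what block-oriented organization can mean for a disk-resident index, the following is a minimal sketch, not the design the thesis is expected to produce. It assumes a 4096-byte block size, 32-bit document IDs, and a simple per-block count header; the names `write_postings` and `read_block` are hypothetical. A real index would also need a directory that maps each term or semantic class to its first block, which is omitted here.

```python
import io
import struct

BLOCK_SIZE = 4096                     # assumed block size in bytes
HEADER = struct.Struct("<I")          # per-block header: number of doc IDs stored
IDS_PER_BLOCK = (BLOCK_SIZE - HEADER.size) // 4   # 32-bit IDs that fit in a block


def write_postings(f, doc_ids):
    """Write sorted doc IDs as fixed-size blocks; return the number of blocks."""
    doc_ids = sorted(doc_ids)
    blocks = 0
    for start in range(0, len(doc_ids), IDS_PER_BLOCK):
        chunk = doc_ids[start:start + IDS_PER_BLOCK]
        payload = HEADER.pack(len(chunk)) + struct.pack(f"<{len(chunk)}I", *chunk)
        f.write(payload.ljust(BLOCK_SIZE, b"\x00"))   # pad to a full block
        blocks += 1
    return blocks


def read_block(f, block_no):
    """Fetch one block with a single seek and a single fixed-size read."""
    f.seek(block_no * BLOCK_SIZE)
    data = f.read(BLOCK_SIZE)
    (count,) = HEADER.unpack_from(data, 0)
    return list(struct.unpack_from(f"<{count}I", data, HEADER.size))
```

Because every block has the same size, the byte offset of block `n` is simply `n * BLOCK_SIZE`, so a lookup touches only the blocks it needs. For example, writing the 2500 even IDs below 5000 to an `io.BytesIO()` buffer (or a file opened in binary mode) fills three blocks, and `read_block(f, 2)` returns the last 454 IDs without reading the first two blocks.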
Good programming skills in Java, C, or C++
Willingness to read research papers
Strong background in relational databases
Knowledge of query optimization and compression techniques is a plus