Dataspace Framework
| Manager(s) | |
|---|---|
| Research field | Model Management |
| Status | running |
Enabling integrated access to structured and unstructured data
Dataspaces are composed of heterogeneous data sources: structured, unstructured, and partially structured. This heterogeneity makes interacting with a dataspace more complex, and users may find it hard to fulfil their information needs in such an environment. The quality of information coming from different sources plays an important role in dataspaces and poses difficult challenges for the query processing infrastructure.
In this project we aim to build a dataspace framework that offers query and search services over heterogeneous sources without upfront integration effort, while allowing for "pay-as-you-go" integration over time. We are investigating different approaches to improving user interaction, data exploration, and query answering over heterogeneous data sources.
At Informatik 5, we focus mainly on the following challenges and their application in the context of dataspaces:
- Answering structured and keyword queries over structured and unstructured data
- Information Extraction (IE) over natural language text
- Use of external knowledge bases to improve query services and enrich the dataspace's body of knowledge
- Entity reconciliation
- Probabilistic query processing over heterogeneous sources (with varying data quality, or missing data)
Research staff
Former staff
Theses
- Entity Recognition in Information Extraction (Running)
- Conjunctive Triple Queries Over Text Documents (2012)
- A Framework for Objective Interestingness Measure Selection in Association Rule Mining (2012)
- Fact extraction over the Wikipedia collection (2012)
- Queries Crossing the Structure Chasm (2011)
- Query transformation for a Dataspace system
- Metadata-Based Fact Extraction from Wikipedia
- Design and Implementation of an Index Structure to support Semantic Search
Publications
- Enabling Structured Queries over Unstructured Documents
  Fisnik Kastrati, Xiang Li, Christoph Quix, Mohammadreza Khelghati
  Published in International Workshop on Semantic based Opportunistic Data Management (SODM 2011), in conjunction with the 12th IEEE International Conference on Mobile Data Management (MDM 2011), Luleå, Sweden, 2011

