Using Background Knowledge in Schema Matching and Ontology Alignment
| Thesis type | |
|---|---|
| Student | Pratanu Roy |
| Status | Finished |
| Submitted in | 2010 |
| Proposal on | 27. Apr 2010 17:15 |
| Proposal room | Seminarraum I5 |
| Presentation on | 08. Dec 2010 17:00 |
| Presentation room | Seminarraum I5 |
| Supervisor(s) | |
| Advisor(s) | |
Schema matchers use various kinds of information to identify correspondences between two schemas. This information can include labels and structural elements present in the schema, as well as instance data. However, humans who perform schema matching (or ontology alignment) manually often use much more information: in addition to domain-specific knowledge and experience, they may also have access to external documentation, thesauri, ontologies, or previous match results. The goal of this thesis is to develop schema matching strategies that access external background knowledge and apply it in a schema matching or ontology alignment task to improve the quality of the result.
The use of background knowledge in schema matching is important because many matching tasks can only be solved with additional background knowledge. For example, in the medical domain, thesauri and taxonomies are available that define the concepts and terms of the domain. If two ontologies from this domain have to be aligned, such information is required to identify similar elements, because simpler techniques such as string matching and structural comparison are not sufficient. Furthermore, a lot of information is available that could be used to improve matching results; however, there is currently no tool that can make use of this information in a structured and organized way.
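To illustrate why a thesaurus can succeed where string comparison fails, the following is a minimal sketch of a label matcher backed by synonym groups. The class `ThesaurusMatcher`, its methods, the concept identifier `C1`, and the synonym entries are hypothetical and chosen only for illustration; they are not part of GeRoMeSuite or of any real medical thesaurus.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a thesaurus-backed label matcher (not GeRoMeSuite API). */
final class ThesaurusMatcher {
    // Maps a normalized term to the identifier of the concept it denotes.
    private final Map<String, String> termToConcept = new HashMap<>();

    /** Registers a group of synonymous terms under one concept identifier. */
    void addSynonyms(String conceptId, String... terms) {
        for (String term : terms) {
            termToConcept.put(normalize(term), conceptId);
        }
    }

    /** Lower-cases a label and turns separators into single spaces. */
    static String normalize(String label) {
        return label.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim();
    }

    /**
     * Returns 1.0 if both labels are equal after normalization or map to the
     * same thesaurus concept, and 0.0 otherwise.
     */
    double similarity(String labelA, String labelB) {
        String a = normalize(labelA);
        String b = normalize(labelB);
        if (a.equals(b)) {
            return 1.0;
        }
        String conceptA = termToConcept.get(a);
        return conceptA != null && conceptA.equals(termToConcept.get(b)) ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        ThesaurusMatcher matcher = new ThesaurusMatcher();
        // Illustrative entry only; a real system would load a domain thesaurus.
        matcher.addSynonyms("C1", "myocardial infarction", "heart attack");
        System.out.println(matcher.similarity("Heart_Attack", "Myocardial Infarction"));
        System.out.println(matcher.similarity("Heart_Attack", "Stroke"));
    }
}
```

The two labels in the first call share no useful substring, so a purely string-based matcher would likely miss the correspondence, while the synonym lookup finds it. A binary score is a simplification; a real matcher would probably return graded similarities.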
Therefore, the goal of this thesis is to develop a matcher that uses background information to identify correspondences between two models. This background information can include:
- (Domain-specific) ontologies, taxonomies, thesauri,
- Existing match results (i.e., correspondences or alignments),
- External documentation about the schemas or ontologies to be matched.
A first objective of this thesis is to identify possible sources of background knowledge and to make them available in a structured way. In a second step, a matcher should exploit this structured information to identify correspondences between the models.
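One way existing match results could be exploited is by composing two known alignments through a shared intermediate model: if A has been matched to C and C to B, candidate correspondences between A and B follow. The sketch below assumes a simple `Correspondence` record and multiplies confidences along the path; both the types and the scoring rule are assumptions made for illustration, not a method prescribed by this thesis topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: reuse of existing alignments by composition (requires Java 16+). */
final class AlignmentComposer {
    /** A correspondence between two elements with a confidence in [0, 1]. */
    record Correspondence(String source, String target, double confidence) {}

    /**
     * Composes an alignment A->C with an alignment C->B into candidates A->B.
     * The confidence of a candidate is the product of the two confidences
     * (an assumption); if a pair is derived several times, the best one is kept.
     */
    static List<Correspondence> compose(List<Correspondence> aToC, List<Correspondence> cToB) {
        // Index the second alignment by its source element (the pivot in C).
        Map<String, List<Correspondence>> byPivot = new HashMap<>();
        for (Correspondence c : cToB) {
            byPivot.computeIfAbsent(c.source(), k -> new ArrayList<>()).add(c);
        }
        Map<List<String>, Correspondence> best = new LinkedHashMap<>();
        for (Correspondence first : aToC) {
            for (Correspondence second : byPivot.getOrDefault(first.target(), List.of())) {
                Correspondence candidate = new Correspondence(
                        first.source(), second.target(),
                        first.confidence() * second.confidence());
                best.merge(List.of(candidate.source(), candidate.target()), candidate,
                        (old, neu) -> old.confidence() >= neu.confidence() ? old : neu);
            }
        }
        return new ArrayList<>(best.values());
    }

    public static void main(String[] args) {
        // Element names are made up for the example.
        List<Correspondence> aToC = List.of(new Correspondence("A.patientName", "C.name", 0.9));
        List<Correspondence> cToB = List.of(new Correspondence("C.name", "B.fullName", 0.8));
        System.out.println(compose(aToC, cToB));
    }
}
```

Composed correspondences are only candidates: errors in either input alignment propagate, so a matching strategy would probably combine them with evidence from other matchers.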
The matcher should be integrated into the schema matching framework of GeRoMeSuite. GeRoMeSuite is a generic model management system that already includes several schema matchers and provides the functionality to combine matchers into more complex matching strategies in a flexible manner. The matcher developed in this thesis should be integrated into such a strategy. The evaluation should show that the matching strategy using background knowledge performs significantly better than strategies that use no background knowledge.
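The text does not state which quality measures the evaluation should use. A common choice in matching evaluations is to compare a computed alignment against a manually created reference alignment using precision, recall, and F-measure, which the sketch below assumes. The class `AlignmentEvaluation` and the string encoding of correspondences are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: scoring an alignment against a reference (requires Java 16+). */
final class AlignmentEvaluation {
    /** Precision, recall, and F-measure of a computed alignment. */
    record Scores(double precision, double recall, double fMeasure) {}

    /**
     * Compares a computed alignment with a reference alignment. Each
     * correspondence is encoded as a "source=target" string for brevity.
     */
    static Scores evaluate(Set<String> computed, Set<String> reference) {
        // Correct correspondences are those found in both sets.
        Set<String> correct = new HashSet<>(computed);
        correct.retainAll(reference);

        double precision = computed.isEmpty() ? 0.0 : (double) correct.size() / computed.size();
        double recall = reference.isEmpty() ? 0.0 : (double) correct.size() / reference.size();
        double fMeasure = (precision + recall == 0.0)
                ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new Scores(precision, recall, fMeasure);
    }

    public static void main(String[] args) {
        // Made-up alignments: two of three computed correspondences are correct.
        Set<String> computed = Set.of("a=x", "b=y", "c=z");
        Set<String> reference = Set.of("a=x", "b=y", "d=w", "e=v");
        System.out.println(evaluate(computed, reference));
    }
}
```

Running the same evaluation for a strategy with and without the background-knowledge matcher, over the same matching tasks, would give the comparison the evaluation asks for.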
Prerequisites
Knowledge of data modeling (relational data models and XML Schema) and programming skills in Java are required. Additional background in database systems, knowledge representation, or logic programming is helpful.

