Selection and Configuration of Schema Matchers
|Proposal on||04. Dec 2012 14:00|
|Proposal room||Seminarraum I5|
Schema matching is the task of finding correspondences between two schemas. Many schema matchers using various matching strategies have been proposed in recent years. Selecting and configuring a schema matcher for a given matching task is still a challenging problem, as it requires knowledge about the input schemas and the matchers being used. The goal of this thesis is to develop mechanisms to automatically select and configure matchers for a matching problem.
Schema matchers use a wide variety of information and methods to identify correspondences between two given schemas. A simple string comparison between the labels of schema elements is sometimes sufficient to achieve a good result. Often, additional information has to be taken into account, e.g. the schema structure or background knowledge such as thesauri, ontologies, or existing mappings. Even if just a simple string matching technique is used, the string matcher still has to be configured with the right parameters (an important aspect is, for example, the tokenization of labels).
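To illustrate why tokenization matters for a string matcher, the following is a minimal sketch in Java. The class `LabelTokenizer` and its methods are hypothetical and not part of GeRoMeSuite. The tokenization rule (split at underscores, hyphens, whitespace, and camelCase boundaries) and the Jaccard measure are assumptions chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of GeRoMeSuite. */
final class LabelTokenizer {

    /** Splits a label at underscores, hyphens, whitespace, and camelCase boundaries. */
    static List<String> tokenize(String label) {
        List<String> tokens = new ArrayList<>();
        // Insert a space where a lower-case letter or digit is followed by an upper-case letter.
        String spaced = label.replaceAll("([a-z0-9])([A-Z])", "$1 $2");
        for (String part : spaced.split("[\\s_\\-]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Jaccard similarity of the two token sets: shared tokens divided by all distinct tokens. */
    static double tokenSimilarity(String a, String b) {
        Set<String> tokensA = new HashSet<>(tokenize(a));
        Set<String> tokensB = new HashSet<>(tokenize(b));
        if (tokensA.isEmpty() && tokensB.isEmpty()) {
            return 1.0; // two empty labels are treated as identical
        }
        Set<String> intersection = new HashSet<>(tokensA);
        intersection.retainAll(tokensB);
        Set<String> union = new HashSet<>(tokensA);
        union.addAll(tokensB);
        return (double) intersection.size() / union.size();
    }
}
```

With this tokenization, the labels `orderDate` and `ORDER_DATE` yield the same tokens, whereas a plain character-by-character comparison would treat them as different strings. A different tokenization rule (for example, one that does not split at case changes) would change the result, which is the kind of parameter choice the text refers to.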
Therefore, the selection and configuration of schema matchers is a challenging problem. Even people who are experienced in schema matching and know the schemas to be matched have to invest time to select the right methods and parameters for a given matching problem.
The goal of this thesis is to develop mechanisms to automatically select and configure matchers for a matching problem. These mechanisms should be based on the characteristics of the input schemas, such as labels, structure, and availability of (domain-specific) background knowledge. These characteristics should be represented as a feature vector; for a new matching problem, a configuration that is known to work well for problems with similar feature vectors is then chosen.
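One possible reading of this idea is a nearest-neighbour lookup over previously solved matching tasks. The sketch below is only an assumption about how such a selection could look; the class `ConfigSelector`, the type `SolvedTask`, the example features, and the use of Euclidean distance are all hypothetical and not prescribed by the proposal or by GeRoMeSuite.

```java
import java.util.List;

/** Hypothetical nearest-neighbour selection of a matcher configuration. */
final class ConfigSelector {

    /** A previously solved matching task: its feature vector and a configuration that worked well. */
    static final class SolvedTask {
        final double[] features;
        final String configName;

        SolvedTask(double[] features, String configName) {
            this.features = features;
            this.configName = configName;
        }
    }

    /** Euclidean distance between two feature vectors of equal length. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Feature vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns the configuration of the known task whose feature vector is closest to the input. */
    static String selectConfig(double[] features, List<SolvedTask> known) {
        if (known.isEmpty()) {
            throw new IllegalArgumentException("No solved tasks available");
        }
        SolvedTask best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (SolvedTask task : known) {
            double d = distance(features, task.features);
            if (d < bestDistance) {
                bestDistance = d;
                best = task;
            }
        }
        return best.configName;
    }
}
```

As a usage example, assume three made-up features per task (share of identical labels, normalized nesting depth, and whether a thesaurus is available). A new task with the vector `{0.8, 0.2, 0.0}` would then be assigned the configuration of a known task with `{0.9, 0.1, 0.0}` rather than that of one with `{0.3, 0.8, 1.0}`. Which features to use, how to scale them, and which similarity measure to apply are open design questions of the thesis.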
The mechanisms to select and configure schema matchers automatically should be implemented in the matching framework of our generic model management system GeRoMeSuite. GeRoMeSuite already offers various matchers, which can be combined in a very flexible and extensible matching framework. The evaluation should use data sets from existing matching tasks, such as those of the OAEI (Ontology Alignment Evaluation Initiative) or from other schema matching literature.
Knowledge in data modeling (relational data models and XML Schema) and programming skills in Java are required. Additional background in database systems, knowledge representation, or logic programming is helpful.