Validation and Ranking of Schema Matching Results
|Presentation on||15. Oct 2009 15:15|
|Presentation room||Seminarraum I5|
Schema matching is a fundamental problem in many database application domains, such as schema and ontology integration, data integration and translation, and, more recently, the Semantic Web. Given two schemas over the same domain, the goal is to pair the schema elements that semantically correspond to each other. Unfortunately, the matching problem can be solved only semi-automatically: the process eventually requires human intervention, because the correspondences produced by a schema matching tool are heuristic approximations rather than guaranteed results. Schema matching tools perform well in some application categories but suffer from poor precision in others; no tool is perfect.
Yet the quality of the matching process can be improved by automatically validating the schema matching results. For example, if formalized information such as a domain ontology is available while matching two schemas, it can be used to detect inconsistencies that arise in the matching phase.
The goal of this thesis is to develop a methodology for validating a matching result and improving overall matching correctness. How the procedure is implemented depends on the application domain for which schema matching is performed; two cases are considered.
In the case of schema integration, the system checks the consistency of the two models together with the set of correspondences found, by running an inconsistency test over a graph representation of the schemas and correspondences. If an inconsistency is found, the correspondence causing it is removed from the matching result. In the case of data translation, the system ranks the candidate data translation mappings (executable queries) according to a number of quality features of a mapping. Examples of such features are the overlap of the join paths of a mapping's queries, the number of atomic values of the target schema covered by a query, and the number of join relations in a query.
The main idea behind the approach of this thesis is to build a graph from the source and target schemas and to represent the correspondences found by the match operation as edges between nodes of that graph. The graph can then be inspected for inconsistencies by searching it for structures that have previously been identified as faulty.
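A minimal sketch of this idea is given below. It assumes that each schema is a tree stored as a child-to-parent dictionary and that correspondences carry a confidence value. The faulty structure searched for here, a "hierarchy inversion" in which two source elements are matched to target elements nested in the opposite order, is a hypothetical example pattern; so is the rule of dropping the lower-confidence correspondence. Neither is taken from the thesis.

```python
from itertools import combinations


def ancestors(parent, node):
    """Return all ancestors of node in a tree given as a child -> parent dict."""
    result = set()
    while node in parent:
        node = parent[node]
        result.add(node)
    return result


def find_inversions(src_parent, tgt_parent, correspondences):
    """Return pairs of correspondences whose nesting order is reversed.

    A pair (s1, t1), (s2, t2) is reported when one source element is an
    ancestor of the other, but the matched target elements are nested the
    other way round (assumed faulty pattern).
    """
    conflicts = []
    for (s1, t1), (s2, t2) in combinations(correspondences, 2):
        if s1 in ancestors(src_parent, s2) and t2 in ancestors(tgt_parent, t1):
            conflicts.append(((s1, t1), (s2, t2)))
        elif s2 in ancestors(src_parent, s1) and t1 in ancestors(tgt_parent, t2):
            conflicts.append(((s1, t1), (s2, t2)))
    return conflicts


def validate(src_parent, tgt_parent, correspondences):
    """Remove correspondences until no inversion remains.

    correspondences maps (source, target) pairs to a confidence value; in
    each conflict the lower-confidence correspondence is dropped (assumption).
    """
    kept = dict(correspondences)
    while True:
        conflicts = find_inversions(src_parent, tgt_parent, kept)
        if not conflicts:
            return kept
        first, second = conflicts[0]
        weaker = min(first, second, key=lambda c: kept[c])
        del kept[weaker]


# Made-up schemas: Customer > Address > City and Client > Location > Town.
src_parent = {"Address": "Customer", "City": "Address"}
tgt_parent = {"Location": "Client", "Town": "Location"}
matches = {
    ("Customer", "Client"): 0.9,
    ("Address", "Town"): 0.4,
    ("City", "Location"): 0.8,
}
validated = validate(src_parent, tgt_parent, matches)
```

In this example, Address contains City in the source, while their matches Town and Location are nested the other way round in the target, so the weaker correspondence (Address, Town) is removed and the other two are kept.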