Skip to content. | Skip to navigation

Informatik 5
Information Systems
Prof. Dr. M. Jarke
Sections
Personal tools
You are here: Home Staff M.Eng. Rihan Hai Performance Additional performance results

Additional performance results

This page provides the full performance results of our approach.

As we have stated in the paper, compared to the correctness, rewritability and characterization of rewritable plain SO tgds, performance is less of our focus. 

Because discovering FO semantics from generated mappings as plain SO tgds is not a frequent operation. 

Thus in the paper we only included the parameter with biggest impact on performance, i.e., number of parts in a plain SO tgd. This parameter contributes the most significant impact in the performance with its increasing value. Here we provide the full performance evaluation results of our approach. 

Experimental Setting: In each of the following three section we report the performance in a different rewriting strategy: 1) rewriting the input plain SO tgds to a single solution; 2) discover all solutions; 3) rewrite the generated plain SO tgds to a single nested tgd. And in all three sections we examine the six quantitative parameters describing a plain SO tgd: number of parts, number of source relations per part, arity of source relations, number of target relations per part, number of Skolem functions per target relation and the arity of a target relations.

4) In this section, the input plain SO tgds are generated in  Relation  and Part  modes by the plain SO tgd generator.

 

Main Observations: Number of parts has the most significant influence on the increase of rewriting time.  In comparison, the rest parameters have relatively minor impact, the rewriting time increase almost sub-linearly with some parameters (e.g., number of source relation per part); and in most experiments (expect testing on number of parts), the rewriting finishes in less than 0.25 seconds. 

 

1. Below is the impact of the rest parameters on performance when we rewrite the generated plain SO tgds to a single solution.

If not specifically marked , the shared parameter values are: number of parts are 3, number of source relations is 4, arity of source relations is 5, number of target relations is 3, arity of target relation is 4, number of Skolem function is 2.

 singleSetPerformance

Note here the results of performance should be interpreted together with the rewriting rate. The three lines in each figure represent plain SO tgds generated with three types of  Skolem functions. Thus their rewriting time required is different. Second, their rewriting rate are also different (more detailed analytics in Completeness results). When all input plain SO tgds are rewritable, as in "All" modes, the algorithms finish relatively faster. 

2. Below is the performance impact of the rest parameters when we rewrite the generated plain SO tgds to multiple solutions (discover all solutions).

 multiSetPerformance

 

3. Below are the results when we rewrite the generated plain SO tgds to a single nested tgd. As each test costs much less time using the same parameters as in the previous tests, we use another set of parameter values.The shared parameter values are: number of parts are 50, number of source relations is 5, arity of target relations is 5, number of target relations is 2, arity of target relation is 4, number of Skolem function is 3 ((except in Fig. 8e arity of target relation  is set as 10).

 

 

Performance test with increasing values of individual parameters

1) Notably, in this mode we no longer observe the exponential complexity as in the previous modes. This is because the core algorithms (Nest and de-Skolemization) are polynomial algorithms with regard to the number of parts, i.e., O(m^2). But of course, as discussed in the completeness test, this mode would have a lower rewriting rate.

Since it is a trade-off between rewriting rate and performance, we recommend the mapping designers to run this mode first with a relatively complex input mapping.

2) In this mode, since we don't iterate over solutions, "All" mode usually takes slight more rewriting time than the other two mode. More specially, if a plain SO is not rewritable in "Part" or "Relation" mode, the algorithm terminates earlier. In contrast, in "All" modes all input plain SO tgds are rewritable, and it takes more steps to form the new nested tgd as output.

 

 

4. The parameter values applied in this experiment is as below (the same as completeness test):

numSrcRels=1

aritySrcRels=5

numTgtRels=1

arityTgtRels=4

numFuncs=2

skolemFuncArgumentMode= relation_random or part_random


In this experiment we are only interested in plain SO tgds that are not rewritable by our approach.  But we cannot know whether the generated plain SO tgds are rewritable in advance, since they do not have ground truth .  In this test we first generate and process the plain SO tgds, and only record the rewriting time if Alg.3 returns an empty set, i.e., the input plain SO tgd is not rewritable.

non-rewritable

Document Actions