Best paper award for the paper on Scoring-based DOM Content Selection with Discrete Periodicity Analysis at the International Conference on Enterprise Information Systems (ICEIS)

September 20th, 2022

Thomas Osterland and Thomas Rose published a paper at the International Conference on Enterprise Information Systems (ICEIS). Project Sinlog studied the use of Distributed Ledger Technology (DLT) for the optimization of inland waterway transport as an ecological competitor to trucks and trains. Inland waterway transportation has to be considered a highly competitive means of transport for containers as well as bulk loads such as grain or coil along the transportation highways in Germany. To give an example, transporting a container from Bonn to Rotterdam by inland vessel requires approximately five liters of gasoline, whereas a truck consumes 35 liters on just 100 km. Hence, inland waterway transport is an ecological gain. However, the question arises of how to capitalize on this sustainability asset. Digitalization certainly emerges as a challenge for competitiveness: how can transport chains in inland logistics be digitalized, and how can digital information and processes be authenticated? Project Sinlog developed a simulation workbench for assessing different alternatives for the utilization of DLT. One important challenge for these simulations is the capture of event data as baseline information.

The comprehensive analysis of large data volumes is shaping the future. It enables decision-making based on empirical evidence instead of expert experience, and using such data to train machine learning models enables new use cases in image recognition, speech analysis, regression, and classification. One problem with data is that it is often not readily available in aggregated form. Instead, it is necessary to search the web for information and laboriously mine websites for specific data. Web scraping is one approach for deriving such baseline information. In this paper we present an interactive, scoring-based approach for scraping specific information from websites. We propose a scoring function that allows threshold values to be adapted in order to select specific sets of data.

Hence, our approach allows one to generate data sets from the web that can be used for various purposes. Although public sector information, such as governmental statistical sources, might be an attractive source of data, its limitations in scope are well known. The web might become a representative source.
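
To make the idea of threshold-based selection more concrete, the following is a minimal sketch in Python. It is not the scoring function from the paper and it omits the periodicity analysis entirely. The weights, the `score` and `select` helpers, the `ElementCollector` class, and the sample HTML are all hypothetical and chosen only for illustration. The sketch assumes well-formed markup without void elements such as `<br>`.

```python
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Collect tag name, class list, and direct text of every closed element.

    Hypothetical helper for this sketch; assumes well-formed HTML.
    """

    def __init__(self):
        super().__init__()
        self.stack = []     # elements that are currently open
        self.elements = []  # finished elements, in closing order

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append({"tag": tag, "classes": classes, "text": ""})

    def handle_data(self, data):
        # Attach text only to the innermost open element.
        if self.stack:
            self.stack[-1]["text"] += data.strip()

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1]["tag"] == tag:
            self.elements.append(self.stack.pop())


def score(element, wanted_tag, wanted_class):
    """Illustrative score in [0, 1]; the weights are assumptions."""
    value = 0.0
    if element["tag"] == wanted_tag:
        value += 0.5
    if wanted_class in element["classes"]:
        value += 0.5
    return value


def select(html, wanted_tag, wanted_class, threshold):
    """Return the text of all non-empty elements scoring at least `threshold`."""
    collector = ElementCollector()
    collector.feed(html)
    return [
        e["text"]
        for e in collector.elements
        if e["text"] and score(e, wanted_tag, wanted_class) >= threshold
    ]


# Made-up sample page, not real data.
SAMPLE = """
<table>
<tr><td class="port">Bonn</td><td class="eta">08:15</td></tr>
<tr><td class="port">Rotterdam</td><td class="eta">19:40</td></tr>
</table>
<p class="port">Footer note</p>
"""

strict = select(SAMPLE, "td", "port", threshold=1.0)
loose = select(SAMPLE, "td", "port", threshold=0.5)
print(strict)
print(loose)
```

With the strict threshold only the table cells carrying the wanted class are selected. Lowering the threshold widens the selection to every element that matches either the tag or the class, which is the kind of interactive adjustment the announcement describes.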