Open Source Dynamics: Community vs. Development
Mining for knowledge within open source projects! Find out which user clusters are important for the success of an open source project and how users migrate from the periphery to the community core. Compare the social development of the community with the dynamics of the OSS development process.
Within the German Cluster project CONTICI, this thesis focuses on analyzing the dynamics of different subgroups within open source communities and their influence on project development and success. In this thesis, the data generated by the PostgreSQL community is to be mined for answers to a set of hypotheses. PostgreSQL is a community of people bound by the development and maintenance of the open source object-relational database management system of the same name. As in many other open source projects, project management in PostgreSQL takes place in public: user communication happens on public mailing lists and forums, and the code is stored and administered in publicly available source code repositories. The data generated by the community is therefore a well-suited resource for community analysis.
The main idea of this thesis is to analyze the dynamics of social groups in OSS and to compare their evolution with the dynamics of the OSS development process (new releases, number of commits and bug fixes, etc.). To identify the different social groups within the PostgreSQL community, various clustering methods have to be applied to the community communication pool. Using dynamic social network analysis, the evolution of those groups over time has to be investigated. Among other things, the existence of generations (an almost complete replacement of community participants by new ones over the life of the project), previously identified in the biojava, bioperl and biopython OSS projects, has to be verified. The dynamics of the (sub-)community centrality measures (closeness, betweenness and degree centrality) over the years present another aspect for analysis. The relationship between the centralities of all nodes of the community graph can reveal much about the overall network structure. Furthermore, the movement of community members from the periphery to the core of the community and vice versa can help us understand the role of peripheral users in the evolution of the community.
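As a minimal sketch of two of the centrality measures named above, the following Python snippet computes degree and closeness centrality on a small undirected graph using only the standard library. The member names, the edge list and all function names are hypothetical and chosen for illustration; in practice the edges would have to be derived from the community communication data (for example, who replied to whom on a mailing list), and an established SNA library would probably be preferable.

```python
from collections import deque

# Hypothetical reply graph: an edge means two members exchanged messages.
# These names are made up and are not taken from any PostgreSQL data.
edges = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"), ("dave", "erin")]


def build_graph(edge_list):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable nodes divided by the sum of shortest-path distances to them.

    Distances are found by breadth-first search. Nodes that cannot be
    reached are ignored, which is a simplifying assumption of this sketch.
    """
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbour in graph[current]:
                if neighbour not in dist:
                    dist[neighbour] = dist[current] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        reachable = len(dist) - 1
        result[source] = reachable / total if total > 0 else 0.0
    return result


graph = build_graph(edges)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)

# List members from most to least central by closeness.
for member in sorted(closeness, key=closeness.get, reverse=True):
    print(member, round(degree[member], 2), round(closeness[member], 2))
```

Computing these values once per time slice (for example, per year) and tracking how a member's rank changes is one possible way to observe movement between periphery and core; the choice of time slice is an assumption, not something the topic description prescribes.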
The results of the communication data analysis should be compared with the evolution of the developed software. This evolution can be measured by the number of commits, new releases and reported bugs over the life of the project. The relation between the development process and the social state of the community can be used to predict how OSS communities react to changes in the development process. Moreover, the influence of different groups on project success can be identified.
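One simple way to quantify such a relation, offered here only as an illustrative assumption and not as the method the thesis prescribes, is the Pearson correlation between a development time series and a community time series. The sketch below uses made-up monthly numbers; the variable names and values are hypothetical and do not come from PostgreSQL.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative data: one value per month for both series.
commits_per_month = [10, 20, 30, 40]
active_posters_per_month = [5, 9, 16, 20]

r = pearson(commits_per_month, active_posters_per_month)
print("correlation between commits and active posters:", round(r, 3))
```

A high coefficient only indicates that the two series move together; whether one can be used to predict the other would need further analysis, for instance with time-lagged series.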
PostgreSQL consists of many different sub-projects with very similar organizational principles and related practices. This allows us to evaluate the hypotheses on different data sets. The results have to be presented to the community members in an easy and intuitive way.
The thesis candidate should have knowledge of databases, Java and XML technologies, social network analysis (SNA) and clustering.