A Distributed Genetic Evolutionary Tuning for Data Clustering: Part 1

A Distributed Genetic Evolutionary Tuning for Data Clustering: Part 1

This was my original post that was published on the AgilOne blog on June 2013 about the developed framework for self-tuning of data clustering algorithms.

In order for any data analytics service provider to high margin sustainable business has to deal with scalability, multi-tenancy and self-adaptability. Machine learning is a very powerful instrument for Big Data applications but a bad choice of algorithm can lead to poor results of the intended analysis. One way to mitigate this is to automate the tuning process. Such as tuning process should not require a priori knowledge of the data and without human intervention. As a Big Data Engineer at AgilOne, I worked on solutions for the self-tuning open problem. The work led to the development of TunUp: A Distributed Cloud-based Genetic Evolutionary Tuning for Data Clustering. The result was a solution that automatically evaluates and tunes data clustering algorithms, so that clustering-based analytics services can self-adapt and scale in a cost-efficient manner. Evaluating clusters For the initial work we choose K-Means as our clustering algorithm. K-Means is a simple but popular algorithm, widely used in many data mining applications.

TunUp is open-source and available at his GitHub page: https://github.com/gm-spacagna/tunup

The original report is available at: http://www.academia.edu/5082681/TunUp_A_Distributed_Cloud-based_Genetic_Evolutionary_Tuning_for_Data_Clustering