Category Archives: Open Source

In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines on top of Spark and Alluxio

Abstract: Legacy enterprise architectures still rely on relational data warehouses and require moving and syncing data with the so-called “Data Lake”, where raw data is stored and periodically ingested into a distributed file system such as HDFS. Moreover, there are a … Continue reading

Posted in Agile, Big Data, Machine Learning, Open Source, Scala, Spark

A Distributed Genetic Evolutionary Tuning for Data Clustering: Part 1

This is my original post, published on the AgilOne blog in June 2013, about the framework developed for self-tuning of data clustering algorithms. In order for any data … Continue reading


Data Clustering? Don’t worry about the algorithm.

Introductory post on Data Clustering Tuning, published on the AgilOne blog in May 2013. We are constantly pushing to improve our underlying algorithms and make them as adaptive as possible. Taking a step back, … Continue reading
