Category Archives: Python
Apache Spark is a distributed computation framework that simplifies and speeds up the data crunching and analytics workflow for data scientists and engineers working with large datasets. It offers a unified interface for prototyping as well as for building production-quality applications, which makes it particularly suitable for an agile approach. I personally believe that Spark will inevitably become the de facto Big Data framework for Machine Learning and Data Science.
Despite the differing opinions about Spark, let’s assume that a data science team wants to adopt it as its main technology. The choice of programming language is often a dilemma. Should we build our models in Python or in Scala? Should we run the exploratory analysis in the IPython notebook or in IScala?
What happens when a python eats a pig? Or better, when we embed a pig into a python? Well, we are not talking about real animals but about two very powerful technologies: Apache Pig and Python! In this post we …