Category Archives: Python

6 points to compare Python and Scala for Data Science using Apache Spark

Apache Spark is a distributed computation framework that simplifies and speeds up the data crunching and analytics workflow for data scientists and engineers working with large datasets. It offers a unified interface for prototyping as well as for building production-quality applications, which makes it particularly suitable for an agile approach. I personally believe that Spark will inevitably become the de facto Big Data framework for Machine Learning and Data Science.

Despite the differing opinions about Spark, let’s assume that a data science team wants to adopt it as its main technology. The choice of programming language is often a dilemma. Should we build our models in Python or in Scala? Should we run the exploratory analysis using the iPython notebook or iScala?

Posted in Agile, Machine Learning, Python, Scala, Spark

Embedding Latin Pig into Python, the third millennium dinosaur!

What happens when a python eats a pig? Or better, when we embed a pig into a python? Well, we are not talking about real animals but about two very powerful technologies: Apache Pig and Python! In this post we …

Posted in Big Data, Pig, Python, Software Development