Tag Archives: Functional Programming
This post has been published on the Cloudera blog and summurises the results and takeaways of a week-long hackathon happened in Lanzarote in December 2015. The goal was to prototype a recommender systems for retail customers of shops in Bristol in Bristol, UK. The article shows how the stack composed by Scala and Spark was great for quickly writing some prototyping code to run locally in a single laptop and at the same time scalable for larger dataset to process in the cluster. Continue reading
Proof of concept of how to use Scala, Spark and the recent library Sparkz for building production quality machine learning pipelines for predicting buyers of financial products.
The pipelines are implemented through custom declarative APIs that gives us greater control, transparency and testability of the whole process.
Code should be developed in a proper IDE and make use of advanced tools for re-factoring, auto-completion, syntax highlighting and auto-formatters; at least.
Notebooks should use routine libraries from the main codebase. As soon as some code is developed in a notebook and is reusable, it should be moved into a codebase. Continue reading
ETL is probably the most time consuming part of every Data Science project. The quality of extracted and crunched data is one of the major factor affecting the final results. In facts, real world data is always messy and inconsistent. Data Validation … Continue reading
What is Spark? Six reasons why CIOs should find out (and one why they shouldn’t) – 02 Nov 2015 – Computing Analysis
via What is Spark? Six reasons why CIOs should find out (and one why they shouldn’t) – 02 Nov 2015 – Computing Analysis.