Tag Archives: Functional Programming

Surfing and Coding in Lanzarote, the Barclays Data Science hackathon

This post was published on the Cloudera blog and summarises the results and takeaways of a week-long hackathon held in Lanzarote in December 2015. The goal was to prototype a recommender system for retail customers of shops in Bristol, UK. The article shows how the Scala and Spark stack made it quick to write prototype code that runs locally on a single laptop while still scaling to larger datasets processed on a cluster.

Posted in Agile, Machine Learning, Scala, Spark

Robust and declarative machine learning pipelines for predictive buying

A proof of concept showing how to use Scala, Spark and the recent Sparkz library to build production-quality machine learning pipelines that predict buyers of financial products.

The pipelines are implemented through custom declarative APIs that give us greater control, transparency and testability over the whole process.

The example follows the validation and evaluation principles defined in The Data Science Manifesto, available in beta at http://www.datasciencemanifesto.org

Posted in Big Data, Classification, Machine Learning, Scala, Spark

Coding practices for data products development

Code should be developed in a proper IDE, making use of advanced tools for, at a minimum, refactoring, auto-completion, syntax highlighting and auto-formatting.

Notebooks should call library routines from the main codebase. As soon as code developed in a notebook proves reusable, it should be moved into the main codebase.

Posted in Agile, Machine Learning, Software Development

Functional Data Validation using monads and applicative functors

ETL is probably the most time-consuming part of every Data Science project. The quality of the extracted and crunched data is one of the major factors affecting the final results. In fact, real-world data is always messy and inconsistent. Data Validation …

Posted in Big Data, Data Munging, Scala, Spark

What is Spark? Six reasons why CIOs should find out (and one why they shouldn’t) – 02 Nov 2015 – Computing Analysis


Posted in Big Data, Scala