Federated Learning and Differential Privacy

This piece is part of a series on 2019 trends in the AI and Machine Learning industry. You can read my full thoughts on the past year in this summary I wrote for the Helixa blog, which also includes links to the other in-depth pieces in this series.


“Federated Learning” is a new term for many of us, and it may mark the dawn of a new AI epoch.

Federated Learning refers to machine learning (ML) techniques that can train algorithms across multiple decentralized machines holding different local data samples — all without exchanging data. This new approach is different from traditional centralized or distributed training because there is no assumption that the local data samples are identically distributed. 

Federated Learning aims to mitigate the problem of Data Gravity, defined by Dave McCrory as “the ability of bodies of data to attract applications, services, and other data.”

In order to scale with the radical increase in devices, we need to move computation closer to where data is generated.

Federated Learning is about centralizing models on decentralized data.

The major use case is edge computing, where issues of data privacy, security, and network traffic make it expensive and difficult to quickly collect and process data in the cloud. This is especially relevant to IoT devices and smartphones.

The TensorFlow team recently released TensorFlow Federated (TFF), an extension that supports Federated Learning natively in TensorFlow.

Alex Ingerman, Product Manager at Google, presented an interesting use case of TFF: Federated Learning for Mobile Keyboard Prediction. Gboard (Google’s mobile keyboard) was trained this way, with typing data from millions of smartphones, in order to improve keyboard prediction without infringing on the privacy consumers expect for their personal conversations.

For an easy-to-understand visualization of how Federated Learning works in practice, you can read this online comic illustration by Google AI.

The main privacy principles to respect are:

  1. Only access aggregated anonymized reports from devices (e.g. model updates), no raw data
  2. Use focused collection, which only reports the minimum needed
  3. Never persist per-device reports
  4. Utilize federated model averaging to generate a global model
  5. Don’t memorize individual reports during training (i.e., avoid overfitting on a single data sample)
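To make principles 1 and 4 concrete, here is a minimal sketch of federated averaging on a toy linear model. This is FedAvg in spirit only, not the TFF API: all function names, hyperparameters, and data below are illustrative assumptions.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: plain gradient descent on a linear
    model (squared loss). Only the updated weights leave the device,
    never the raw (X, y) data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_averaging(global_w, client_data):
    """Principle 4: aggregate per-client updates into a global model,
    weighting each client by its number of local samples."""
    updates, sizes = [], []
    for X, y in client_data:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

# Toy run: three clients whose local data is NOT identically distributed.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for shift in (-1.0, 0.0, 1.0):  # different local feature distributions
    X = rng.normal(shift, 1.0, size=(50, 2))
    y = X @ true_w + rng.normal(0.0, 0.01, size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(30):  # communication rounds
    w = federated_averaging(w, clients)
```

After a few dozen rounds the aggregated model recovers the underlying weights even though no client ever shared its raw samples.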

This is where Differential Privacy (DP) comes into play. DP refers to systems for publicly sharing information about a dataset: describing patterns of groups within the dataset while withholding information about individuals. An algorithm is differentially private if it is robust to Membership Inference Attacks, meaning an external observer essentially cannot tell from the output of the model whether a particular individual’s information was used in the computation.
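For reference, the standard formal statement behind this guarantee: a randomized mechanism $M$ is $\varepsilon$-differentially private if, for any two datasets $D$ and $D'$ differing in a single individual's record, and any set $S$ of possible outputs,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S]
```

The smaller $\varepsilon$ is, the closer the two output distributions are, and the less an observer can infer about any single individual's participation.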

In the specific case of training machine learning models on different data sets, applying DP mostly consists of two interventions:

  • Adding noise to the model parameters
  • Clipping the magnitude of model parameter updates (helpful for privacy principle #5)
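These two injections can be sketched in a few lines. The snippet below is a DP-SGD-style illustration, not a calibrated implementation: the function and parameter names (`clip_norm`, `noise_multiplier`) are assumptions, and a real deployment would derive the noise scale from a privacy budget.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Sketch of the two DP injections:
    1. clip the update's L2 norm to bound any one client's influence,
    2. add Gaussian noise calibrated to that bound."""
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

raw = np.array([3.0, 4.0])  # raw client update, L2 norm 5
private = privatize_update(raw, clip_norm=1.0,
                           rng=np.random.default_rng(0))
```

Clipping caps how much a single client's data can move the model (limiting memorization of individual reports), and the added noise masks whatever influence remains.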

In addition to the primary use case of edge devices, Federated Learning and Differential Privacy can boost model performance in cases where different clients want to collaborate without sharing raw data with each other. An example is a framework developed by Georgian Partners and Bluecore Technologies leveraging the findings explained in Bolt-on differential privacy for scalable stochastic gradient descent-based analytics.

When you consider recent privacy regulations like GDPR and increased public awareness of data privacy, Federated Learning has the potential to address one of the largest problems in the industry: helping society cooperate securely for the common good.

Read this use case of Federated Learning applied to Cancer Research.


For curious, AI-focused professionals who want to innovate responsibly, Helixa derives complex human insights through ethical and intentional machine learning. We do this more effectively than anyone else by aggregating multiple data sources, prioritizing consumer privacy, and returning results in seconds. Visit www.helixa.ai to learn more.

Ethics and Responsible AI



2019 was the year of ethical considerations in the AI industry, and we need to keep prioritizing this conversation as we move forward and scale the technology further.

Many different companies have used keynotes and other outlets to promote the importance of responsible AI and aligning with established ethical practices.

The idea is to build AI systems we can trust. That means ensuring fairness, robustness, explainability, and transparency, to name a few.

Ariadna Llitjós, Director of Engineering at Twitter Cortex ML Platform, highlighted a few awareness and guideline initiatives: specifically, the Trusting AI toolkit by IBM and the Institute for Ethical AI & Machine Learning. The latter is formed by volunteers who have collectively defined the 8 principles of Responsible ML development listed in the following snapshot:

Considering the rapid pace of advancement in deep learning and its effects on society, most of the attention is focused on making model predictions explainable.

The institute also provides an open-source eXplainability framework containing a collection of useful libraries and tools for bias evaluation and model explainability.

In addition to the tools available to Data Scientists and Engineers, it is important to make the right ethical decisions during the design-thinking phase. Specifically, we can incorporate ethical decisions during the user research process. Tools like Empathy Map, Stakeholder Map and Inclusive Panda can be useful for identifying critical points and the ways people would be affected by the products we design.

We also need to consider the principles behind defining the reward function of a Machine Learning algorithm, especially in Reinforcement Learning (RL). Emily Webber, ML Solutions Architect at Amazon, gave an interesting example of using RL to define public policies. She showed that defining policy always requires a trade-off between who will benefit from the policy and who will be negatively affected.

The major philosophical foundations can be divided into 4 principles:

  • Utilitarianism: do whatever increases overall utility
  • Egalitarianism: do what increases overall equity
  • Kantian Rights: uphold human rights
  • Libertarianism: preserve freedom

In addition to those, we could also consider the Pareto Improvement principle, which strives to make at least one person better off without making anyone else worse off. All of these are perfectly valid principles, but it can be difficult to choose one that applies in every practical situation.
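The Pareto Improvement principle is simple enough to state as code. Here is a hypothetical helper over lists of individual utility scores, purely for illustration:

```python
def is_pareto_improvement(before, after):
    """True if the change helps at least one person and hurts no one.
    `before` and `after` are same-length sequences of individual utilities."""
    no_one_worse = all(b <= a for b, a in zip(before, after))
    someone_better = any(a > b for b, a in zip(before, after))
    return no_one_worse and someone_better
```

For example, raising one person's utility while leaving everyone else unchanged qualifies, whereas any policy that trades one person's welfare for another's does not.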

The challenge for AI technologists and strategists will be to build systems that are able to take different points of view into account, weigh the pros and cons based on specific criteria (e.g. population size, or cost-benefit analysis), transparently communicate and collaborate with all of the involved actors, and react in a timely manner to accommodate the changing needs of society (governments, companies, and people).

Design Ethically provides a great set of resources on how to incorporate ethical decision-making in tech.


Latent Panelists Affinities: a Social Science case study

As part of the IBM PartyCloud held in Milan on 20th September 2018, I gave a talk, “A Journey into Data Science & AI”, presenting a case study about estimating panelists’ latent affinities. I showed the components needed to develop an intelligent social agent able to classify entities and estimate latent affinities. The session also covered good practices and common challenges faced by R&D organizations dealing with Machine Learning products.

If you would like to discuss how AI technologies can be applied to social science, get in touch!

Deep Time-to-Failure: Predictive maintenance using RNNs and Weibull distributions

I published on GitHub a tutorial on how to implement an algorithm for predictive maintenance using survival analysis theory and gated Recurrent Neural Networks in Keras.

The tutorial is divided into:

  1. Fitting survival distributions and regression survival models using lifelines.
  2. Predicting the distribution of future time-to-failure using raw time-series of covariates as input to a Recurrent Neural Network in Keras.

The second part is an extension of the wtte-rnn framework developed by @ragulpr. The original work focused on time-to-event models for churn prediction, while we focus on the time-to-failure variant.

In a time-to-failure model, each sequence always ends with the failure event, while in a time-to-event model each sequence may contain multiple target events and the goal is to estimate when the next event will happen. This small simplification allows us to train an RNN on sequences of arbitrary length to predict a single, final event in time.
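At its core, the Weibull output layer is trained by minimizing a censored negative log-likelihood. A minimal sketch in plain Python follows; the names are illustrative, and the real wtte-rnn loss also handles a discrete-time variant:

```python
import math

def weibull_nll(t, u, alpha, beta):
    """Negative log-likelihood of a (possibly right-censored) Weibull
    time-to-failure, the quantity the network minimizes.
    t: observed time; u: 1 if the failure was observed, 0 if censored;
    alpha: scale and beta: shape, both predicted by the network."""
    hazard = (t / alpha) ** beta  # cumulative hazard (t / alpha)^beta
    log_pdf_part = math.log(beta / alpha) + (beta - 1) * math.log(t / alpha)
    # Observed failures contribute the log-density; censored sequences
    # contribute only the log-survival term (-hazard).
    return -(u * log_pdf_part - hazard)

def expected_ttf(alpha, beta):
    """Point estimate: the mean of the predicted Weibull distribution."""
    return alpha * math.gamma(1.0 + 1.0 / beta)
```

With beta = 1 the Weibull reduces to the exponential distribution, which is a quick way to sanity-check the formulas.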

The tutorial is also a re-adaptation of the work done by @daynebatten on predicting run-to-failure times of jet engines.

The approach can be used to predict failures of any component in many other application domains or, more generally, any time-to-event that determines the end of the sequence of observations; that is, any model predicting a single target event in time.

You can find the rest of the tutorial at https://github.com/gm-spacagna/deep-ttf/.

UPDATE (2018-12-03): There is also a presentation given as part of the Data Science Milan meetup during the IBM PartyCloud 2018:

Deep time-to-failure: predicting failures, churns and customer lifetime with RNN

Anomaly Detection using Deep Auto-Encoders

One of the determinants of a good anomaly detector is finding smart data representations that can easily reveal deviations from the normal distribution. Traditional supervised approaches require a strong assumption about what is normal and what is not, plus a non-negligible effort in labeling the training dataset. Deep auto-encoders work very well at learning high-level abstractions and non-linear relationships in the data without requiring labels. In this talk we will review a few popular techniques used in shallow machine learning and propose two semi-supervised approaches for novelty detection: one based on reconstruction error and another based on lower-dimensional feature compression.
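As a hedged illustration of the reconstruction-error approach, here is a sketch using a linear auto-encoder fitted in closed form via SVD, standing in for a deep one; the data, bottleneck size, and threshold are all synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# "Normal" training data lives near a 2-D subspace of a 10-D space.
basis = rng.normal(size=(2, 10))
X_train = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 10))

# Linear auto-encoder in closed form: encoder/decoder share the top-k
# principal directions (a deep auto-encoder would learn a non-linear code).
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = Vt[:2]  # bottleneck of size 2

def reconstruction_error(X):
    """Anomaly score: squared error between each input and its
    reconstruction from the low-dimensional code."""
    codes = (X - mean) @ components.T
    recon = codes @ components + mean
    return ((X - recon) ** 2).sum(axis=1)

# Flag anything that reconstructs worse than 99% of the training data.
threshold = np.quantile(reconstruction_error(X_train), 0.99)
X_anomaly = rng.normal(size=(5, 10)) * 3.0  # points far off the subspace
scores = reconstruction_error(X_anomaly)
```

Points lying on the learned subspace reconstruct almost perfectly, while off-subspace anomalies produce large errors, which is exactly the signal the semi-supervised detector thresholds on.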

Demystifying Data Science in the industry

On June 7th I gave a quick introductory talk at AssoLombarda in Milan regarding the role of the Data Scientist in the 4th industrial revolution.

My presentation is an introduction to what Data Science in the industry is and what it is not.

If you would like to know more about the Data Science Milan community visit www.datasciencemilan.org

In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines on top of Spark and Alluxio


Legacy enterprise architectures still rely on relational data warehouses and require moving data to, and syncing with, the so-called “Data Lake”, where raw data is stored and periodically ingested into a distributed file system such as HDFS.

Moreover, there are a number of use cases where you might want to avoid storing data on the development cluster disks, such as for regulatory reasons or to reduce latency, in which case Alluxio (previously known as Tachyon) can make this data available in-memory and shared among multiple applications.

We propose an Agile workflow by combining Spark, Scala, DataFrame (and the recent DataSet API), JDBC, Parquet, Kryo and Alluxio to create a scalable, in-memory, reactive stack to explore data directly from source and develop high quality machine learning pipelines that can then be deployed straight into production.

In this talk we will:

* Present how to load raw data from an RDBMS and use Spark to make it available as a DataSet

* Explain the iterative exploratory process and advantages of adopting functional programming

* Critically analyze the issues faced with the existing methodology

* Show how to deploy Alluxio and how it greatly improved the existing workflow by providing the desired in-memory solution and by decreasing the loading time from hours to seconds

* Discuss some future improvements to the overall architecture

Original meetup event: http://www.meetup.com/Alluxio/events/233453125/

The Barclays Data Science Hackathon: Building Retail Recommender Systems based on Customer Shopping Behaviour

From Data Science Milan meetup event:

In the depths of the last cold, wet British winter, the Advanced Data Analytics team from Barclays escaped to a villa on Lanzarote, Canary Islands, for a one-week hackathon where they collaboratively developed a recommendation system on top of Apache Spark. The contest consisted of using Bristol customer shopping behaviour data to make personalised recommendations in a sort of Kaggle-like competition where each team’s goal was to build an MVP and then repeatedly iterate on it using common interfaces defined by a specifically built framework.
The talk will cover:

• How to rapidly prototype in Spark (via the native Scala API) on your laptop and magically scale to a production cluster without huge re-engineering effort.

• The benefits of doing type-safe ETLs representing data in hybrid, and possibly nested, structures like case classes.

• Enhanced collaboration and fair performance comparison by sharing ad-hoc APIs plugged into a common evaluation framework.

• The co-existence of machine learning models available in MLlib and domain-specific bespoke algorithms implemented from scratch.

• A showcase of different families of recommender models (business-to-business similarity, customer-to-customer similarity, matrix factorisation, random forest and ensembling techniques).

• How Scala (and functional programming) helped our cause.
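As a hedged sketch of the simplest recommender family above, item-to-item (business-to-business) similarity can be prototyped in a few lines. This is shown in NumPy rather than the Spark/Scala stack from the talk, and the matrix and function names are illustrative:

```python
import numpy as np

def item_similarity_recommend(interactions, user, k=2):
    """Score unseen items by cosine similarity to the items the user
    already interacted with. `interactions` is a users x items 0/1 matrix."""
    norms = np.linalg.norm(interactions, axis=0, keepdims=True)
    norms[norms == 0] = 1.0
    sim = (interactions.T @ interactions) / (norms.T @ norms)  # item-item cosine
    np.fill_diagonal(sim, 0.0)           # an item is not its own neighbour
    seen = interactions[user]
    scores = sim @ seen                  # aggregate similarity to seen items
    scores[seen > 0] = -np.inf           # never recommend already-seen items
    return np.argsort(scores)[::-1][:k]

# Toy matrix: 4 customers x 4 products.
R = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)
recs = item_similarity_recommend(R, user=0, k=2)
```

The same scoring logic scales naturally to a distributed matrix of customer-product interactions, which is where Spark took over in the actual hackathon.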

Surfing and Coding in Lanzarote, the Barclays Data Science hackathon

This post was published on the Cloudera blog and summarises the results and takeaways of a week-long hackathon that took place in Lanzarote in December 2015. The goal was to prototype a recommender system for retail customers of shops in Bristol, UK. The article shows how a stack composed of Scala and Spark was great for quickly writing prototype code that runs locally on a single laptop, while remaining scalable to larger datasets processed on the cluster.


Please continue reading at http://blog.cloudera.com/blog/2016/05/the-barclays-data-science-hackathon-using-apache-spark-and-scala-for-rapid-prototyping/.

Robust and declarative machine learning pipelines for predictive buying

Proof of concept of how to use Scala, Spark, and the recent library Sparkz to build production-quality machine learning pipelines for predicting buyers of financial products.

The pipelines are implemented through custom declarative APIs that give us greater control, transparency, and testability over the whole process.

The example follows the validation and evaluation principles defined in The Data Science Manifesto, available in beta at http://www.datasciencemanifesto.org