Deep Time-to-Failure: Predictive maintenance using RNNs and Weibull distributions

I have published on GitHub a tutorial showing how to implement an algorithm for predictive maintenance using survival analysis theory and gated Recurrent Neural Networks in Keras.

The tutorial is divided into:

  1. Fitting survival distributions and regression survival models using lifelines.
  2. Predicting the distribution of the future time-to-failure using raw time-series of covariates as the input of a Recurrent Neural Network in Keras.
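Part 1 of the tutorial relies on lifelines; as a dependency-free illustration of what such a fit does under the hood, here is a sketch of a right-censored Weibull maximum-likelihood fit using scipy on synthetic data (the variable names and censoring setup are my own, not the tutorial's):

```python
import numpy as np
from scipy.optimize import minimize

def weibull_neg_log_likelihood(params, t, observed):
    """Negative log-likelihood of a Weibull(alpha, beta) model
    with right-censored observations (observed == 0)."""
    log_alpha, log_beta = params              # optimize in log-space for positivity
    alpha, beta = np.exp(log_alpha), np.exp(log_beta)
    z = (t / alpha) ** beta
    # Uncensored points contribute the log pdf, censored ones the log survival.
    log_pdf = np.log(beta / alpha) + (beta - 1) * np.log(t / alpha) - z
    log_surv = -z
    return -np.sum(observed * log_pdf + (1 - observed) * log_surv)

rng = np.random.default_rng(0)
true_alpha, true_beta = 10.0, 2.0
t = true_alpha * rng.weibull(true_beta, size=2000)
censor = rng.uniform(0, 30, size=2000)        # random right-censoring times
observed = (t <= censor).astype(float)
t = np.minimum(t, censor)

res = minimize(weibull_neg_log_likelihood, x0=[0.0, 0.0],
               args=(t, observed), method="Nelder-Mead")
alpha_hat, beta_hat = np.exp(res.x)
print(alpha_hat, beta_hat)   # close to the true (10, 2)
```

The key point the tutorial builds on is that censored observations still carry information through the survival term, which is exactly what a Weibull output layer exploits in part 2.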

The second part is an extension of the wtte-rnn framework developed by @ragulpr. The original work focused on time-to-event models for churn prediction, while we focus on the time-to-failure variant.

In a time-to-failure model each sequence always ends with the failure event, while in a time-to-event model each sequence can contain multiple target events and the goal is to estimate when the next event will happen. This small simplification allows us to train an RNN on sequences of arbitrary length to predict a single, fixed event in time.
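To make this framing concrete, one hypothetical way to turn a single run-to-failure series of covariates into fixed-length training windows is to label each window with the time steps remaining until the terminal failure (the function name and window length below are illustrative, not the tutorial's exact code):

```python
import numpy as np

def make_ttf_windows(series, window=5):
    """Slice a run-to-failure time-series of shape (T, n_features) into
    sliding windows, each labelled with the number of time steps
    remaining until the failure at the end of the sequence."""
    T = len(series)
    X, y = [], []
    for end in range(window, T + 1):
        X.append(series[end - window:end])
        y.append(T - end)            # 0 at the failure step itself
    return np.array(X), np.array(y)

# A toy sensor sequence of 8 time steps with 2 covariates.
series = np.arange(16, dtype=float).reshape(8, 2)
X, y = make_ttf_windows(series, window=5)
print(X.shape, y)   # (4, 5, 2) [3 2 1 0]
```

Because every sequence terminates in the failure, every window gets an exact target; in the time-to-event setting the equivalent labels would instead be censored whenever the next event falls outside the observed horizon.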

The tutorial is also a re-adaptation of the work done by @daynebatten on predicting the run-to-failure time of jet engines.

The approach can be used to predict failures of any component in many other application domains or, more generally, any time to an event that terminates the sequence of observations; that is, any model predicting a single target event in time.

You can find the rest of the tutorial at https://github.com/gm-spacagna/deep-ttf/.

UPDATE (2018-12-03): There is also a presentation given as part of the Data Science Milan meetup during the IBM PartyCloud 2018:

Deep time-to-failure: predicting failures, churns and customer lifetime with RNN

Reasoning Under Uncertainty: Do the right thing!

The amount of digital data in the new era has grown exponentially in recent years and, with the development of new technologies, is growing more rapidly than ever before. Simply recording data is one thing; the ability to utilize it and turn it into a profit is another. If we collect as many pieces of information as we can gather from any source, our database will be populated with a lot of sparse, unstructured, and loosely correlated data. In this essay we summarize the approach proposed in Chapter IV, "Uncertain Knowledge and Representation", of the book "Artificial Intelligence: A Modern Approach" by Russell S. and Norvig P., showing how the problem of reasoning under uncertainty applies to data science, and in particular to the recent data revolution scenario. The proposed approach analyzes an extension of Bayesian networks called decision networks, which turn out to be a simple but elegant model for reasoning in the presence of uncertainty.
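The computation at the heart of a decision network is choosing the action that maximizes expected utility over uncertain outcomes. As a toy illustration (all probabilities, utilities, and names below are invented for the example, not taken from the book):

```python
def expected_utility(action, p_outcome_given_action, utility):
    """Expected utility of an action: sum over outcomes of
    P(outcome | action) * U(outcome)."""
    return sum(p * utility[o]
               for o, p in p_outcome_given_action[action].items())

# A hypothetical maintenance decision under uncertainty.
p = {
    "repair_now": {"no_failure": 0.95, "failure": 0.05},
    "wait":       {"no_failure": 0.70, "failure": 0.30},
}
u = {"no_failure": 100.0, "failure": -500.0}
cost = {"repair_now": -40.0, "wait": 0.0}   # action costs enter the utility

# Pick the action with maximum expected utility.
best = max(p, key=lambda a: expected_utility(a, p, u) + cost[a])
print(best)   # "repair_now": 30.0 beats "wait" at -80.0
```

A full decision network would derive the outcome probabilities from a Bayesian network over evidence variables rather than hard-coding them, but the decision rule is the same argmax over expected utilities.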

A Distributed Genetic Evolutionary Tuning for Data Clustering: Part 1

This is my original post, published on the AgilOne blog in June 2013, about a framework I developed for the self-tuning of data clustering algorithms.

In order for any data analytics service provider to run a high-margin, sustainable business, it has to deal with scalability, multi-tenancy, and self-adaptability. Machine learning is a very powerful instrument for Big Data applications, but a bad choice of algorithm can lead to poor results of the intended analysis. One way to mitigate this is to automate the tuning process, which should require neither a priori knowledge of the data nor human intervention. As a Big Data Engineer at AgilOne, I worked on this open problem of self-tuning. The work led to the development of TunUp: A Distributed Cloud-based Genetic Evolutionary Tuning for Data Clustering. The result is a solution that automatically evaluates and tunes data clustering algorithms, so that clustering-based analytics services can self-adapt and scale in a cost-efficient manner.

Evaluating clusters

For the initial work we chose K-Means as our clustering algorithm. K-Means is a simple but popular algorithm, widely used in many data mining applications.
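The core loop of such a genetic tuner can be sketched in a few lines: evaluate each candidate configuration with an internal validation index, keep the fittest, and mutate them. This is only a minimal single-gene illustration of the idea (tuning the number of clusters k, with the silhouette coefficient as fitness), not TunUp's actual code:

```python
import numpy as np

rng = np.random.default_rng(42)

def kmeans(X, k, n_restarts=5):
    """Plain Lloyd's K-Means with random restarts; returns the labels
    of the restart with the lowest within-cluster sum of squares."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_restarts):
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(50):
            d = np.linalg.norm(X[:, None] - centers[None], axis=2)
            labels = d.argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

def silhouette(X, labels):
    """Mean silhouette coefficient, used here as the clustering fitness."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        a = D[i, same].sum() / max(same.sum() - 1, 1)
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def evolve_k(X, k_range=(2, 8), pop_size=6, generations=5):
    """Tiny genetic loop over a single gene (k): evaluate, select, mutate."""
    pop = list(rng.integers(k_range[0], k_range[1] + 1, size=pop_size))
    for _ in range(generations):
        fitness = [silhouette(X, kmeans(X, k)) for k in pop]
        order = np.argsort(fitness)[::-1]
        parents = [pop[i] for i in order[:pop_size // 2]]     # selection
        children = [int(np.clip(k + rng.integers(-1, 2), *k_range))
                    for k in parents]                          # mutation
        pop = parents + children
    fitness = [silhouette(X, kmeans(X, k)) for k in pop]
    return pop[int(np.argmax(fitness))]

# Three well-separated Gaussian blobs: the tuner should settle near k = 3.
X = np.concatenate([rng.normal(c, 0.3, size=(30, 2)) for c in (0, 5, 10)])
print(evolve_k(X))
```

TunUp itself generalizes this loop to multiple hyper-parameters, distributes the fitness evaluations, and supports several internal validation indices, which is what makes the approach practical at scale.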

TunUp is open-source and available on its GitHub page: https://github.com/gm-spacagna/tunup

The original report is available at: http://www.academia.edu/5082681/TunUp_A_Distributed_Cloud-based_Genetic_Evolutionary_Tuning_for_Data_Clustering