This is the third post of the Agile Data Science Iteration 0 series:


What we have achieved so far (see previous posts above):

  1. Rigorous definition of the business problem we are attempting to solve and why it is important
  2. Define your objective acceptance criteria
  3. Develop the validation framework (ergo, the acceptance test)
  4. Stop thinking, start googling!
  5. Gather initial dataset into proper infrastructure
  6. Initial Exploratory Data Analysis (EDA) targeted to understanding the underlying dataset

At this stage you should have a clear statement of what problem you are trying to solve and how you can objectively measure the quality of any possible solution regardless of what the final implementation will be. You have an initial background of state of the arts and an understanding of what the data look like.  You can now implement your first simple solution to the problem.

The Basic Solution

7. Define and quickly develop the simplest solution to the problem

The challenge here is “Are you able to implement a basic solution that solve the end-to-end goal (not necessary with the required quality) in a few days?”.

Before trying to think of very scalable algorithms or advanced modelling techniques, have you thought about a simples rules classifier? What if you can easily predict if an user is about to default his bank loan by simply looking at difference between how much is his earning and spending in the past 3 months and come out with a rule like “if that amount is less than X than the user is very likely to default”.

Maybe you spent 3 days of analysis and 2 days of development and you solved your problem even before to start it! Or maybe is not good enough but now you have got a basis to compare when you analyse the risk of continuing in this project at each iteration by comparing with what could be achieved in just 5 days of work.

8. Release/demo first basic solution

What would be more agile than releasing and demoing the basic solution? Never under-estimate the value of feedback and how your mind can focus on the next big thing given that everything done so far is checkpointed and reviewed.

Now you have got a quick and simple solution that does try to solve the business goal even if it might be not accurate yet. It could, but not necessary,  is your MVP. That depends on if the acceptance criteria are fulfilled or not.
What is important is that you have spent just a few days and you have got something to deliver and demoing. This will give you the following benefits:

  • trustiness with your stakeholder that you can deliver quickly
  • first set of feedback
  • inspiration for further improvements
  • baseline for comparison

You have all of the knowledge to start preparing your solution proposal.

The Proposal Preparation

9. Research of background for ways to improve the basic solution

Now you clearly now what goal you want to achieve, what minimum requirements to meet, what the data looks like, what basic solution to compare to. This is the right time for doing some deeper research of better ways of solving the problem by using more advanced techniques, domain specific knowledge and/or additional datasets.

10. Gather additional data into proper infrastructure (if required)

Like 5) but only if additional data is required by the current proposal.

11. Ad-hoc Exploratory Data Analysis (EDA)

At this stage the EDA is targeting explicitly to extracting knowledge related to the improved solution to propose.

12. Propose better solution minimising potential risks and marginal gain

Because you know have a comparison baseline, you should prefer quantifying the incremental benefit of your model rather than its absolute evaluation and try to trade off the additional complexity with the potential value gain.

At this stage you should most of the requirements defined. You have probably changed your mind different times as you were researching about the problem and re-scoping it into smaller problems. You now know what has to be implemented for your first MVP.


Details of how to structure your ETL will follow on the next post of the “Agile Data Science Iteration 0” series, stay tuned.
Meanwhile, why not sharing or commenting below?

The Initial Investigation << prev | next >> The ETL