April 6, 2020

Shipping Disruptive ML/AI Products at Scale (Not Papers)

Denis Borovikov

The Problem

While working on deep learning products, you typically face a problem: what is the best way to organize the development of deep learning models from a project management point of view? I mean the development of a new model, not just the implementation of an existing one. Imagine a deep learning model with lots of different layers and losses. In such a scenario, you face a lot of uncertainty, and there is no guarantee that you will be ready to ship a solution after a given amount of time. This problem is not new: the development of traditional software products also involves much uncertainty. However, even though it’s typically hard to estimate how long a software project will take, the software is usually expected to satisfy its predetermined requirements once it’s complete. With machine learning models, the situation is significantly more complex: you don’t really know whether your solution will work at all until you launch the model and observe its impact on the product.

Business vs Academia

You might say “yes, research projects are risky and that’s just life”. True. The question is, how can we manage the process in order to increase the probability of success? Some people, especially those who come from academia, tend not to acknowledge the problem. However, the business environment has different requirements from academia. If you are a deep tech startup, the success of the entire company might depend on the results of your research. You have a limited runway, and if you can’t achieve the next milestone, the game is over. You cannot take one more year, as you might with a PhD. Also, success criteria in business are very strict. While in a research project even limited incremental knowledge can justify publishing a paper on the findings, in business the ultimate goal is to increase certain key metrics. If your findings cannot move these metrics enough, the results are useless from a business perspective. Furthermore, business tends to prioritize results over the novelty of the solution. If you can move metrics by adopting an existing solution, that’s totally fine. Disclaimer: if you are not an Engineer but a Research Scientist, your approach would be similar to an academic’s.

Why Not Scrum?

There are plenty of methodologies and frameworks for organizing your software development process, including Agile frameworks like Scrum, Kanban, etc. Those frameworks are well-established, and I personally had a lot of positive experience with them when I was developing traditional software. The Agile approach is a perfect choice for projects with high uncertainty. Let’s consider Scrum.

Scrum defines roles and rituals.

Scrum defines roles and rituals. As you can see, it all starts with a project vision. But Scrum doesn’t really tell you where you get this project vision. It’s up to the product owner to come up with the vision, populate the backlog and decide on priorities. Ultimately, Scrum is a framework and not a methodology.

Again, Scrum or any other Agile product development framework is a valid choice for high-risk projects that you need to ship to production quickly. The problem is that these frameworks define common principles of work and do not really specify how you populate your roadmap.

Traditional (Non-ML) Products

If you decide to use Scrum, you still need to decide on your product discovery framework. In the case of traditional software, you augment your Scrum process with Lean, Design Thinking or The Big Guy Tells You What To Do approaches. This picture illustrates one of the possible combinations:

Traditional (non-ML) products

Now we need to think about the discovery framework for deep learning.

Pitfalls of Machine Learning Projects

Before we specify our framework, let’s talk about common pitfalls of the development of deep learning products. This will help us to come up with a process that solves actual problems.

1. Focusing on one architecture.

Imagine you have a problem and you quickly decide on some architecture. Very often, it’s a trendy architecture or the architecture you’re comfortable with. Then you spend weeks or even months trying to make it work. You run evaluations, you see that the results are not great, you test new hypotheses again and again, and it still might not work in the end.

Learning: consider different architecture options.

Small remark on culture here: teamwork is key. If you bring more smart creative people in the room, you will get more options.

2. Reinventing the wheel.

Sometimes you see the problem, and you start coming up with your own solution straight away. You spend a lot of time making it work. However, it can easily be the case that the model has been researched already by somebody else.

Learning: review existing relevant works.

3. Too incremental development

Some people work in a very risk-averse fashion. They start with the simplest architecture, then do the full cycle of development in order to evaluate the model. If they are not happy with the result, they pick the next candidate and so on. They could save a lot of time if they did some cost / benefit analysis of candidate models in the beginning so they could start with a more promising model.

Learning: consider different architectures and pick the most promising one.
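
One lightweight way to run such a cost / benefit analysis is to score every candidate before implementing anything. The sketch below is purely illustrative: the candidate names, the estimated gains, the success probabilities and the effort figures are made-up assumptions, and the scoring rule (expected gain per week of effort) is just one possible heuristic, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str             # hypothetical architecture name
    expected_gain: float  # estimated metric improvement if the model works
    p_success: float      # the team's rough guess that it will work
    effort_weeks: float   # estimated implementation + evaluation effort

    def score(self) -> float:
        # Expected gain per week of effort: higher means more promising.
        return self.expected_gain * self.p_success / self.effort_weeks


def rank(candidates):
    """Return the candidates sorted from most to least promising."""
    return sorted(candidates, key=lambda c: c.score(), reverse=True)


# Illustrative numbers only; in practice they come out of a team discussion.
candidates = [
    Candidate("simple_baseline", expected_gain=0.02, p_success=0.9, effort_weeks=1),
    Candidate("standard_model", expected_gain=0.10, p_success=0.7, effort_weeks=2),
    Candidate("state_of_the_art", expected_gain=0.15, p_success=0.3, effort_weeks=6),
]

ranked = rank(candidates)
for c in ranked:
    print(f"{c.name}: {c.score():.4f}")
```

The numbers themselves matter less than the conversation they force: writing down a gain, a probability and an effort for each option makes it harder to default to the simplest or the trendiest architecture.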

4. Misalignment of metrics and customer expectations.

It’s not always the case that good evaluation metrics lead to the success of the product. The trap here is to look only at your synthetic metric and not perform a reality check. That check might be an A/B test, a product demo, or testing with a focus group. Imagine you are developing a genre classifier for songs. It can be really hard to know what level of accuracy corresponds to the user sentiment “this software is useful”.

Learning: don’t rely on offline evaluation only and perform tests in real conditions as soon as possible.

5. Using faulty evaluation

It might be tempting to jump into development and test hypotheses as soon as possible. But you can waste a lot of time if you work with a wrong dataset or use the wrong metric. Also, you can often see that people run evaluations manually by executing scattered scripts.

Learning: invest in an automated end-to-end evaluation pipeline.


As we can see, some common problems can be addressed by doing preparation work at the beginning of the project. Agile still makes total sense; it’s more about setting the right priorities (i.e. sprint goals) at different stages of the project, which I will dive into later.


If you work on a Data Science project, it’s very likely that you know about CRISP-DM, but it’s time to revisit this popular approach. It can be a good option for addressing our issues related to the development of Machine Learning models. See below:


All the steps presented in this diagram make sense. However, this view is too simplistic. For instance, what is the structure of the “Modeling” milestone? The diagram doesn’t say, because CRISP-DM is a process for predictive analysis, meaning it’s not about model development. It’s more a method that helps you to decide between several library methods (e.g. Linear Regression vs. SVM) to get a prediction.


Disclaimer: we’ll not focus on technical aspects, such as handling overfitting/underfitting. The focus is more on organizational aspects. For good resource material on technical aspects, I recommend Andrej Karpathy’s post A Recipe for Training Neural Networks. As I mentioned before, Scrum (or any other Agile framework) is a perfect choice for ML, but here we will try to come up with an approach that helps you to populate your product backlog. In other words, it’s a memo that gives you an idea of what kind of meetings/activities you need to focus on during the current sprint.

We propose to break down the entire process into two separate phases:

  1. Design
  2. Deployment

Design Phase

Roles: Business Owners, ML Engineers

As a generalized model of the design process, we can use Design Thinking. This choice should not surprise you. Even though Design Thinking is often used by user interaction designers, it can be used to describe any kind of design activity: engineering, education, management, etc. You can read more about the relationship between Design Thinking and other disciplines in this excellent article: Design Thinking: A New Foundational Science for Engineering

Simplistically, the Design Thinking process can be viewed as below:

Diamonds show a combination of divergent and convergent thinking

The diagram shows the direction from the problem to the solution. Diamonds show a combination of divergent and convergent thinking:

  • Discover: understand the problem by asking different research questions.
  • Define: the answers to these questions will allow for a distilled understanding of the problem.
  • Develop: produce different, alternative solutions to address the problem.
  • Deliver: finally, perform various tests to find the best solution.

Now, let’s break down each step into sub-steps that are relevant for ML model development. Some steps we can borrow from CRISP-DM.


  • Business Understanding
  • Data Understanding

This is the initial stage, and it’s advisable to write a project brief with your problem definition and proposed success criteria. This document can be reviewed by both technical and non-technical people, and this will ensure alignment between parties.


  • Data Preparation
  • Fitting of a Baseline Model

The interesting part here is the fitting of a baseline model. We propose to fit a baseline model to validate your discovery phase. You can produce some (possibly poor) results and present them to stakeholders. The challenges you face will give you a better idea of what kind of solutions you need to come up with. The outcome of this stage is a set-up and tested evaluation pipeline with all basic components in place: data preparation, feature extraction, model fitting, evaluation.
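
To make the four components concrete, here is a minimal sketch of such a pipeline with a trivial majority-class baseline. Everything in it is an assumption made for illustration: the toy song data, the word-count feature, the function names and the choice of accuracy as the metric. The point is only that a single entry point runs every stage the same way, so any later model can be dropped in and compared against the baseline.

```python
from collections import Counter


def prepare_data(raw_rows):
    """Data preparation: drop rows with a missing text or label."""
    return [(text, label) for text, label in raw_rows if text and label]


def extract_features(text):
    """Feature extraction: a toy feature (number of words)."""
    return len(text.split())


class MajorityBaseline:
    """Baseline model: always predicts the most frequent training label."""

    def fit(self, features, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        return [self.label_ for _ in features]


def accuracy(y_true, y_pred):
    """Evaluation metric: fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def run_pipeline(model, train_rows, test_rows):
    """Single entry point: runs every stage identically for any model."""
    train, test = prepare_data(train_rows), prepare_data(test_rows)
    x_train = [extract_features(text) for text, _ in train]
    y_train = [label for _, label in train]
    x_test = [extract_features(text) for text, _ in test]
    y_test = [label for _, label in test]
    model.fit(x_train, y_train)
    return accuracy(y_test, model.predict(x_test))


# Hypothetical toy data: (song description, genre).
train_rows = [("loud guitars", "rock"), ("fast drums", "rock"),
              ("soft piano", "classical"), ("", "rock")]
test_rows = [("heavy riff", "rock"), ("string quartet", "classical")]

baseline_accuracy = run_pipeline(MajorityBaseline(), train_rows, test_rows)
print(baseline_accuracy)
```

Any object with `fit` and `predict` methods can replace `MajorityBaseline` here, which is what makes the (possibly poor) baseline number a useful reference point for every later experiment.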


  • Reading Literature
  • Brainstorming Ideas

The trickiest part here is to decide on the next model to try. There is no universal recipe. However, we recommend the following:

  • Appoint a project lead who will organize all activities.
  • Arrange a meeting to decide on a model. People should be prepared, meaning they should read literature and come up with a proposal before the meeting. The purpose of the meeting is to make a decision, not to continue researching.
  • You need to find the right balance between deciding on a model that is too complex to successfully implement and deciding on a model that is too simple and doesn’t have a satisfactory performance. Sometimes you can start your project from a state-of-the-art model. Other times it’s too sophisticated, and you should instead start with a standard, well-known model that is easy to implement. Discuss it. It’s why you brought others in.


  • Prototyping
  • Evaluation

Implementation notes:

  • Implementation of the model should be robust, but it’s fine to cut corners, write duplicate code, etc. This is a prototype, not a final version. If you are happy with the results, you can refactor the code later.
  • An interesting part is the tuning of hyperparameters. Ideally, you would always tune hyperparameters before running the evaluation. Unfortunately, in the case of deep learning, each fitting procedure can take minutes or even hours, which makes exhaustive tuning unaffordable. Instead, you can quickly find a sub-optimal set of parameters. Typically that’s enough to understand the high-level characteristics of the model. You can focus only on the most important parameters, such as learning rate, regularization and model capacity. I also find tuning by hand to be quite a practical approach at this stage. Again, the goal is to understand the overall behaviour of the model. Fine-tuning of hyperparameters (using, let’s say, grid search) can be done at the very end.
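
As an illustration of the tuning note above, the quick exploration can also be scripted as a coarse sweep over a handful of values for only the most important parameters. This is a sketch under stated assumptions: `train_and_evaluate` is a hypothetical stand-in for a real (and much slower) training run, here a tiny linear fit by gradient descent, and the grid values are arbitrary.

```python
import itertools


def train_and_evaluate(learning_rate, l2, steps=200):
    """Stand-in for a real training run: fit y = w * x by gradient descent
    with L2 regularization and return the validation mean squared error."""
    train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    valid = [(4.0, 8.0), (5.0, 10.0)]
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        grad += 2 * l2 * w  # gradient of the L2 penalty
        w -= learning_rate * grad
    return sum((w * x - y) ** 2 for x, y in valid) / len(valid)


# A coarse, log-spaced grid over only the most important parameters.
learning_rates = [1e-3, 1e-2, 1e-1]
l2_values = [0.0, 0.1]

results = {
    (lr, l2): train_and_evaluate(lr, l2)
    for lr, l2 in itertools.product(learning_rates, l2_values)
}

best = min(results, key=results.get)
print("best (learning_rate, l2):", best)
```

Six short runs are enough to see which region of the parameter space behaves sensibly; a finer search around the best point can wait until the model has proven itself.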

To recap, here is the entire process:



Deployment Phase

Roles: Data Engineers, Software Engineers, ML Engineers

We don’t specify a special process here. It’s just a set of tasks that you need to perform to ship your model to production. Concrete steps can be different for different projects. Some common steps are:

  • Cleaning up the code
  • Configuring a training pipeline
  • Establishing monitoring
  • Implementing a serving layer (REST API, for instance)

Note that those technical tasks can be done in parallel and by different people. For instance, a Software Engineer can implement a serving layer for the model and a Data Engineer can implement a training pipeline in parallel.
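
As a rough idea of what the serving layer task can look like, here is a minimal sketch of a REST-style prediction endpoint built only on Python’s standard library. The `/predict` path, the JSON payload shape and the `predict` placeholder are assumptions made for illustration; a real project would more likely use a dedicated web framework and load a trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Placeholder model: replace with a call to the trained model."""
    return {"score": sum(features) / len(features)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "Unknown endpoint")
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            body = json.dumps(predict(payload["features"])).encode("utf-8")
        except (ValueError, KeyError, TypeError, ZeroDivisionError):
            self.send_error(400, "Expected a JSON body with a 'features' list")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(host="127.0.0.1", port=8000):
    """Build the HTTP server; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), PredictHandler)
```

Calling `make_server().serve_forever()` starts the service, after which a POST to `/predict` with a body such as `{"features": [1.0, 3.0]}` returns a JSON score. Because the handler only depends on the `predict` function, a Software Engineer can build and test this layer against the placeholder while the training pipeline is still in progress.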

In Summary

Overall, ML development can be characterized as an engineering process. It means that we can, and perhaps should, refer to existing approaches. However, the nature of ML projects is also quite different from traditional software development. There was a big shift in software development that started in the early 2000s, from traditional approaches such as waterfall towards so-called Agile software development. New Agile techniques helped to develop innovative software products by better managing uncertainty. The extreme uncertainty of ML projects requires us to go even further and develop new methods.