9 basic steps the business should consider when teaming up on a new ML project
You are an innovator.
You are curious about Machine Learning.
You have an idea that could improve your business by leveraging the information hidden in a set of data you already have or could collect (or both).
As an expert in your industry and your company, you and your team are often the only people who can detect a new revenue source, implement cost reduction, or make faster decisions. While ideas in concept are just ideas, yielding tangible business value from data requires the combination of different skills from different people: business vision, mathematical skills, technological chops, and the ability to collaborate.
How do you turn your idea into something of value?
Experts define machine learning as using data to create self-modifying algorithms (also known as models), which can “learn” to produce the desired information. The machine learning practitioner will create algorithms and then “train” them to match expectations.
Your project will involve collecting and preparing data, training mathematical algorithms, and developing a digital solution to present results (from a simple one-shot visualization to a dynamic dashboard or software integrated into your systems). Along the path to a successful implementation of your idea, you will have to work with internal or external partners with various roles and skills: business experts, IT people, machine learning experts, customer panels, etc.
In some cases, you will first have to sell your project to a sponsor to get a green light and a budget. To succeed with all these people involved, you have to find a common language.
I have listed 9 topics you should focus on to explain your idea and launch your project.
Here is a template you could use to refine your project, and some advice on how to fill it out.
As a business leader, you have spotted an opportunity to improve your business (increase revenue, lower costs, mitigate risk, etc.).
You think that you can use available or reachable data to learn new information.
In this part, you want to explain:
- The information you want to learn, for example: customer profile, sales predictions, email category, probability of mechanical failure, etc.
- The data which are available or could be collected
2. Business Value
Who will benefit from the information gained using machine learning?
- Will your customers be offered a new service?
- Will you reduce the cost of an internal process?
3. Resulting Solution
How will your company use the resulting machine learning algorithm?
- Is it a one-shot study whose result will be a report? (ex: current customer segmentation)
- Do you need your algorithm to be used on a regular basis to produce an updated dashboard? (ex: sales predictions)
- Do you need your algorithm to be integrated into your manufacturing process? (ex: automatic routing depending on image classification)
4. Data Sources
You might use various groups of data.
For each one, list:
- Content of data
- Source of data: where was the data captured and from whom (customer purchase, mechanical sensors, Twitter post, medical report…)
- Estimated amount/volume of data
- Data presentation: Is the data already structured (ex: a spreadsheet of transactions) or does it need pre-processing (ex: extracting meaningful values from physical sensor signals, collecting text from multiple sources in different formats…)
- Legal considerations relating to the storage and processing of data
- Data Labeling
For most projects, before you can learn from your data, you need to know the “true answer” for a sufficient number of examples. For instance, among a set of financial transactions, you need to know which were proven to be fraudulent and which were proven not to be. Data tagged with the true answer are called “labeled data”.
Is your data already labeled?
If your data are not labeled, work with your team to clean and label the data.
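To make the idea of labeled data concrete, here is a minimal Python sketch that counts how many examples in a small, made-up set of transactions carry a known answer. The field names (`amount`, `is_fraud`) and the `label_coverage` helper are hypothetical and chosen for illustration only; `None` stands for a transaction whose true answer is not yet known.

```python
from typing import Optional, TypedDict


class Transaction(TypedDict):
    amount: float
    is_fraud: Optional[bool]  # True/False = proven answer, None = not labeled yet


def label_coverage(rows: list[Transaction]) -> float:
    """Return the fraction of rows that carry a known label."""
    if not rows:
        return 0.0
    labeled = sum(1 for row in rows if row["is_fraud"] is not None)
    return labeled / len(rows)


# Made-up sample data, for illustration only.
transactions: list[Transaction] = [
    {"amount": 12.50, "is_fraud": False},
    {"amount": 980.00, "is_fraud": True},
    {"amount": 43.10, "is_fraud": None},
    {"amount": 5.99, "is_fraud": None},
]

coverage = label_coverage(transactions)
print(f"{coverage:.0%} of transactions are labeled")
```

A low coverage figure is a signal that labeling work should be planned before any training starts.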
5. Information to be learned
In this part, you will explain how a machine learning practitioner will learn the desired information from available data.
What result are you targeting?
- You want to predict an outcome: “Is this email a technical support request or a sales request?”, “Is this transaction fraudulent or not?”
- You want to predict a value: “What is the estimated volume of sales for this article for the next week?”, “What is the probability that this machine will experience a major failure before the end of the year?”
- You want to discover patterns: “Is it possible to segment our customers into 5 groups with similar habits?”
The machine learning practitioner will turn your project into a “classification project”, a “multi-class classification project”, a “regression project”, etc. It’s OK for you to use these words if they make sense to you. But you should keep your project written in words that all collaborators in your project can understand.
State clearly the level of reliability you expect from the result, for example:
- We should detect at least 99.9% of fraudulent transactions
- We accept a 10% error in sales predictions for this category of products
- The solution should classify an image in less than 0.5 seconds
The machine learning practitioner will turn your performance objective into a mathematical measure such as “false-negative rate, F-score, Jaccard score, confidence interval, …”. Ask her/him for a precise explanation of the measure she/he will use. This will ensure that you are on the same page.
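As a rough illustration of how a business target such as “detect at least 99.9% of fraudulent transactions” becomes a formula, the sketch below computes a few common measures from counts of correct and incorrect predictions. The counts are invented and the function names are assumptions made for this example; your practitioner may well choose different measures or rely on an existing library.

```python
def recall(tp: int, fn: int) -> float:
    """Share of actual fraud cases that were detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp: int, fp: int) -> float:
    """Share of fraud alerts that were real fraud: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def jaccard(tp: int, fp: int, fn: int) -> float:
    """Overlap between predicted and actual fraud: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


# Invented counts: 1,000 real fraud cases, of which 999 were caught (tp)
# and 1 was missed (fn), plus 50 false alarms (fp).
tp, fn, fp = 999, 1, 50

print(f"recall    = {recall(tp, fn):.3f}")
print(f"precision = {precision(tp, fp):.3f}")
print(f"F1 score  = {f1_score(tp, fp, fn):.3f}")
print(f"Jaccard   = {jaccard(tp, fp, fn):.3f}")

# The business target "detect at least 99.9% of fraud" is a recall threshold.
meets_target = recall(tp, fn) >= 0.999
print("Target met:", meets_target)
```

Note that the same model can meet a recall target while still raising many false alarms, which is why it helps to agree on more than one measure.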
Before starting the project, it is not possible to guarantee that the performance level for the targeted result can be obtained.
Failure to reach the desired outcome can arise from different causes:
- The information is not contained in the data.
  - In this case, you will have to collect other sources of data to succeed.
- The information is contained in your data but you don’t have enough accessible data.
  - In this case, you will have to find more data to increase the performance level.
- The developed algorithm and its technical implementation in the IT architecture cannot provide a response within the targeted time.
  - In this case, the algorithm and/or its implementation (code, architecture) will have to be reworked in the hope of reaching the desired performance level.
At the beginning of your project, the machine learning practitioner should be able to rate the feasibility of your project based on the current known state and the specifics of your environment. During the development of the project, she/he must keep you informed about performance results and explain the causes of low performance.
7. Project Stakeholders
List all teams, departments, or external partners which will be involved:
- Business teams (marketing, manufacturing, HR, …)
  - They define the business problem or the business opportunity. They evaluate the solution performance.
- Data lab, machine learning practitioner …
  - The machine learning practitioner transforms data, and develops and trains algorithms.
- IT teams
  - For some projects, you need to involve your IT department to extract data from your systems or to deploy and monitor the end-user application.
8. Project Plan
This part is the place for a visual display of all steps required and the people involved.
It typically includes all or part of the following items:
- Defining the business problem to be solved or the business opportunity
- Collecting and exploring already existing data, collecting external data
- Exploratory Data Analysis: extracting pieces of evidence that the information you want to get is contained (even if hidden) in your data
- Machine Learning training resulting in an algorithm which can extract the desired business information
- Implementation of the end-user solution: a report, a dashboard, or business software
- Don’t forget the ongoing maintenance of your solution: the performance level could decrease and must be monitored.
  - ex: Sales prediction algorithms must be periodically retrained to keep up with new trends.
  - ex: Changes in data sources may arise (poor quality of new data, interruption of some data source, change in input data format…)
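The monitoring step can be sketched as a simple check: compare recent prediction errors against the tolerance agreed for the project and flag when retraining looks necessary. This is only a sketch under stated assumptions: the 10% default tolerance echoes the sales prediction example given earlier, while the function names, the four-week window, and all figures are hypothetical.

```python
def mean_absolute_percentage_error(actual: list[float], predicted: list[float]) -> float:
    """Average of |actual - predicted| / |actual| over all periods.

    Assumes every actual value is non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def needs_retraining(
    actual: list[float],
    predicted: list[float],
    tolerance: float = 0.10,
    window: int = 4,
) -> bool:
    """Flag retraining when the error over the last `window` periods exceeds the tolerance."""
    recent_error = mean_absolute_percentage_error(actual[-window:], predicted[-window:])
    return recent_error > tolerance


# Invented weekly figures: the forecast tracks sales well at first,
# then falls behind when a new trend appears in the last four weeks.
weekly_sales = [100.0, 110.0, 105.0, 120.0, 150.0, 160.0, 170.0, 180.0]
weekly_forecast = [98.0, 108.0, 107.0, 118.0, 125.0, 128.0, 130.0, 131.0]

print("Retrain after week 4:", needs_retraining(weekly_sales[:4], weekly_forecast[:4]))
print("Retrain after week 8:", needs_retraining(weekly_sales, weekly_forecast))
```

In practice, a check of this kind would run on a schedule and alert both the business owner and the machine learning practitioner.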
Depending on the type of solution described in #3 “Resulting solution”, describe here the nature of the costs: software fees, hardware, hosting, Cloud, network (IT), end user application, support. It is also worth considering the opportunity costs of the project.
When you have covered these 9 topics, you will have a solid foundation for your project (not the entire foundation, as there will be gaps to fill for your specific case). These topics should allow you to demonstrate how you will extract value from your data. Most importantly, a common language will keep you from getting lost in data science mysteries and will empower you to steer your machine learning project towards your business objectives. This is just a starting point, and we have skipped many items that can and will come up. Good luck on your project.