
How to Choose Between AutoML and a Custom Model?
Ashley Wan, Head of Business Analytics - Asia Pacific, Richemont


Many organisations have started, or are planning to start, investing in their data analytics capabilities to maximise the value of their data and become data-driven. Alongside buzzwords such as big data, cloud computing and agile business intelligence, data science is one of the hottest topics that everyone is discussing and keen to develop.
In the past, AutoML was often labelled a “black box”: users did not know exactly which techniques and algorithms had been applied in the AutoML model, and ended up with results that they did not trust and felt uncomfortable presenting to the business. Most organisations therefore preferred to hire data scientists to build custom models to deliver their data science plans. However, this approach is time-consuming and complex, not to mention the additional effort spent on data cleansing and preparation.
On the one hand, AutoML is a fast and simple way to test models with no coding required; on the other hand, a custom model is time-consuming and requires strong data science skills to build.
Custom AutoML System
The AutoML concept has changed completely in the last few years. The “black box” has been opened: AutoML has become more transparent, powerful and flexible, and data scientists can now control the methods it applies.
Most of the leading data science platforms already offer such capabilities. The data scientist can pick and control how the AutoML works, simply choosing the top-performing model or using domain expertise to guide the platform to the best-fit model. This saves data scientists a lot of time and helps them handle the increasing demand for data science in every organisation.
AutoML algorithms are getting smarter and more powerful every day. The accuracy of an AutoML model may not even differ significantly from that of a custom model. Nevertheless, things get complicated when a dataset is not well organised and cleansed: AutoML today is still not flexible enough to handle “dirty” datasets.
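The two selection modes described above (take the top performer, or let domain expertise narrow the choice) can be illustrated with a small Python sketch. It is not the API of any particular AutoML platform: the two toy models, the `select_model` helper and its `allowed` parameter are all hypothetical names chosen for this example.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Candidate: one-feature ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mae(model, xs, ys):
    """Mean absolute error of a fitted model on (xs, ys)."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))

def select_model(candidates, train, valid, allowed=None):
    """Fit each candidate and return (name, error) of the best one.

    `allowed` is a hypothetical hook for domain guidance: when given,
    only the named candidates are considered.
    """
    scores = {}
    for name, fit in candidates.items():
        if allowed is not None and name not in allowed:
            continue
        scores[name] = mae(fit(*train), *valid)
    best = min(scores, key=scores.get)
    return best, scores[best]

candidates = {"mean_baseline": fit_mean, "linear": fit_linear}
train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])   # illustrative data only
valid = ([5, 6], [10.1, 11.9])

# Mode 1: simply take the top-performing model on the validation set.
best_name, best_score = select_model(candidates, train, valid)

# Mode 2: domain guidance restricts which candidates may be chosen.
guided_name, _ = select_model(candidates, train, valid, allowed={"mean_baseline"})
```

Real platforms search far larger model families and tune hyperparameters as well, but the control point is the same: the data scientist decides which candidates are eligible and which metric decides the winner.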
Instead of waiting for a better AutoML or going back to a custom model, why not focus on feeding AutoML a “workable” dataset by leveraging ETL and the data lake concept? Given the growing need for, and importance of, data science, it is crucial to build this capability as soon as possible to gain a competitive advantage.
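As a rough illustration of what turning a “dirty” dataset into a “workable” one might involve, the Python sketch below applies three common cleansing rules in an ETL-style step. The rules, the `cleanse` function and the sample records are assumptions made for this example; a real pipeline would use the organisation's own ETL or data governance tooling.

```python
def cleanse(rows, required):
    """Return only the rows that are usable for model training.

    Rules (assumed for this sketch):
    - trim whitespace and lower-case string values
    - drop rows where a required field is missing or empty
    - drop rows that duplicate an earlier row after normalisation
    """
    seen = set()
    clean = []
    for row in rows:
        norm = {k: v.strip().lower() if isinstance(v, str) else v
                for k, v in row.items()}
        if any(norm.get(field) in (None, "") for field in required):
            continue
        key = tuple(sorted(norm.items()))
        if key in seen:
            continue
        seen.add(key)
        clean.append(norm)
    return clean

# Illustrative records only.
raw = [
    {"customer": " Alice ", "country": "HK", "spend": 120.0},
    {"customer": "alice", "country": "hk", "spend": 120.0},  # duplicate once normalised
    {"customer": "Bob", "country": "", "spend": 80.0},       # missing country
    {"customer": "Chen", "country": "SG", "spend": None},    # missing spend
    {"customer": "Dana", "country": "JP", "spend": 45.5},
]

workable = cleanse(raw, required=("customer", "country", "spend"))
```

Only the cleaned rows in `workable` would then be handed to the AutoML step.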
Cloud AutoML on Data Lake
Custom AutoML on a data lake can be very powerful. It combines several benefits: the processing power of cloud computing, a data lake that stores structured and unstructured data at any scale, and a flexible AutoML model system.
By leveraging dedicated data cleansing or data governance tools on the data lake, the organisation can make full use of AutoML, with a substantial saving in development time compared with a custom model. Moreover, AutoML allows organisations to embed their data science models in their operations, provide meaningful predictions by regularly re-training the model on the latest data, and introduce a “feedback loop” to boost performance.
An organisation that is planning to develop its data science capability ought to consider which starting point suits it best, based on its initial investment and data complexity.
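The idea of regular re-training with a feedback loop can be shown with a deliberately tiny Python example. The “model” here is only a moving average, and the `RetrainingForecaster` class and its methods are hypothetical; the point is the cycle of predict, observe the actual outcome, and retrain on the latest data.

```python
from collections import deque
from statistics import mean

class RetrainingForecaster:
    """Toy model: predicts the mean of the most recent `window` outcomes."""

    def __init__(self, window=3):
        self.history = deque(maxlen=window)  # keeps only the latest data
        self.estimate = None

    def retrain(self):
        """Refit on whatever is currently in the window."""
        if self.history:
            self.estimate = mean(self.history)

    def predict(self):
        return self.estimate

    def feedback(self, actual):
        """Feedback loop: record the observed outcome, then retrain."""
        self.history.append(actual)
        self.retrain()

model = RetrainingForecaster(window=3)
errors = []
for actual in [100, 102, 104, 120, 122, 124]:  # illustrative outcomes with a level shift
    prediction = model.predict()
    if prediction is not None:
        errors.append(abs(prediction - actual))
    model.feedback(actual)
```

In this run the error jumps when the level shifts from about 104 to about 120 and then shrinks again as the retrained model catches up, which is the behaviour regular re-training is meant to provide.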