Quo Vadis Data Science?
Aleksandar Lazarevic, VP of Advanced Analytics & Data Engineering, Stanley Black & Decker, Inc. [nyse: SWK]
The most recent excitement about AI, driven by the increase in computational power and the successful application of deep learning in computer vision and natural language processing, has made explaining my profession much easier. It is no surprise that all the other aforementioned data science areas (let’s use this term for lack of a better word) have also received a lot of attention in recent years, so people could finally relate to some of the things I was saying.
At the same time, this created a great deal of noise, as lots of people started to discuss these areas without fully realizing what they really entail. You can find a proliferation of articles, online courses, Master’s programs, job postings, news items and videos about all these fields. You can’t go to any major conference in any field without hearing about Big Data, Machine Learning or AI from a keynote or other speakers. There were even a few movies that boosted the popularity of these fields, with Moneyball, Her, Ex Machina, Margin Call and The Big Short probably being the most popular. So, are we in a bubble? It is nowhere near as bad as the housing or the Internet bubble not so long ago, since you still cannot hear a cashier at the supermarket or a gas station attendant telling you it is a smart investment to start an analytics company or invest in a cloud provider. We are probably in the fifth inning of the bubble, and we all know that most of the money is made in the later stages of these bubbles.
Despite all the hysteria, there is unfortunately no standard definition of what data science actually is or what responsibilities a data scientist’s job should include. If you go to LinkedIn, you will find over 1 million people in a data science related profession, but their backgrounds, skills and responsibilities differ quite a bit. The variety among job postings for the same level of data science related professions is even more pronounced. Furthermore, there are currently over 250 programs in the US that offer graduate degrees in data science or analytics, but their curricula are quite different.
Feeling the publicity and hype around data science, as well as the peer pressure and the expectations imposed by Wall Street and shareholders to invest in these technologies, executives started to talk about the value that could come out of Big Data technologies and jumped on the bandwagon without fully recognizing what it takes to run successful analytics.
Although some executives exercise all due diligence and try to properly educate themselves about the world of AI, Machine Learning, Data Science and other Big Data technologies, most of them read only a few HBR, Forbes or WSJ articles and start believing they understand what it all means. Unfortunately, the truth is quite the opposite.
Surrounded by all these factors, executives typically start by trying to hire a few experienced data scientists from big data-driven tech companies and a lot of people with PhD or Master’s degrees in technical fields. However, due to a growing gap between the demand for these people and the actual number of professionals in these fields, they may simply not get the best people. According to a few recent studies by Quantum Crunch, KPMG, McKinsey and IBM, Big Data analytics skills are the scarcest to find, and there is an estimated current shortage of 150K professionals for data science jobs.
Another big problem is the huge difference between the data-driven culture in big tech companies and that of the rest of the pack. While big tech companies have data in their blood and most of their employees are analytics savvy, the remaining companies are simply not in a good position to successfully leverage the data they have, due to several challenges:
1. their data typically sits in multiple legacy systems and its quality is simply not satisfactory for running analytics
2. they do not have the proper infrastructure and tools to support computationally expensive big data analytics needs
3. their business divisions sit in silos, do not have a proper understanding of analytics and are not ready to adopt data science and go through a challenging change management process
4. there is a big mismatch between what the business wants, what analytics teams are trying to do and what academia taught them
5. there is a lack of planning and of criteria to successfully identify analytics use cases
As a result, a significant portion of data science or Big Data analytics projects fail (according to some surveys, as many as 85%), and many data scientists find themselves frustrated with the data they deal with as well as with the lack of the impact they wanted to create when they took the job.
So, at this point you are probably asking “What, then, is the right process to structure data science / analytics organizations and successfully derive value from them?” Although the topic itself is also quite controversial and may require an additional, longer article, I would like to briefly offer some good practices that are worth considering (in my humble opinion, there are no best practices yet). To address the five challenges mentioned above, at least the following five highly interconnected teams have to exist:
• Data platform team responsible for Big Data infrastructure including but not limited to data ingestion, data management, data governance, data quality, data architecture and deployment of analytics solutions. This team serves as a bridge between the IT and Data Science organizations and could sit within IT as well.
• Product management team accountable for engaging with business partners or clients, understanding business problems, identifying potential analytics use cases and specifying business requirements. This team serves as a link between business and data science and often plays the role of analytics translator or data storyteller.
• Data engineering team performs data wrangling, data preprocessing and building data pipelines. This team works closely with the data platform team and the data science team.
• Data Science team accountable for data exploration, identifying analytics use cases (in coordination with product managers) and designing analytical solutions, building initial analytical prototypes, developing full-scale analytics products and deploying them in coordination with the data platform team.
• Value acceleration team ensures successful business adoption of deployed analytics products, creates and implements key performance indicators (KPIs) that quantify and track the value created, drives proper change management on the business side and advocates for embedding the new skills required to support the developed analytical solutions.
The figure below illustrates the six phases of the data science lifecycle and how these teams collaborate. It is extremely important to start simple, with a few quick wins that are not data science heavy, and to create and maintain a high level of synergy among all these teams. The creation of these teams is the result of lessons learned by non-data-driven companies when forming and scaling up their analytics organizations, and it represents a major shift from the previous model, in which data science teams were highly technical and partially detached from the rest of the company.
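As a rough illustration, the collaboration links stated in the five team descriptions above can be written down as a small graph. This is only a sketch in Python: the names `TEAM_LINKS` and `collaborators` are hypothetical, and the links are limited to the ones the descriptions mention explicitly, so a real organization would likely have more.

```python
# Collaboration links taken from the team descriptions above.
# The dictionary layout and the helper name are illustrative assumptions.
TEAM_LINKS = {
    "data platform": {"IT", "data science"},
    "product management": {"business", "data science"},
    "data engineering": {"data platform", "data science"},
    "data science": {"product management", "data platform"},
    "value acceleration": {"business"},
}


def collaborators(party):
    """Return every party linked to `party`, counting links in either direction."""
    linked = set(TEAM_LINKS.get(party, set()))
    for other, partners in TEAM_LINKS.items():
        if party in partners:
            linked.add(other)
    linked.discard(party)
    return sorted(linked)


if __name__ == "__main__":
    for name in ["data science", "data platform", "business"]:
        print(name, "<->", ", ".join(collaborators(name)))
```

Read this way, the data science team is connected to every other technical team, while the business side is reached only through the product management and value acceleration teams, which is why those two teams matter so much for adoption.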
In the end, regardless of which strategy for building a data science organization you choose, keep in mind that you need to insist on a strong commitment to these analytics initiatives from the business, as this is the most critical factor in ensuring the overall success of the analytics organization.
Good luck in creating value by solving business problems using data science!