
In 2022, the term artificial intelligence exploded into the popular lexicon with the launch of image generation tools like Midjourney and chatbots based on generative language models, ChatGPT being the most popular. In popular parlance, AI now refers only to generative AI such as Midjourney and ChatGPT, though the term was first coined in 1956.
Analysts are already calling for 2024 to see a dip into the trough of disillusionment with AI. I see it as a year that AI/ML gets real. I see it as the year that data science finally starts to produce the business impact we’ve been promising for more than a decade. I see the soft green shoots of a data science spring on the horizon.
on the horizon for data science teams
In 2024 we—meaning boots-on-the-ground data science and machine learning teams at smaller companies—will incorporate GenAI into production AI/ML systems and production AI/ML development in a way that gives us a substantial but incremental boost to productivity. Rather than acting as a radically transformative driver of change, GenAI will merely allow us to (finally!) build production AI/ML systems in a reasonable time frame with small teams of business-focused data scientists and machine learning engineers.
Now granted, large companies like Uber and LinkedIn have already been using data science and machine learning to optimize their businesses, and I’m sure there are some smaller companies who have succeeded as well. But many small-to-medium-sized companies have struggled to get value out of their data science teams.
data science has been a slog
I first entered data science in 2011, after finishing my PhD in statistics. With my prior experience as a software engineer and software engineering leader, my statistics knowledge, and my novelty-seeking drive to get involved in “what’s next,” data science seemed the perfect field. I had worked in artificial intelligence upon first graduating from college in 1990, but that was during an AI winter, when AI had largely reached a dead end in its use of expert systems, logical processing, and knowledge bases. Data science based on machine learning and predictive modeling seemed fresh and inviting.
In 2012, Tom Davenport and DJ Patil deemed data scientist the sexiest job of the 21st century in an article in the Harvard Business Review. Having worked in the field since then, I can attest that it has been anything but sexy. Early on in my career in this space, most of my models never made it out of PowerPoint. As capabilities progressed, we saw models go into production but only after laborious manual labeling of data or use of inadequate proxy labels for supervised learning. Some or most of these models represented interesting achievements with matrix manipulation (interesting to a mathematician at least!) but did not provide the business impact we needed to see a good ROI. In some cases, the data scientists who could train a neural network themselves and were interested in the finicky experimentation required weren’t those who had time for what businesses really needed from their models (neural network or otherwise). Data science product management wasn’t a thing.[1]
Data science teams at small and medium-sized businesses have struggled to have business impact. We’ve tended to emphasize fancy models rather than figuring out how to make businesses and people’s lives better. While it’s been a lot of fun, I’m looking forward to having more business impact in the future.
yay for gen ai
Now, everything behind today’s GenAI capabilities is finally going to make it (relatively) simple for small data science teams to focus on business impact rather than linear algebra: the extreme sophistication of the models, all those matrix manipulations, all the work the academic and industry researchers did to get to this point, all the GPUs and servers at our command via the APIs from the likes of OpenAI and Anthropic, the huge training datasets curated from the entire Internet, and the human feedback that’s been gathered by the big AI organizations.

Generative AI language models such as GPT-4 and Llama 2 provide a huge boost to our ability to ship useful capabilities with the broader AI/ML toolset, because they:
Help data scientists shift their focus away from PyTorch syntax, vector database queries, and neural network tweaking, and toward business needs.[2]
Provide a way to process text better than anything the industry has developed before.[3]
Allow for labeling of text without human labelers, or with humans serving only as a final quality check.[4]
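To make the labeling point concrete, here is a minimal Python sketch of that workflow: a model labels every text, and a small random sample is set aside for a human to check. It is an illustration under stated assumptions, not a description of any particular team's pipeline. The names `label_with_spot_check` and `fake_llm_label` are hypothetical, and `fake_llm_label` is a keyword rule standing in for a real call to a language model API.

```python
import random
from typing import Callable


def label_with_spot_check(
    texts: list[str],
    llm_label: Callable[[str], str],
    review_fraction: float = 0.1,
    seed: int = 0,
) -> tuple[list[tuple[str, str]], list[int]]:
    """Label every text with the model, then pick a random sample for human review.

    Returns the (text, label) pairs and the indices of the pairs a person
    should check as the final quality gate.
    """
    labeled = [(text, llm_label(text)) for text in texts]
    if not labeled:
        return labeled, []
    # Always review at least one item, even for tiny batches.
    n_review = max(1, round(len(labeled) * review_fraction))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    review_indices = sorted(rng.sample(range(len(labeled)), n_review))
    return labeled, review_indices


def fake_llm_label(text: str) -> str:
    # Hypothetical stand-in for a real language model call.
    return "remote" if "remote" in text.lower() else "onsite"


job_posts = [
    "Fully remote data analyst role",
    "Onsite warehouse supervisor",
    "Remote-first marketing manager",
    "Front desk coordinator, downtown office",
    "Hybrid engineer, remote two days a week",
]
labeled, review_indices = label_with_spot_check(
    job_posts, fake_llm_label, review_fraction=0.4
)
for i in review_indices:
    print("needs human check:", labeled[i])
```

Swapping `fake_llm_label` for a function that calls a hosted model leaves the rest unchanged, and the review fraction is the knob that trades human time against confidence in the labels.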
data science still matters
But GenAI language models alone cannot do all the work we need to produce AI/ML capabilities that improve our businesses and people’s lives.[5] GenAI capabilities are useful only in concert with classical data science techniques including supervised learning using a range of algorithms, understanding of features and feature construction, and, crucially, the key scientific foundation of data science: evaluation of model performance both before deployment and during operation. You gotta understand all those things conceptually and in depth to be able to usefully tap into the power of GenAI.
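As one small illustration of what evaluation means in practice, the Python sketch below scores a set of predicted labels against a human-audited sample using precision and recall. The function name `precision_recall` and the toy labels are made up for the example. The same check could plausibly be run on a held-out set before deployment and on a periodically audited sample once the model is in operation.

```python
def precision_recall(
    predicted: list[str], actual: list[str], positive: str
) -> tuple[float, float]:
    """Compute precision and recall for one class, treating `actual` as ground truth."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    true_pos = sum(p == positive and a == positive for p, a in pairs)
    false_pos = sum(p == positive and a != positive for p, a in pairs)
    false_neg = sum(p != positive and a == positive for p, a in pairs)
    # Guard against division by zero when the class never appears.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Toy data: model output versus labels confirmed by a human reviewer.
model_labels = ["remote", "remote", "onsite", "onsite", "remote"]
human_labels = ["remote", "onsite", "onsite", "remote", "remote"]

precision, recall = precision_recall(model_labels, human_labels, positive="remote")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Which of the two numbers matters more depends on the business cost of a false positive versus a missed case, which is exactly the kind of judgment a language model cannot make for you.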
As well, machine learning engineering and MLOps have progressed to the point where they will, in 2024, support the relatively simple productionization and monitoring of models that improve businesses. We have figured out how to put together feature stores, model registries, vector databases, model deployment services, and the other pieces we need.
So, I do not agree that we’re going to see disillusionment with GenAI in 2024, at least not within data science teams. I see a data science spring on the horizon, and can’t wait to see the flowers grow.
Anne Zelenka is co-founder of Incantata AI.[6] She also serves as Head of AI and Analytics at The Mom Project, a talent marketplace aimed at creating economic opportunity for moms and everyone else who wants a better work-life balance. These opinions are her own and do not represent those of TMP.
[1] Shoutout to my awesome and amazing data science PM at TMP. You are the best.
[2] Yes, you still need to know how to code! And you must understand neural net architectures, machine learning algorithms, data science, etc. But you don’t have to have the syntax memory of a 25-year-old, and if your Docker setup won’t work, ChatGPT can probably figure out what’s wrong for you.
[3] I work mainly with text data, but this is true in the image, video, and audio processing spaces too.
[4] Supervised learning is where it’s at, and for that we need good labels.
[5] GenAI capabilities are useful on their own for certain tasks, e.g., if you want help writing some generic web content, rewriting your resume, or quickly creating a logo for your dodgeball team.
[6] Hello to all the former supporters and followers of Incantata AI. I hope you enjoy the new incarnation of this newsletter, and I am wishing you a very happy 2024.