Evolution of Data Engineering & Software Development | Blog

The rise of data engineering in the domain of software engineering

The impact of the latest wave of artificial intelligence innovations is undeniable. Hype aside, surveys such as McKinsey’s ‘Open Source technology in the age of AI’ confirm pervasive usage of AI technology with more than half of respondents making use of models, tools and incorporating prompt handling into their applications.

Usage is highlighted as with training costs in the billions of dollars, very few businesses will be engaged directly with the development of AI capabilities. Instead nearly all will be applying these innovations to their own products.

To understand what is happening in practice, we can regard the AI components as the processing elements in the systems analysis flow depicted below. Interesting though AI components are, they are not the subject of this article. Instead we consider the input. In the context of software and service development the input is our data. In this brave new world of AI-enabled features and services, for most businesses the critical item to consider is the fuel for the engine, the ingredients for the kitchen, the data itself.

Is data the new code?

As with nearly all Snowclones and respectful of Betteridge’s law of headlines the immediate answer is of course, no, it’s not. Better however to move beyond a literal response and consider what is really meant by the question. There are a number of secondary queries that we are implicitly encouraged to consider.

Has the role of data assumed greater importance in the sphere of software and service development?
Is the work of managing data now a much more significant part of the development process?
Is it true to say that the behaviour of software is now governed by the data resources to a much greater extent than it was in the pre ChatGPT era?

The answer here is very much yes. We are visibly surrounded by AI enabled software and what differentiates one AI based application from another is the data that it is trained with and utilises.

Examples of data-driven products

The following give a flavour of the broad reach of data-enabled applications in business today.

Music streaming: 35% of listening on Spotify followed algorithmically generated playlists according to 2023 research from yourmusic.marketing.
Sales / CRM: A decade ago, Salesforce set itself the goal of being an ‘AI-first Company’. Today the platform offers a wealth of AI-powered functionality including analytics and generative features whose creation has necessitated training and testing using large scale data resources.
Clinical notetaking software Nabla was trained using, with consent, patient data derived from 30,000 visits to specially commissioned virtual clinics.
Programmatic advertising / marketing: This has long been data driven. For a comprehensive overview, see the AdTech Book from ClearCode.
Retail: Most evident in Amazon’s use of recommendations and automated review and product summaries, data-driven features are ubiquitous in online retail.
‍Insurance: For a glimpse of the future, consider how insurers like Allianz believe AI might transform risk modelling and claim processing.

Why data engineering is critical to the future of software

In each of the above use cases we can see how data is an equal partner with code and therefore there is a competitive advantage available to those who best manage and harness the data resources they have. Unsurprisingly we have therefore in recent years seen the rise of the data engineering professional. A useful summary of the skills needed in this domain is provided by Coursera.

Acquire datasets that align with business needs
Develop algorithms to transform data into useful, actionable information
Build, test, and maintain database pipeline architectures
Collaborate with management to understand company objectives
Create new data validation methods and data analysis tools
Ensure compliance with data governance and security policies

It is interesting to note that a data engineer works with a high level of autonomy, which is what in part makes this such an attractive role. In nearly all the activities above there is considerable freedom available to the practitioner. The exception however is the final activity. Usage of data is very much conditional upon the ability to satisfy legislative and policy requirements and navigating usage challenges successfully can be the difference between an unrealised initiative and one that sees the light of day. The level of concern at the highest levels is shown by this recent Forbes article listing 20 contemporary challenges for CEOs—ethical, safe and secure usage of data surfaces in ⅓ of the areas listed.

Tonic.ai: a suite of tools for unlocking corporate data

By anonymising while still preserving the character, structure and distribution of data, the Tonic.ai product suite allows data to be transformed in such a way that it can be used to safely power initiatives that will provide competitive differentiation. The ability to successfully manage the data resources available will in part determine which businesses thrive in the contemporary economy.

Whether you need structured data de-identification for testing and development, unstructured data synthesis for AI model training, or synthetic data from scratch for new product innovation, we’re here to help. Connect with our team today to equip your developers with the data they need.

Data is the new code: the evolution of software development

Is data the new code?

Examples of data-driven products

Why data engineering is critical to the future of software

Tonic.ai: a suite of tools for unlocking corporate data

Want to make your data usable?

Related blog posts

The challenges of preparing unstructured data for Generative AI

Optimizing modern software testing strategies with better test data

Achieving test data management totality: the difference between total coverage and "close enough"

Make your sensitive data usable for testing and development.