DataOps vs DevOps: A deep dive into their practices

Baruch Sadogursky, Head of DevOps Advocacy at JFrog, recently joined our CTO, Denis Borovikov, and our Engineering Lead, Seva Brekelov, to discuss the impact of DevOps on software development, how DataOps can transform the way companies work with data, and how closely DevOps and DataOps are related. This is the first post in a series summarizing their conversation.

DevOps is one of the biggest transformations the IT industry has ever seen. Even if it wasn’t the fastest, it was very successful and has taken the industry by storm. DevOps is a set of practices and a culture of collaboration between the Dev part (the developers), the Ops part (the infrastructure and production) and the "Build" parts of the engineering organization. It is an effort to overcome the siloed, classical organization in which software is thrown over the fence from Dev to Ops. It has evolved to include every part of software development, from testing and security to product and management. This is DevOps in a nutshell. DataOps aims to bridge similar gaps and streamline operations, though its methods and tools differ significantly.

DevOps practice

The DevOps practice started over seven years ago. There’s now more attention on DevOps and DevSecOps in the data science and data engineering world than ever before. The challenges in both worlds are similar, and they need to be solved in a similar manner.

How do you try to adapt those DevOps practices to what you are doing?

DevOps was about eliminating silos. There are similar problems in working with data. In the data world, the typical silo is the one between data analysts and data engineers.

On one hand, there are business-oriented people with dashboards who are not technically savvy. On the other hand, there are extremely technical people building very complex pipelines. The silo sits between the two groups.

For example, if a dashboard needs building, the business-oriented teams have to go to the data engineers. The data engineers might offer support, but schedule it for next quarter, after they finalize a new data warehouse. The business teams then still have to get the data from the warehouse, so finishing the project is a time-consuming process. Just as DevOps tackled the handoff between Dev and Ops, this handoff between data roles is where the opportunity for DataOps lies.

Collaboration between Data Engineers and Data Users — Adapted from Tamr

The data consumers are the users, such as product managers, data analysts, business analysts, data scientists, and ML engineers. The data has to be accessible to them in a format they know how to consume. At the same time, their input is needed as well: what they know at the time, what they actually need, and whether they can get the data that they really need.

Additionally, there is a separation of knowledge. On one side there is the business knowledge of the data consumers, who know the semantics of the data. On the other side there are the data engineers, who are experts in the tools but typically do not understand the semantics of the data. This creates a large contradiction: engineers are content to help build pipelines, but they don't necessarily know the meaning of the data. The users know the meaning, but it's too complex for them to start coding SQL transformations to create all the staged and modeled views of the data.
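
To make "staged and modeled views" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, view, and column names are hypothetical and chosen only for illustration; they do not come from the conversation. The staged view holds the tool-level cleanup an engineer can do alone, while the modeled view encodes a business rule (only paid orders count as revenue) that has to come from someone who knows what the data means.

```python
import sqlite3

# Hypothetical raw table; all names and values here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount_cents TEXT, status TEXT);
    INSERT INTO raw_orders VALUES
        (1, ' alice ', '1250', 'PAID'),
        (2, 'bob',     '800',  'paid'),
        (3, 'alice',   '300',  'refunded');

    -- Staged view: technical cleanup (trimming, casting, normalizing case).
    CREATE VIEW stg_orders AS
    SELECT id,
           TRIM(customer)                AS customer,
           CAST(amount_cents AS INTEGER) AS amount_cents,
           LOWER(status)                 AS status
    FROM raw_orders;

    -- Modeled view: the business rule that only paid orders count as revenue
    -- is semantic knowledge supplied by the data consumers.
    CREATE VIEW revenue_by_customer AS
    SELECT customer, SUM(amount_cents) / 100.0 AS revenue
    FROM stg_orders
    WHERE status = 'paid'
    GROUP BY customer;
""")

rows = conn.execute(
    "SELECT customer, revenue FROM revenue_by_customer ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 12.5), ('bob', 8.0)]
```

Neither group could write the second view alone: the engineer would not know which statuses count, and the analyst would have to learn the SQL.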

DevOps has become an intersection of software development, quality assurance, and technology operations.

DataOps is the intersection of data engineering, data quality, data security, and elements of data integration.

A very common question: what is the distinction between DataOps and DevOps for data? Why not simply apply everything from DevOps? Why do we even need a niche or new approach?

There is a distinction between DataOps and DevOps for data because the data pipeline is much longer than the development pipeline. In the development world, pipelines can be very short. In some cases a pipeline can be as short as deploying the master branch to production, but in the data world that's never the case.

A data system can be viewed as a set of complex pipelines. Sometimes people use the metaphor of a data fabric: a lot of data sources, data sinks, and data pipelines shifting data around. There are lots of roles, and a lot of different owners of the data.
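
As a rough illustration of the data fabric metaphor, the sketch below models pipelines as edges between datasets and asks which owners are involved in producing a single dashboard. Every dataset name and owner here is a hypothetical assumption, not something taken from the conversation.

```python
# Hypothetical fabric: each pipeline moves data from a source to a sink.
pipelines = [
    ("crm_db", "staging.customers"),
    ("billing_db", "staging.invoices"),
    ("staging.customers", "warehouse.revenue"),
    ("staging.invoices", "warehouse.revenue"),
    ("warehouse.revenue", "dashboard.sales"),
]

# Hypothetical owners of each dataset.
owners = {
    "crm_db": "sales-ops",
    "billing_db": "finance",
    "staging.customers": "data-eng",
    "staging.invoices": "data-eng",
    "warehouse.revenue": "data-eng",
    "dashboard.sales": "analytics",
}


def upstream(dataset):
    """Return every dataset that feeds `dataset`, directly or indirectly."""
    found = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for source, sink in pipelines:
            if sink == current and source not in found:
                found.add(source)
                stack.append(source)
    return found


def owners_involved(dataset):
    """Return the sorted owners of `dataset` and of everything upstream of it."""
    return sorted({owners[name] for name in upstream(dataset) | {dataset}})


print(owners_involved("dashboard.sales"))
# ['analytics', 'data-eng', 'finance', 'sales-ops']
```

Even in this toy fabric, one dashboard depends on four different owners, which is the coordination problem the paragraph above describes.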

If an organization is large enough, departments such as compliance will also be heavily involved, with their own processes and policies around the data. This is why people in the DataOps community relate it to lean manufacturing.

The methodological differences between the two practices matter here. The DataOps community places emphasis on techniques from lean manufacturing, such as statistical process control. The work is seen as a pipeline, a real factory. People in DevOps see their work differently and lean more towards extreme programming, with test automation and similar techniques. In methodology, DataOps is closer to lean manufacturing than DevOps is.
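
As a hedged illustration of what statistical process control can look like on a data pipeline, the sketch below computes three-sigma control limits from past daily row counts and flags a new load that falls outside them. The metric (row count), the sample numbers, and the function names are all assumptions made for this example.

```python
from statistics import mean, stdev


def control_limits(history, sigmas=3.0):
    """Return (lower, upper) control limits computed from past observations."""
    center = mean(history)
    spread = stdev(history)
    return center - sigmas * spread, center + sigmas * spread


def is_in_control(value, history, sigmas=3.0):
    """True if `value` falls inside the control limits derived from `history`."""
    lower, upper = control_limits(history, sigmas)
    return lower <= value <= upper


# Hypothetical row counts from the last seven daily loads.
daily_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]

print(is_in_control(1003, daily_row_counts))  # True: an ordinary day
print(is_in_control(400, daily_row_counts))   # False: the load looks broken
```

The factory analogy is direct: the pipeline is the production line, the row count is a measurement taken off the line, and a value outside the limits stops the line for inspection.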

From the classical software engineering perspective, though, lean manufacturing is a close metaphor for software engineering as well.

Additionally, DevOps pipelines have evolved so that people who are more distant from the code can see them. There are many more scenarios and roles involved in the DevOps pipeline as well, covering a variety of different aspects. DevOps is not simpler, but the differences are significant enough to justify the distinction. The differences are in what is integrated, not necessarily in how complex the integration is. It's not a matter of there being fewer circles on the DevOps diagram, because many more circles can be drawn in the service diagram.

The circles are different, and this is the most important difference between DevOps and DataOps. The list for DevOps is lengthy too. What really matters is that there is not a lot of overlap in the skills and the teams that participate. This is the biggest difference, not necessarily the complexity.

The roles and the nature of these roles are also different. This factors into the notion that the DataOps community is agile.

In the software world, people have become bored with agile. A common view is that there is already too much agile everywhere, and that it is entering its final stages. In the data world, agile is something new and interesting, because data projects tend to be more structured.

They tend to be organized as waterfall projects simply because of the nature of the work. Data analysts produce scientific insights for the business, which means that there is a need to follow a scientific process. It becomes hard to skip stages and try things quickly to see what happens. That approach tends to falter in parts of data science, such as running experiments, which require systematic work and therefore make the process more staged. Additionally, the idea of a data sprint is a new concept, and people are still figuring out how to apply sprints to the data world.

Is it possible to break this waterfall paradigm in the data world because of the tooling available now?

Ten years ago, there was no standard approach to continuously deploying everything in a single click. Now there is: setting up GitHub Actions lets developers do all these things well and fast, and every single developer can do it.

Data is not there yet. It is still too complex to set up databases with particular properties and data in particular states. It's not as easy as with software development.

The staged waterfall model is unavoidable, but you can run your process as a sequence of small, fast waterfalls. Even if each stage is made faster, the sprints are going to be longer than in the development world, but this still promotes being agile in some form.

Ultimately, it is an iterative process everywhere. It can be called a waterfall, a small waterfall, or an agile iteration. Even the concept of a pipeline means that things are done in sequence.

So it is not about abandoning sequence and order, but rather about doing things collaboratively and consulting each other more frequently than before. It's very important to create empowered teams composed of people from different areas of expertise to unlock the full potential of this new way of working.

When it comes to team structures or team topologies, the most efficient way to actually implement DevOps is to empower a team of cross-functional specialists who come together and work together in some sequence. They will have their own processes, but the communication is reduced to the level of a single team, instead of trying to produce an effective cross-team or cross-organization communication channel, which is much harder to achieve. The same thing happens in the data engineering world.

The expression "self-service" is also common. The idea is that the more you can do within a team, the better. Delays happen because people need to communicate with other teams, so the more that can be done within a team, the faster the process. And there are many people involved in DataOps processes.

That is one of the biggest challenges when people are trying to build a data platform.

DataOps vs DevOps FAQs

What are the primary differences in team roles when comparing DataOps vs DevOps?

DevOps teams typically bring together professionals who specialize in software development, operations, and quality assurance. DataOps teams, on the other hand, typically include data engineers, data quality specialists, and data security professionals. These distinct roles reflect the different focus of each practice: continuous integration and deployment in DevOps versus data management and analytics in DataOps.

How does the integration of tools differ in DataOps vs DevOps?

The tools each practice integrates can differ significantly. DevOps integrates tools that facilitate continuous integration, continuous deployment, and automated testing. In contrast, DataOps integrates tools designed for data orchestration, real-time data processing, and analytics platforms. This reflects each field’s unique requirements for managing workflows and data streams.

Can principles of DevOps be applied effectively in DataOps?

Principles of DevOps such as continuous integration and automated testing can indeed be adapted for DataOps. However, they must be tailored to fit the more complex and variable nature of data pipelines. This might include adopting continuous data validation and monitoring techniques to maintain data quality throughout the lifecycle.
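
As one hedged sketch of what continuous data validation could look like, the function below checks every incoming batch the way a test suite checks every commit. The field names (id, amount) and the specific rules are hypothetical assumptions for illustration, not a recommendation from the conversation.

```python
def validate_batch(rows):
    """Return a list of human-readable problems found in a batch of records."""
    problems = []
    seen_ids = set()
    for index, row in enumerate(rows):
        # Rule 1 (assumed): every record needs a unique, non-null id.
        if row.get("id") is None:
            problems.append(f"row {index}: missing id")
        elif row["id"] in seen_ids:
            problems.append(f"row {index}: duplicate id {row['id']}")
        else:
            seen_ids.add(row["id"])
        # Rule 2 (assumed): amount must be a non-negative number.
        amount = row.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            problems.append(f"row {index}: invalid amount {amount!r}")
    return problems


good_batch = [{"id": 1, "amount": 10}, {"id": 2, "amount": 0.5}]
bad_batch = [{"id": 1, "amount": 10}, {"id": 1, "amount": -5}, {"amount": "n/a"}]

print(validate_batch(good_batch))  # []
for problem in validate_batch(bad_batch):
    print(problem)
```

Run on every batch before it is loaded, a check like this plays the role that automated tests play in a software build: a non-empty list of problems fails the run.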

What impact has the DataOps vs DevOps discussion had on traditional IT departments?

The discussion around DataOps vs DevOps has encouraged traditional IT departments to break down silos and foster closer collaboration between developers, operations professionals, and data specialists. This shift promotes a more integrated approach to managing both software development and data operations, potentially leading to more agile and responsive IT practices.

What future developments can be expected in DataOps and DevOps?

Future developments in DataOps and DevOps might include more sophisticated integration of AI and machine learning technologies to further automate processes within both domains. For DevOps, this could mean smarter automated testing and deployment strategies. For DataOps, advanced analytics and machine learning could enable more predictive data quality management and governance practices. As both fields evolve, the integration of these technologies could redefine how organizations manage software development and data operations.