What is DataOps
Similar to how DevOps accelerated software delivery, the goal of DataOps is to accelerate the flow of data using agile, automated approaches. This emerging discipline replaces traditional waterfall approaches to delivering data solutions to the end user, resulting in predictable, automated, and reliable analytics output.
As described in Gartner’s 2018 Hype Cycle for Data Management, DataOps is a collaborative data management practice focused on improving the communication, integration, and automation of data flows between data managers and data consumers across an organization. The goal of DataOps is to create predictable delivery and to facilitate change management of data, data models, and related artifacts. DataOps automates data delivery through technology, using security, quality, and metadata to improve the use and value of data in a dynamic environment.
With the rise of self-service analytical tools, users have come to expect the ability to generate analytical insights whenever they need them. They don’t want to submit a request for information, wait a day (or more) for an analyst to crunch the numbers, and receive a static report. They want real-time (or near real-time) data displayed in a meaningful way while facilitating a streamlined methodology to take actions based on the insights they receive. Tools and methodologies to support DataOps are still being developed and we look forward to new innovations as they arise.
Leading the way in providing tools to support DataOps is Salesforce’s Einstein Analytics. With Einstein Analytics, users can collaborate on the continuous improvement and automation of data processing so that they have the right information when they need it. A few examples of how Einstein Analytics supports DataOps are described below.
Continuous Delivery with Chatter Collaboration
In DataOps, communication between teams is important in bridging the gap between business users, data analysts, data scientists, and data engineers. All members of the team must be able to communicate quickly to identify issues and needs. One of the core features of Salesforce is the ability to collaborate with other users. The same is true for Einstein Analytics.
Through integrated Chatter functionality, team members can annotate dashboard widgets with comments in order to communicate with others about issues or improvements without leaving Einstein Analytics. Is a dashboard visualization no longer providing relevant business insights? Are there errors in the data displayed? Users can quickly collaborate on these issues and more to ensure that the analytics solution provides up-to-date, accurate, and meaningful data.
Individual users can also build their own data explorations using Einstein lenses to ask questions of datasets within Einstein Analytics. If useful information is found, this lens and accompanying insights can be shared with the team via Chatter and even clipped to a new or existing dashboard to provide continuous analytics improvements.
Automate Data Integrations
Automating the data pipeline is one of the key practices of DataOps. Up-to-date data translates to up-to-date analytical insights. With Einstein Analytics, both internal Salesforce data and external data can be easily integrated. If you have data stored in outside applications, Einstein Analytics allows you to automate data loads so that the data used in the final solution is always up to date. When you load data into Analytics, a dataset is created. Datasets are collections of data stored in a denormalized but highly compressed form. Once you’ve created datasets using an automated integration, your users are able to use that data in creating analytics solutions.
Einstein Analytics allows the following tools to be used in creating datasets (source: Einstein Analytics Data Integration Guide):

- Dataflows: scheduled refresh supported
- Data connections: scheduled refresh supported
- Dataset recipes: scheduled refresh supported
- Analytics Connector for Excel: no scheduled refresh
- External Data API: no scheduled refresh

As the list shows, it’s possible to load data using dataflows, data connections, and data recipes in an automated fashion. Currently, the Analytics Connector for Excel and the External Data API do not provide the ability to schedule refreshes.
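To make the External Data API path more concrete, here is a minimal sketch of how an upload might be prepared. The External Data API works by inserting a header record (the InsightsExternalData object) plus one or more base64-encoded data parts; the code below only builds those payloads locally and does not perform the actual Salesforce calls. The dataset alias, field names, and the `build_upload_payloads` helper are illustrative assumptions, not part of any official SDK.

```python
import base64
import json

def build_upload_payloads(csv_text, dataset_alias, metadata):
    """Build the header record and data parts for an External Data API load.

    This is an illustrative helper: the dict keys mirror fields on the
    InsightsExternalData and InsightsExternalDataPart objects, but the
    actual insert into Salesforce is out of scope here.
    """
    header = {
        "Format": "Csv",
        "EdgemartAlias": dataset_alias,
        "MetadataJson": base64.b64encode(
            json.dumps(metadata).encode("utf-8")
        ).decode("ascii"),
        "Operation": "Overwrite",
        "Action": "None",  # switched to "Process" once all parts are uploaded
    }
    # Split the CSV into base64-encoded parts (the API caps part size).
    raw = csv_text.encode("utf-8")
    chunk_size = 10 * 1024 * 1024  # assumed 10 MB per part
    parts = [
        {
            "PartNumber": i + 1,
            "DataFile": base64.b64encode(raw[start:start + chunk_size]).decode("ascii"),
        }
        for i, start in enumerate(range(0, len(raw), chunk_size))
    ]
    return header, parts

# Example usage with a tiny CSV and a hypothetical "Sales" dataset.
csv_data = "Name,Amount\nAcme,1000\nGlobex,2500\n"
metadata = {
    "fileFormat": {"charsetName": "UTF-8", "fieldsDelimitedBy": ","},
    "objects": [{
        "connector": "CSV",
        "fullyQualifiedName": "Sales",
        "name": "Sales",
        "label": "Sales",
        "fields": [
            {"fullyQualifiedName": "Name", "name": "Name",
             "label": "Name", "type": "Text"},
            {"fullyQualifiedName": "Amount", "name": "Amount",
             "label": "Amount", "type": "Numeric",
             "precision": 18, "scale": 2, "defaultValue": "0"},
        ],
    }],
}
header, parts = build_upload_payloads(csv_data, "SalesDataset", metadata)
print(header["EdgemartAlias"], len(parts))
```

Because the upload itself is just record inserts, any scheduler that can call the Salesforce API could wrap this logic, though as noted above the External Data API has no built-in refresh scheduling.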
Automate Data Transformations with Dataflows and Recipes
Once data is loaded, another important DataOps practice is to ensure that the data transformation process is also automated and easily updatable as data changes. After all, how useful is imported data if it’s not properly prepared for an analytics solution? Einstein Analytics provides two ways to prepare data – dataflows and dataset recipes.
Dataflows allow users to manage the complete end-to-end process: loading source data, preparing it, and creating one or more datasets. Users can work in a drag-and-drop dataflow editor or edit the underlying JSON definition directly when more advanced preparation is required. Data recipes, on the other hand, are limited to data preparation only. With recipes, users prepare data from existing datasets.
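To illustrate what that underlying JSON looks like, here is a minimal sketch of a dataflow definition with two nodes: one that extracts fields from a Salesforce object and one that registers the result as a dataset. The node names, object, fields, and dataset alias are illustrative assumptions, not taken from any particular org.

```json
{
  "Extract_Opportunities": {
    "action": "sfdcDigest",
    "parameters": {
      "object": "Opportunity",
      "fields": [
        {"name": "Id"},
        {"name": "Amount"},
        {"name": "StageName"},
        {"name": "CloseDate"}
      ]
    }
  },
  "Register_Dataset": {
    "action": "sfdcRegister",
    "parameters": {
      "alias": "OppDataset",
      "name": "Opportunities",
      "source": "Extract_Opportunities"
    }
  }
}
```

Each top-level key names a node, and a node's "source" parameter points at the node whose output it consumes, which is how the editor chains extraction, transformation, and registration steps together.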
Once created, both dataflows and recipes can be scheduled to run so that the datasets are updated automatically. An additional feature related to data quality is the ability to receive notifications when a dataflow runs. In this way, users can be quickly alerted to potential errors in the dataflow.
As explained above, continuous delivery through Chatter collaboration, automating data integrations, and automating data transformations are just a few of the ways that Einstein Analytics supports the developing DataOps movement. Einstein Analytics is a valuable tool in helping to remove the bottlenecks that commonly plague analytics projects, allowing organizations to produce valuable insights at the exact moment they’re needed.
About the Author: Kathryn Baker Parks