Exponential Data Growth and Resulting Challenges: Determining the Business Value, Democratization and Managing Data Movement
The amount of data that organizations generate has been increasing exponentially over the last few decades. According to a Forbes article based on an IDC white paper published last year, it is expected to reach 163 zettabytes in 2025. Many companies are struggling to manage this deluge of data coming from systems, devices, customers, social networks and business partners. How efficiently companies store, process and use this data is becoming more and more critical to the success of their business.
According to the online research firm Statista, the database software market was projected to be $50 billion in 2017. This is only a small part of the cost of the numerous platforms and technologies deployed across the entire data platform ecosystem. In spite of all the investment in tools, technologies and people, many companies are finding it harder and harder to manage the ever-increasing size and complexity of data. The massive value and demand placed on data, along with expectations of impressive data-driven outcomes, leave organizations struggling to rationalize the available tools and technologies and to find the time needed to adopt them successfully. The task of storing data in the right form, on the right platform, for the right type of processing, while controlling access, is especially difficult. It has become more important in light of heightened sensitivity around data protection and privacy and new regulations such as GDPR. I will briefly address three of these challenges in this article.
Determining the value of enterprise data
To manage enterprise data effectively, it is important to determine its value to the business now and in the future. Unfortunately, many companies have not yet started thinking seriously about the value of their enterprise data. While such valuations can be hard to determine, they are necessary to manage data objectively and to derive the most value from it. Some of the factors in determining the value of data assets are: the company’s cost to acquire the data, the risk to the business of compromising or losing the data, the relative market value of the data, how much money the data is generating currently, and how much money it can potentially generate in the future. Many other factors can be included, but it is always good to start with a simple model that also addresses how this value changes over time.
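As an illustration only, the factors listed above could be combined into a very simple scoring model. The sketch below is hypothetical: the `DataAsset` fields, the unweighted sum and the fixed annual decay rate are assumptions made for the example, not an established valuation method, and the figures are invented.

```python
from dataclasses import dataclass


@dataclass
class DataAsset:
    """Hypothetical record of the valuation factors for one data asset."""
    name: str
    acquisition_cost: float    # company's cost to acquire the data
    loss_risk_cost: float      # estimated cost to the business if the data is compromised or lost
    market_value: float        # relative market value of the data
    current_revenue: float     # money the data is generating currently
    potential_revenue: float   # money the data could generate in the future
    annual_decay: float        # assumed fraction of value lost per year (0.0 to 1.0)


def estimated_value(asset: DataAsset, years_from_now: float = 0.0) -> float:
    """Sum the factors (unweighted, by assumption) and apply a yearly decay."""
    base = (
        asset.acquisition_cost
        + asset.loss_risk_cost
        + asset.market_value
        + asset.current_revenue
        + asset.potential_revenue
    )
    return base * (1.0 - asset.annual_decay) ** years_from_now


# Invented figures, used only to show how assets could be ranked.
assets = [
    DataAsset("customer_master", 100.0, 200.0, 300.0, 400.0, 0.0, 0.5),
    DataAsset("web_clickstream", 50.0, 10.0, 20.0, 30.0, 40.0, 0.8),
]
ranked = sorted(assets, key=lambda a: estimated_value(a, years_from_now=2), reverse=True)
for asset in ranked:
    print(asset.name, round(estimated_value(asset, years_from_now=2), 2))
```

A real model would weight the factors differently and might use a different decay curve for each type of data; the point is only that even a simple, explicit formula makes the valuation comparable across assets and over time.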
Not all data is equal, and it needs to be managed accordingly. This valuation of data assets should be used as the basis to create, fund and execute the data strategy and should serve as the basis for data risk management. As organizations grapple with more and more data, its business value should determine where most of the data management effort, funding and attention are focused. The business should also help frame the data lifecycle, providing guidance on what data is critical and how long it should be available to serve business needs.
Balancing Democratization with Control
In most organizations, data is typically spread across many platforms and applications, in on-premise and external cloud environments, and on the laptops of company employees. This spread increases data consistency and security issues and makes data management very difficult. Some of the most frequent reasons organizations lose control of enterprise data are:
- Unconstrained duplication of data (sometimes driven by self-service data initiatives) and a lack of proper governance, resulting in a proliferation of inconsistent data sets
- Lack of an enforced, value-based archival strategy, resulting in data hoarding in data lakes and other data stores
- In the absence of a proper master data management strategy, employees use their own copies of master data, usually with the wrong number of records and outdated attribute values
Managing Data Movement
This decade has been a time of transition from on-premise to cloud-based infrastructure (IaaS) and platform (PaaS) offerings. This transition has also resulted in chaos, with most companies moving part of their environments to the cloud and running the rest on-premise. While most newer applications and operational data stores were developed on cloud-based platforms, many existing systems have either stayed behind or been moved only partially. To support this hybrid on-premise and cloud model, many new data pipelines were needed across clouds and across applications. As a result, the number of ETL/ELT jobs has increased significantly, and so has the chattiness among systems and services. Cloud-based data lakes need data from on-premise systems, and on-premise data warehouses need data from apps in the cloud. IT has never been as busy coding (and recoding), deploying, monitoring and fixing data movement. A data strategy for cloud migration and edge should minimize the split of related data across on-premise systems, clouds and devices. This strategy should include edge data management and analytics to avoid acquiring and moving data that adds no value.
While the problem is complex and unique to each company, with the type of business, its size, its complexity and its years in business being some of the many factors, these challenges can be handled by planning and building the data strategy at the enterprise level instead of addressing it at the level of each database, application, program or project. For the strategy to be successful, partnership with the business is a must, and it starts with treating enterprise data the same as other business assets.