The term “data fabric” is used throughout the technology industry, yet its definition and implementation vary from vendor to vendor. I’ve seen this first-hand: last fall, British Telecom (BT) talked about its data fabric at an analyst event; meanwhile, in storage, NetApp has refocused its brand on intelligent infrastructure, having previously used the term. Application platform vendor Appian has a data fabric product, and database provider MongoDB has also talked about data fabrics and similar ideas.
At its core, a data fabric is a unified architecture that abstracts and integrates disparate data sources to create a seamless data layer. The idea is to create a single, synchronized layer between the various data sources and the consumers that need access to them: your applications, your workloads, and increasingly your AI algorithms or learning engines.
There are many reasons for wanting such an overlay. A data fabric acts as a generalized integration layer that connects to different data sources and adds advanced accessibility features for applications, workloads, and models, such as providing access to those sources while keeping them in sync.
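The abstraction described above can be sketched in code. The following is a minimal, illustrative sketch only, with all class and method names (`DataSource`, `DataFabric`, `write_replicated`, and so on) invented for the example rather than taken from any vendor’s product: a fabric presents one access layer over several registered sources and writes through to each replica to keep them in sync.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Uniform interface the fabric expects every backing source to expose."""

    @abstractmethod
    def read(self, key: str): ...

    @abstractmethod
    def write(self, key: str, value) -> None: ...


class InMemorySource(DataSource):
    """Stand-in for a real backend (database, object store, SaaS API)."""

    def __init__(self):
        self._rows = {}

    def read(self, key):
        return self._rows.get(key)

    def write(self, key, value):
        self._rows[key] = value


class DataFabric:
    """Single access layer that routes reads to a named source and fans
    writes out across sources so replicated copies stay consistent."""

    def __init__(self):
        self._sources = {}

    def register(self, name: str, source: DataSource):
        self._sources[name] = source

    def read(self, source_name: str, key: str):
        return self._sources[source_name].read(key)

    def write_replicated(self, source_names, key, value):
        # Write through to every listed source to keep replicas in sync.
        for name in source_names:
            self._sources[name].write(key, value)


fabric = DataFabric()
fabric.register("warehouse", InMemorySource())
fabric.register("edge_cache", InMemorySource())
fabric.write_replicated(["warehouse", "edge_cache"], "customer:42", {"tier": "gold"})
print(fabric.read("edge_cache", "customer:42"))  # {'tier': 'gold'}
```

A real fabric would add connectors, metadata, and conflict handling on top, but the design point is the same: workloads talk to one layer, not to each source directly.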
So far so good. But there is a gap between the principle of the data fabric and its actual implementation: people use the term to mean different things. To return to our four examples:
- BT defines a data fabric as a network-level overlay designed to optimize data transmission over long distances.
- NetApp’s interpretation (including the concept of intelligent data infrastructure) emphasizes storage efficiency and centralized management.
- Appian positions its data fabric product as a data unification tool at the application layer, enabling faster development and customization of tools for users.
- MongoDB (and other providers of structured data solutions) apply data fabric principles in the context of data management infrastructure.
How do we bridge these interpretations? One answer is to accept that the concept can be approached from multiple angles. You can talk about a data fabric conceptually, acknowledging the need to connect data sources, without over-reaching: you don’t need a one-size-fits-all “uber-fabric” that covers everything. Instead, focus on the specific data you need to manage.
Going back a few decades, we can see parallels with the principles of service-oriented architecture, which sought to decouple service delivery from database systems. Back then, we discussed the differences between services, processes, and data. The same is true now: you can request a service, or request data as a service, focusing on what your workload needs. Create, read, update, and delete (CRUD) remains the simplest set of data services!
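To make the data-as-a-service point concrete, here is a minimal sketch of a CRUD data service, with all names (`RecordService` and its backing store) hypothetical: the workload asks for operations on records and never touches the underlying database directly.

```python
class RecordService:
    """Minimal CRUD data service (illustrative sketch): callers request
    data operations rather than direct access to the backing store."""

    def __init__(self):
        self._store = {}   # stand-in for any backing database
        self._next_id = 1

    def create(self, record: dict) -> int:
        """Store a new record and return its generated ID."""
        record_id = self._next_id
        self._store[record_id] = dict(record)
        self._next_id += 1
        return record_id

    def read(self, record_id: int):
        """Return the record, or None if it does not exist."""
        return self._store.get(record_id)

    def update(self, record_id: int, changes: dict) -> bool:
        """Merge changes into an existing record; False if not found."""
        if record_id not in self._store:
            return False
        self._store[record_id].update(changes)
        return True

    def delete(self, record_id: int) -> bool:
        """Remove the record; False if it was not there."""
        return self._store.pop(record_id, None) is not None
```

Swapping the in-memory dictionary for a real database changes the implementation, not the service contract, which is exactly the separation SOA argued for.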
I was also reminded of the origins of network acceleration, which used caching to speed up data transfers by keeping a version of the data locally rather than repeatedly accessing the remote resource. Akamai built its business on transporting unstructured content such as music and movies efficiently over long distances.
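The caching idea behind network acceleration can be sketched in a few lines. This is an illustrative sketch, not any CDN’s actual design; the `CachingFetcher` name and the simple time-to-live policy are assumptions for the example.

```python
import time


class CachingFetcher:
    """Keeps a local copy of remote content so repeat requests skip the
    slow origin fetch, the core idea behind CDN-style acceleration.
    (Illustrative sketch; the TTL eviction policy is an assumption.)"""

    def __init__(self, origin_fetch, ttl_seconds=60.0):
        self._origin_fetch = origin_fetch   # callable doing the expensive remote read
        self._ttl = ttl_seconds
        self._cache = {}                    # url -> (content, fetched_at)

    def get(self, url: str):
        entry = self._cache.get(url)
        if entry is not None:
            content, fetched_at = entry
            if time.monotonic() - fetched_at < self._ttl:
                return content              # cache hit: no trip to the origin
        content = self._origin_fetch(url)   # miss or stale: go to the origin
        self._cache[url] = (content, time.monotonic())
        return content
```

Counting calls to the origin function shows the effect: two requests for the same URL within the TTL trigger only one origin fetch.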
This is not to say that data fabrics are reinventing the wheel. Technologically, we are in a different (cloud) world; in addition, they bring new capabilities, not least around metadata management, provenance tracking, regulatory compliance, and security. These are particularly important for AI workloads, where data governance, quality, and provenance directly affect model performance and trustworthiness.
If you’re thinking about deploying a data fabric, the best starting point is to think about what you want the data for. Not only does this help you decide what kind of data fabric might be most appropriate, it also helps you avoid the trap of trying to manage all the data in the world. Instead, you can prioritize the most valuable subset of data and consider which level of data fabric best suits your needs:
- Network level: For data integration across multi-cloud, on-premises and edge environments.
- Infrastructure level: If your data is centralized with a single storage vendor, focus on the storage tier to serve coherent pools of data.
- Application level: To bring together different datasets for specific applications or platforms.
For example, BT found internal value in using its data fabric to consolidate data from multiple sources. This reduces duplication and helps streamline operations, making data management more efficient. It is clearly a useful tool for consolidation efforts and application rationalization.
Ultimately, a data fabric is not a monolithic, one-size-fits-all solution. It’s a strategic conceptual layer, backed by products and features, that you can use where it makes the most sense to add flexibility and improve data delivery. A fabric deployment is not a set-it-and-forget-it exercise: it requires ongoing effort to scale, deploy, and maintain, covering not just the software itself but also the configuration and integration of data sources.
While a data fabric can conceptually exist in multiple places, it is important not to replicate the delivery effort unnecessarily. So whether you’re pulling data together over the network, within the infrastructure, or at the application level, the principles remain the same: use the fabric where it best suits your needs, and let it evolve with the data it serves.
The post Demystifying data fabrics – bridging the gap between data sources and workloads appeared first on Gigaom.