Classifying Metadata by Producers and Consumers
Metadata management is a vast topic. Entire textbooks are devoted exclusively to metadata, covering what it is, why it is important, and the enterprise strategy and program needed to enable enterprise information management and governance, to mention but a few topics.
The popular description of metadata as "data about data" is misleadingly simple, and therefore confusing. In essence, metadata describes the data itself: what it means, how data points are connected to other data points, and the technical aspects involved in the data lifecycle. It helps an organisation to understand its own data and gives information about how that data is technically managed.
This short blog post focuses specifically on the types of metadata.
It has become popular to classify metadata as descriptive, administrative or structural. This classification is based on how the metadata can be used, but it does not address the nature of the metadata itself or who is responsible for creating it. To address this, a well-accepted classification differentiates between business, technical and operational metadata. This classification focuses on who the producers and consumers of the metadata are.
Business metadata
Business metadata refers to the information that describes the business context of data: the meaning and purpose of the data, its intended use, the business rules applied to capture valid data, and the classifications applied to the data, such as who can see and update the data content.
It is primarily used by business users to understand the data better and make informed decisions. It also informs stakeholders in technology about the business meaning of the data that they are dealing with.
The task of capturing business metadata falls to both business and technical stakeholders. On the technical side, business analysts capture some of this information during the analysis for a project that implements a business request requiring a technical solution, and data modelers capture more of it when they create the logical data model for that solution. On the business side, data managers, business data stewards and data governing bodies collect the business metadata.
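As a rough illustration of what a captured business metadata entry could contain, the sketch below models one glossary term with its meaning, intended use, access classification and a validity rule. The class name, field names, team names and the deliberately simplistic email rule are all assumptions made for this example, not a standard.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BusinessTerm:
    """Hypothetical business metadata entry for one data point."""
    name: str
    definition: str                   # business meaning of the data
    intended_use: str                 # what the data may be used for
    can_read: frozenset[str]          # classification: who can see the data
    can_update: frozenset[str]        # classification: who can change the data
    is_valid: Callable[[str], bool]   # business rule for capturing valid data


email = BusinessTerm(
    name="Customer email",
    definition="Address used to send order confirmations to a customer",
    intended_use="Transactional communication only",
    can_read=frozenset({"sales", "support"}),
    can_update=frozenset({"support"}),
    # Simplistic rule for illustration only: exactly one "@", not at the start.
    is_valid=lambda value: value.count("@") == 1 and not value.startswith("@"),
)

print(email.is_valid("ann@example.com"))  # True
print("sales" in email.can_update)        # False
```

Keeping the rule and the access classification next to the definition means that business users and technology stakeholders read the same entry.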
Technical metadata
Technical metadata refers to the information that describes the technical aspects of data, such as its format, structure, and location. Technical metadata addresses both data at rest and data in transit.
Data at rest refers to the characteristics of data when it is stored somewhere. Storage could be in databases, in file systems or as a dump of files in blob storage, to name a few options. The format of the stored data can be structured, semi-structured or unstructured. Technical metadata for data at rest describes these aspects, along with the name and physical format of each data point and the constraints applied by the storage mechanism. An example of a constraint is whether the data point may be empty or not.
Data in transit is also known as data in motion. As the name implies, it concerns the physical characteristics of data as it flows from one storage point to another. The technical metadata captured includes the source and target of the data, the name of the operation that moves the data, how frequently this operation is performed, and other characteristics that describe the technical aspects of the moving data.
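A minimal sketch of how technical metadata for data at rest might be recorded is shown below. The class names, field names and the table location are hypothetical and chosen only to mirror the aspects listed above: storage location, format, and the name, physical type and nullability of each data point.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    """Technical metadata for one data point at rest."""
    name: str            # physical name of the data point
    physical_type: str   # physical format, e.g. "VARCHAR(255)"
    nullable: bool       # constraint: may the data point be empty?


@dataclass(frozen=True)
class TableMetadata:
    """Technical metadata for one stored dataset."""
    location: str        # where the data is stored
    storage_format: str  # "structured", "semi-structured" or "unstructured"
    columns: tuple[ColumnMetadata, ...] = ()

    def required_columns(self) -> list[str]:
        """Names of the data points that may not be empty."""
        return [c.name for c in self.columns if not c.nullable]


customer = TableMetadata(
    location="sales_db.dbo.customer",  # hypothetical location
    storage_format="structured",
    columns=(
        ColumnMetadata("customer_id", "INT", nullable=False),
        ColumnMetadata("email", "VARCHAR(255)", nullable=True),
    ),
)

print(customer.required_columns())  # ['customer_id']
```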
Data lineage is the result of viewing data at rest and data in transit simultaneously.
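One way to picture this is to treat each data movement as a record with a source, a target, an operation name and a frequency, and then walk those records backwards from a given store. The sketch below assumes hypothetical store and operation names and is not tied to any particular lineage tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    """Technical metadata for one movement of data in transit."""
    source: str      # store the data is read from
    target: str      # store the data is written to
    operation: str   # name of the operation that moves the data
    frequency: str   # how often the operation runs


def upstream(store: str, flows: list[DataFlow]) -> list[str]:
    """Return every store that feeds `store`, directly or indirectly."""
    found: list[str] = []
    pending = [store]
    while pending:
        current = pending.pop()
        for flow in flows:
            if flow.target == current and flow.source not in found:
                found.append(flow.source)
                pending.append(flow.source)
    return found


flows = [
    DataFlow("crm.customer", "staging.customer", "copy_customer", "daily"),
    DataFlow("staging.customer", "dw.dim_customer", "load_dim_customer", "daily"),
]

print(upstream("dw.dim_customer", flows))  # ['staging.customer', 'crm.customer']
```

The stores are the data at rest, the flows are the data in transit, and the list returned by `upstream` is a simple form of lineage.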
Technical metadata is primarily used by IT professionals to enable them to manage, maintain and change software solutions when changes in business or technology are planned and implemented.
This type of metadata is captured by data modelers when the physical data model is created, and by technical resources like systems analysts and developers who are responsible for designing and implementing software solutions to business requirements.
Operational metadata
Operational metadata refers to the information that describes the day-to-day operation and monitoring of running software. This type of metadata gives information about the status of data refresh activities, the performance of data loads, the errors encountered when data was moved or saved, and so on.
This type of metadata is primarily used by system administrators and operators in the IT operations team to monitor and optimise the data processing and storage systems. Developers and data engineers also use it as a starting point when a problem has been picked up by the operations team and escalated to them for resolution.
The task of capturing operational metadata falls on the technology that was used to implement the technical solution. For example, Azure Data Factory (ADF) may be responsible for transforming data from a source to a target system. It would then be the responsibility of ADF and/or Azure Purview to capture and expose the operational metadata. This type of metadata is thus generated by the scheduled processes themselves, which record when they started execution, how they are progressing, any errors they encountered and, lastly, when the individual tasks completed.
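The idea of a process generating its own operational metadata can be sketched in plain Python. This is not how ADF or Purview work internally. It is a generic illustration in which the `tracked_run` helper, the `run_log` list and the task names are all assumptions made for the example.

```python
from contextlib import contextmanager
from datetime import datetime, timezone

# Stand-in for wherever operational metadata would really be stored.
run_log: list[dict] = []


@contextmanager
def tracked_run(task_name: str):
    """Record start time, end time, status and any error for one task run."""
    record = {
        "task": task_name,
        "started_at": datetime.now(timezone.utc),
        "status": "running",
        "error": None,
    }
    run_log.append(record)
    try:
        yield record
        record["status"] = "succeeded"
    except Exception as exc:
        record["status"] = "failed"
        record["error"] = repr(exc)
        raise
    finally:
        record["finished_at"] = datetime.now(timezone.utc)


# A run that completes and reports its own progress.
with tracked_run("load_dim_customer") as run:
    run["rows_loaded"] = 3

# A run that fails; the error is captured before it propagates.
try:
    with tracked_run("load_fact_sales"):
        raise ValueError("source file missing")
except ValueError:
    pass

for entry in run_log:
    print(entry["task"], entry["status"], entry["error"])
```

An operator could monitor `run_log` for failed entries, and a developer could use the recorded error as the starting point for an investigation.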
Conclusion
This simple method of classifying metadata as business, technical or operational clearly differentiates between the parties who are responsible for creating the metadata. Knowing this enables governance to pinpoint exactly what information should be collected, when and by whom, which in turn allows gates to be introduced into the software development lifecycle. These gates provide a practical way to manage the metadata lifecycle effectively.
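Such a gate could be as simple as a check that the metadata expected at a given stage has actually been captured. The sketch below is only an illustration: the gate names and the required fields per gate are hypothetical, and a real governance process would define its own.

```python
# Hypothetical mapping of lifecycle gates to the metadata each one requires.
REQUIRED_AT_GATE = {
    "analysis": ["business_definition", "data_steward"],  # business metadata
    "design": ["physical_type", "nullable"],              # technical metadata
    "release": ["run_logging_enabled"],                   # operational metadata
}


def missing_metadata(gate: str, captured: dict) -> list[str]:
    """Return the fields a gate requires that are absent or empty."""
    return [
        name
        for name in REQUIRED_AT_GATE[gate]
        if captured.get(name) in (None, "")
    ]


captured = {
    "business_definition": "Unique identifier of a customer",
    "data_steward": "",        # not yet assigned
    "physical_type": "INT",
    "nullable": False,
}

print(missing_metadata("analysis", captured))  # ['data_steward']
print(missing_metadata("design", captured))    # []
print(missing_metadata("release", captured))   # ['run_logging_enabled']
```

A project would pass a gate only when the returned list is empty, which makes clear what is still outstanding and which party needs to supply it.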