Data Product Ontology (DPROD)

The concept of Data Products has emerged as organizations increasingly recognize the value of data as an asset to be managed and distributed like any other product. As more companies adopt decentralized data architectures, such as Data Mesh, the need for standardized methods to describe and manage data products consistently across platforms has become critical. This is where the [[dprod]] specification, built on W3C Linked Data standards, becomes essential. Without such a standard, organizations face significant challenges: inconsistent metadata across diverse data products, limited discoverability, and interoperability issues that hinder data integration from various sources. As data ecosystems grow, the lack of a common framework also impedes scalability, increases vendor lock-in, and makes it difficult to manage these products effectively.

DPROD offers a solution by providing a clear schema for describing data products, ensuring they are discoverable, interoperable, and treated with the same level of accountability as traditional products.

The W3C Linked Data stack is well suited to these challenges: it was designed for interconnected, decentralized systems and provides a framework for creating metadata that is both machine-readable and human-understandable. DPROD enables consistent terminology across different platforms, domains, and organizations, allowing advanced users to semantically enrich their data products and connect them into distributed knowledge graphs.

As more organizations strive to build and scale data products, DPROD provides the standardization needed to ensure interoperability and unlock the full potential of decentralized data ecosystems in a controlled and mature way.

Scope

The Data Product (DPROD) specification is a profile of the Data Catalog (DCAT) Vocabulary, designed to describe Data Products. This document defines the schema and provides examples of its use.

DPROD extends DCAT to enable publishers to describe Data Products and data services in a decentralized way. By using a standard model and ontology, DPROD facilitates the consumption and aggregation of metadata from multiple Data Marketplaces. This approach increases the discoverability of products and services, supports decentralized data publishing, and enables federated search across multiple sites using a uniform query mechanism and structure.

The namespace for DPROD terms is https://ekgf.github.io/dprod/

The suggested prefix for the DPROD namespace is dprod

DPROD follows two basic principles:

The [[dprod]] specification builds on DCAT [[vocab-dcat-3]] by connecting DCAT Data Services to DPROD Data Products using input and output ports. These ports are used to publish and consume data from a Data Product. DPROD treats ports as DCAT Data Services, so the data exchanged can be described using DCAT's expressive metadata for distributions and datasets. This approach also allows publishers to create their own descriptions of the data they share: they can use the conformsTo property (dct:conformsTo, which DCAT reuses from Dublin Core Terms) to link to their own rules or guidelines for their data.

The DPROD specification has four main aims:

Conformance

A data product is conformant with this specification if it satisfies the [[SHACL]] constraints provided in the file dprod-shapes.ttl.

Terms and Definitions

All terms introduced in this specification are given definitions in the Data Product Model defined later.

Symbols

The following acronyms are used in this specification.

Normative namespaces

Namespaces and prefixes used in normative parts of this Profile are shown in the following table.

Prefix  Namespace IRI                                Source
dprod   https://ekgf.github.io/dprod/                [[dprod]]
dcat    http://www.w3.org/ns/dcat#                   [[vocab-dcat-3]]
dct     http://purl.org/dc/terms/                    [[dcterms]]
odrl    http://www.w3.org/ns/odrl/2/                 [[odrl-model]]
owl     http://www.w3.org/2002/07/owl#               [[owl2-quick-reference]]
prov    http://www.w3.org/ns/prov#                   [[prov-overview]]
rdf     http://www.w3.org/1999/02/22-rdf-syntax-ns#  [[rdf11-primer]]
rdfs    http://www.w3.org/2000/01/rdf-schema#        [[rdf-schema]]
sh      http://www.w3.org/ns/shacl#                  [[shacl]]
xsd     http://www.w3.org/2001/XMLSchema#            [[xmlschema-2]]
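
The prefixes in the table can be applied mechanically to turn a compact name such as dprod:outputPort into a full IRI. The following is a minimal Python sketch of that expansion. The helper name `expand_curie` is hypothetical and not part of DPROD, and the prefix map is a subset copied from the table.

```python
# Prefix-to-namespace map copied from the normative namespaces table (subset).
PREFIXES = {
    "dprod": "https://ekgf.github.io/dprod/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "odrl": "http://www.w3.org/ns/odrl/2/",
    "prov": "http://www.w3.org/ns/prov#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sh": "http://www.w3.org/ns/shacl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie: str) -> str:
    """Expand 'prefix:local' to a full IRI (hypothetical helper, not part of DPROD)."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand_curie("dprod:outputPort"))
print(expand_curie("dcat:DataService"))
```

A name whose prefix is not in the map raises `ValueError` rather than being passed through, so that typos in prefixes surface early.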

Data Product (DPROD) Model

Data Mesh Architectures [[Data Mesh]] use input and output ports to manage how data enters and leaves a Data Product. These ports can handle different formats, schemas, and protocols. Input ports bring in data, while output ports send data to other Data Products for aggregation, reuse, analysis or reporting, etc.

In the [[[vocab-dcat-3]]] framework, a Data Service is a way to describe services that provide access to data. Data Services give standardized, machine-readable descriptions of how to access one or more datasets or data processing functions.

Data Services specify how to access and download the data. In DPROD, a Data Service is connected to a Distribution by the isAccessServiceOf property. The Distribution specifies the format (such as CSV or JSON) and carries metadata about the "physical model" of the data. Distributions link to Datasets, for which DCAT provides a rich descriptive vocabulary. Finally, Datasets use the conformsTo property to link to the "logical model", where publishers can specify their own semantic metadata.

By linking Data Product ports to DCAT DataServices ([[vocab-dcat-3]]), DPROD can describe Data Products in a way that machines can read across the organization. This makes it easier for data teams to build and manage their own data products independently, while still working well with the rest of the organization's data.

Using standards like DCAT helps create a strong and clear way to define Data Products. It ensures that as data becomes more complex, the methods for describing, sharing, and using data stay consistent and reliable. It also allows different organizations to share data securely and in a standardized way.

Information model for the Profile
Overview of the DPROD model and its relationship with DCAT classes

The Profile consists of the following classes:

As DCAT Data Services, the DPROD input and output ports can specify connection details. They have distributions that define formats, and these distributions link to datasets that conform to shared schemas. In this example, the UK Bonds Data Product includes an output port, which is a RESTful API. This API delivers JSON data conforming to the shared FIBO specification for callable bonds.

  {
    "@context": "https://ekgf.github.io/dprod/dprod.jsonld",
    "id": "https://y.com/products/uk-bonds",
    "type": "DataProduct",
    "title": "UK Bonds",
    "description": "UK Bonds is your one-stop-shop for all your bonds!",
    "dataProductOwner": "https://www.linkedin.com/in/tonyseale/",
    "lifecycleStatus" : "https://ekgf.github.io/dprod/data/lifecycle-status/Consume",
    "outputPort": {
      "type": "DataService",
      "endpointURL": "https://y.com/uk-10-year-bonds",
      "isAccessServiceOf": {
        "type": "Distribution",
        "format": "https://www.iana.org/assignments/media-types/application/json",
        "isDistributionOf": {
          "type": "Dataset",
          "id": "https://y.com/products/uk-bonds/datasets/10-year",
          "conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/Bonds/CallableBond"
        }
      }
    }
  }
  

The examples map the type of the above classes to @type in the JSON-LD serializations. One can use JSON-LD to extend the familiar JSON syntax with the shared semantics defined by DCAT and DPROD.
The JSON above can be pasted into https://json-ld.org/playground. One can see that the schema resolves.
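
The chain from port to logical model can also be followed in code. The sketch below is a minimal illustration in Python that treats the UK Bonds example as plain JSON: it does not resolve the JSON-LD context, and the helper name `describe_port` is hypothetical. The data is the example above, trimmed to the fields that are read.

```python
import json

# The UK Bonds example from above, trimmed to the fields used here.
UK_BONDS = json.loads("""
{
  "id": "https://y.com/products/uk-bonds",
  "type": "DataProduct",
  "outputPort": {
    "type": "DataService",
    "endpointURL": "https://y.com/uk-10-year-bonds",
    "isAccessServiceOf": {
      "type": "Distribution",
      "format": "https://www.iana.org/assignments/media-types/application/json",
      "isDistributionOf": {
        "type": "Dataset",
        "id": "https://y.com/products/uk-bonds/datasets/10-year",
        "conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/Bonds/CallableBond"
      }
    }
  }
}
""")


def describe_port(port: dict) -> dict:
    """Follow port -> distribution -> dataset and collect the key facts."""
    distribution = port["isAccessServiceOf"]
    dataset = distribution["isDistributionOf"]
    return {
        "endpoint": port["endpointURL"],
        "format": distribution["format"],
        "dataset": dataset["id"],
        "logical_model": dataset["conformsTo"],
    }


summary = describe_port(UK_BONDS["outputPort"])
for key, value in summary.items():
    print(f"{key}: {value}")
```

A consumer that needs the full semantics (for example, to merge descriptions from several marketplaces) would process the document with a JSON-LD library instead of reading keys directly.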

The following sections are driven by the Shapes definitions for DPROD, which represent the properties expected to be used for instances of the above classes. As such, they include properties defined by the DCAT specification which DPROD extends. The prefix in the identifier for each entry indicates the ontology defining the property.

DataProduct

A rational, managed, and governed collection of data, with purpose, value and ownership, meeting consumer needs over a planned life-cycle.

A data product may have input and output ports, code, and metadata.

label

The name given to the data product.

Identifier: rdfs:label
Label:data product label shape
Domain:dprod:DataProduct
Range:xsd:string

description

A free text description of the data product.

Identifier: dct:description
Label:data product description shape
Domain:dprod:DataProduct
Range:xsd:string

dataProductOwner

The agent that is accountable overall for the data product, including managing it through its lifecycle.

Identifier: dprod:dataProductOwner
Label:data product owner
Domain:dprod:DataProduct
Range:prov:Agent

domain

The business or information area supported by the data product.

Identifier: dprod:domain
Label:domain
Comment:The domain is intended to be a resource in its own right. This specification does not constrain the class to be used.
Domain:dprod:DataProduct
Range:

inputPort

A set of services exposed by a data product to collect its source data and make it available for further internal transformation. An input port can receive data from one or more upstream sources in push mode (i.e. asynchronous subscription) or pull mode (i.e. synchronous query). Each data product may have one or more input ports.

Identifier: dprod:inputPort
Label:input port
Domain:dprod:DataProduct
Range:dcat:DataService

outputPort

A set of services exposed by a data product to share the generated data in a way that can be understood and trusted.

Identifier: dprod:outputPort
Label:output port
Domain:dprod:DataProduct
Range:dcat:DataService

inputDataset

The source data made available to the data product through input data services. Depending on the lifecycle of the data product, this may be a stated or inferred relationship aligned with the input ports.

Identifier: dprod:inputDataset
Label:input dataset
Domain:dprod:DataProduct
Range:dcat:Dataset

outputDataset

The data that is exposed by the data product through output data services in a way that can be understood and trusted. Depending on the lifecycle of the data product, this may be a stated or inferred relationship aligned with the output ports.

Identifier: dprod:outputDataset
Label:output dataset
Domain:dprod:DataProduct
Range:dcat:Dataset

purpose

A description of the objectives and intended usage of the data product.

Identifier: dprod:purpose
Label:purpose
Domain:dprod:DataProduct
Range:xsd:string

hasPolicy

An ODRL conformant policy expressing the rights associated with the data product. This is an inferred relationship based on the rights expressed on the individual datasets of the data product.

Identifier: odrl:hasPolicy
Label:data product has policy shape
Domain:dprod:DataProduct
Range:odrl:Policy

lifecycleStatus

The development status of the data product.

Identifier: dprod:lifecycleStatus
Label:lifecycle status
Domain:dprod:DataProduct
Range:dprod:DataProductLifecycleStatus

DataService

A collection of operations that provides access to one or more datasets or data processing functions.

A site or end-point providing operations related to the discovery of, access to, or processing functions on, data or related resources.

isAccessServiceOf

The dataset distribution that is being offered through this data service.

Identifier: dprod:isAccessServiceOf
Label:is access service of
Domain:dcat:DataService
Range:dcat:Distribution

protocol

A protocol (possibly one of many options) used to communicate with this data service.

Identifier: dprod:protocol
Label:protocol
Domain:dcat:DataService
Range:dcat:Protocol

securitySchemaType

The security schema type used for authentication and communication with this Data Service.

Identifier: dprod:securitySchemaType
Label:data service security schema type shape
Domain:dcat:DataService
Range:dcat:SecuritySchemaType

endpointURL

The root location or primary endpoint of the service.

Identifier: dcat:endpointURL
Label:service end-point
Comment:The root location or primary endpoint of the service (a Web-resolvable IRI).
Domain:dcat:DataService
Range:rdfs:Resource

endpointDescription

A description of the services available via the end-points, including their operations, parameters etc.

Identifier: dcat:endpointDescription
Label:description of service end-point
Comment:A description of the service end-point, including its operations, parameters etc.
Domain:dcat:DataService
Range:

Distribution

A specific representation of a dataset. A dataset might be available in multiple serializations that may differ in various ways, including natural language, media-type or format, schematic organization, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above).

accessService

A data service that gives access to the distribution of the dataset.

Identifier: dcat:accessService
Label:data access service
Comment:A site or end-point that gives access to the distribution of the dataset.
Domain:dcat:Distribution
Range:dcat:DataService

conformsTo

The schema that the distribution conforms to that is format and technology dependent.

Identifier: dct:conformsTo
Label:distribution conforms to shape
Domain:dcat:Distribution
Range:

isDistributionOf

The dataset that this distribution makes available.

Identifier: dprod:isDistributionOf
Label:is distribution of
Domain:dcat:Distribution
Range:dcat:Dataset

format

The file format of the distribution.

Identifier: dct:format
Label:distribution format shape
Domain:dcat:Distribution
Range:

Dataset

A collection of data, published or curated by a single source, and available for access or download in one or more representations.

label

The name given to the dataset.

Identifier: rdfs:label
Label:dataset label shape
Domain:dcat:Dataset
Range:xsd:string

description

Free text description of the dataset.

Identifier: dct:description
Label:dataset description shape
Domain:dcat:Dataset
Range:xsd:string

type

The type or genre of the dataset.

Identifier: dct:type
Label:dataset type shape
Domain:dcat:Dataset
Range:

distribution

An available distribution of the dataset.

Identifier: dcat:distribution
Label:distribution
Comment:An available distribution of the dataset.
Domain:dcat:Dataset
Range:dcat:Distribution

conformsTo

A model, schema, ontology, view or profile that the dataset conforms to.

Identifier: dct:conformsTo
Label:dataset conforms to shape
Domain:dcat:Dataset
Range:

hasPolicy

An ODRL conformant policy expressing the rights associated with the resource.

Identifier: odrl:hasPolicy
Label:dataset has policy shape
Domain:dcat:Dataset
Range:

informationSensitivityClassification

More granular classification that indicates the level of control and protection that must be applied to the asset due to the nature of the data and its sensitivity or importance to the organization.

Identifier: dprod:informationSensitivityClassification
Label:information sensitivity classification
Domain:dcat:Dataset
Range:dprod:InformationSensitivityClassification

DataProductLifecycleStatus

The development status of the data product taken from a controlled list (e.g. Ideation, Design, Build, Deploy, Consume).

The lifecycle of the data product as defined by EDM Council CDMC.

InformationSensitivityClassification

A classification of the information within a dataset to indicate the level of control and protection that must be applied.

Protocol

A detailed specification, possibly including a specific version, for how to communicate with a service.

SecuritySchemaType

A classification encompassing a set of rules used for authentication and communication.

Worked Examples

This section contains some worked examples illustrating how to use [[dprod]] for some common use cases. All these examples are provided as accompanying machine-readable files, from which this part of the specification is automatically generated (hence some formatting variations).

SBA Pool Rates

Rates for SBA Pool that are Mortgage Backed Securities.

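The data product is provided through three ports: one provides all SBA pool rates through a database query, one provides EMEA-only rates through an API, and one provides US-only rates through a Kafka topic.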

{
  "@context": "https://ekgf.github.io/dprod/dprod.jsonld",
  "id": "sba-pool-rates",
  "@id": "https://y.com/products/sba-pool-rates",
  "@type": "DataProduct",
  "title": "SBA Pool Rates",
  "description": "Rates for SBA Pool that are Mortgage Backed Securities. The data product is provided through 3 ports, one of them providing all SBA pool rates through a query to a database, another port providing EMEA only rates through an api and another one providing US only rates through a Kafka topic",
  "dataProductOwner": "https://www.schema.xxx/person/johnSmith",
  "lifecycle": "Consume",
  "outputPort": [
    {
      "@type": "dcat:DataService",
      "id": "sba-pool-rate-tabular-prod1",
      "environment": "PROD",
      "@id": "https://y.com/service/sba-pool-rate-tabular-prod1",
      "dcat:endpointURL": "jdbc:oracle:thin:@sd656-5656-6745.ldn.organiation.com:43534/PGPERG.WORLD",
      "sql": "select * from ...",
      "isAccessServiceOf": {
        "@id": "https://y.com/distribution/sba-pool-rate-tabular",
        "id": "sba-pool-rate-tabular",
        "@type": "dcat:Distribution",
        "dcterms:format": "https://www.iana.org/assignments/media-types/application/sql",
        "isDistributionOf": {
          "@type": "dcat:Dataset",
          "@id": "https://y.com/dataset/sba-pool-rate",
          "id": "sba-pool-rate",
          "dcterms:conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/MortgageBackedSecurities/SBA-Pool"
        }
      }
    },
    {
      "@type": "dcat:DataService",
      "id": "sba-pool-rate-emea-api-prod1",
      "environment": "PROD",
      "@id": "https://y.com/service/sba-pool-rate-emea-api-prod1",
      "dcat:endpointURL": "https://example.org/mbs/SBA-Pool-location-emea",
      "dcterms:conformsTo": "../resources/users.yaml",
      "isAccessServiceOf": {
        "@id": "https://y.com/distribution/sba-pool-rate-emea-json1",
        "id": "sba-pool-rate-emea-json1",
        "@type": "dcat:Distribution",
        "dcterms:format": "https://www.iana.org/assignments/media-types/application/json",
        "isDistributionOf": {
          "@type": "dcat:Dataset",
          "@id": "https://y.com/dataset/sba-pool-rate-emea",
          "description": "SBA pool data that cover EMEA accessed through an api",
          "geographicalCoverage": "https://y.com/country/EMEA",
          "id": "sba-pool-rate-emea",
          "dcterms:conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/MortgageBackedSecurities/SBA-Pool"
        }
      }
    },
    {
      "@type": "dcat:DataService",
      "id": "sba-pool-rate-json-prod1",
      "@id": "https://y.com/service/sba-pool-rate-json-prod1",
      "dcat:endpointURL": "q1.debt.mbs.dataset.us",
      "isAccessServiceOf": {
        "@id": "https://y.com/distribution/sba-pool-rate-json",
        "id": "sba-pool-rate-json",
        "@type": "dcat:Distribution",
        "dcterms:format": "https://www.iana.org/assignments/media-types/application/json",
        "conformsTo": "http://confluent-registry-y/rates-json-schema.json",
        "schemaCompatibility": "backwards compatible",
        "isDistributionOf": {
          "@type": "dcat:Dataset",
          "@id": "https://y.com/dataset/sba-pool-rate-us",
          "id": "sba-pool-rate-us",
          "geographicalCoverage": "https://y.com/country/US",
          "dcterms:conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/MortgageBackedSecurities/SBA-Pool"
        }
      }
    }
  ]
}
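
Note that outputPort is an array here, whereas the UK Bonds example gives a single object. A consumer that reads these documents as plain JSON needs to accept both shapes. The following Python sketch shows one way to do that. The helpers `as_list` and `port_endpoints` are hypothetical, and the data is a trimmed copy of two of the ports above.

```python
def as_list(value):
    """Return value as a list: None -> [], a list stays as is, anything else is wrapped."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def port_endpoints(product: dict) -> list:
    """Collect (port @id, endpoint URL) pairs from a data product's output ports."""
    return [
        (port.get("@id"), port.get("dcat:endpointURL"))
        for port in as_list(product.get("outputPort"))
    ]


# Trimmed copy of the SBA Pool Rates example: only identifiers and endpoints are kept.
sba_pool_rates = {
    "@id": "https://y.com/products/sba-pool-rates",
    "outputPort": [
        {"@id": "https://y.com/service/sba-pool-rate-emea-api-prod1",
         "dcat:endpointURL": "https://example.org/mbs/SBA-Pool-location-emea"},
        {"@id": "https://y.com/service/sba-pool-rate-json-prod1",
         "dcat:endpointURL": "q1.debt.mbs.dataset.us"},
    ],
}

for port_id, endpoint in port_endpoints(sba_pool_rates):
    print(port_id, "->", endpoint)
```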

Core Data Product Extensions

For real world data products, the core data product details will be part of a wider set of metadata that allows the data and data product to be used effectively.

Below is an example of extending the DPROD data product, specifically by adding an agreement to a data product.

In this example, a Data Product Agreement is defined as a subclass of FIBO Agreement.

Definition of a simple Agreement based on FIBO:

A full definition of agreements for data products is likely to be more complex than a single class and may use other information models or their profiles (such as ODRL Policy) or create dedicated definitions.

Below is an example of a Data Product with an associated Data Product Agreement with an effective date.

Using the agreement:

Data Schema

The Data Product provides datasets to its consumers (dprod:outputDataset), and these datasets are defined using DCAT. Datasets should be described with logical models, linked through dct:conformsTo. Logical models describe business entities and their properties (attributes and relationships) using consistent business terms, and they are technology independent. Ideally, logical models are based on existing standards, e.g. FIBO or CDM. If no logical model exists to describe the dataset, the dataset publisher can create one, preferably using the SHACL modelling language:

Example of a Dataset conforming to a SHACL Schema:

exampleDataset dct:conformsTo exampleSchema:DatasetLogicalSchema .
exampleSchema:DatasetLogicalSchema a owl:Ontology, dct:Standard .

In SHACL, all entities that exist in the dataset are Node Shapes (1). The attributes of the entities are described as Property Shapes with sh:datatype (2). The relationships are also defined as Property Shapes, with sh:class giving the target class of the relationship (3).

{
  "@context": "https://ekgf.github.io/dprod/dprod.jsonld",
  "id": "https://y.com/products/equity-trade-xxx",
  "@type": "DataProduct",
  "title": "Equity Trade XXX",
  "description": "Trade data defining the outcome of equity trades between parties in different stock markets, where the terms are primarily reflected in the tradable product. Additionally, Trade includes attributes such as the trade date, transacting parties, and settlement terms. Some attributes, such as the parties, are already defined in the Party Product and are simply referenced in Trade",
  "outputPort": {
    "@type": "DataService",
    "endpointURL": "abfss://datasetsv1@demo.dfs.core.windows.net/demo/full/trade-euronext",
    "isAccessServiceOf": {
      "@type": "Distribution",
      "format": "application/parquet",
      "isDistributionOf": {
        "@type": "Dataset",
        "@id": "https://y.com/dataset/equity-trade-euronext-paris",
        "title": "Equity Trade Euronext Paris XXX",
        "conformsTo": "https://spec.edmcouncil.org/fibo/ontology/BP/Process/FinancialContextAndProcess/SecuritiesTrade"
      }
    }
  }
}

Data Quality

[
  {
    "@context": "https://ekgf.github.io/dprod/dprod.jsonld",
    "id": "https://y.com/derived-quality-measurementA",
    "@type": "QualityMeasurement",
    "value": 1,
    "computedOn": {
      "@type": "DataProduct",
      "@id": "https://y.com/products/uk-bonds"
    },
    "isMeasurementOf": {
      "@type": "Metric",
      "label": "Number of stale datasets"
    }
  },
  {
    "@context": "https://ekgf.github.io/dprod/dprod.jsonld",
    "@id": "https://y.com/quality-measurement-B",
    "@type": "QualityMeasurement",
    "value": "false",
    "computedOn": {
      "@type": "Dataset",
      "@id": "https://y.com/products/uk-bonds/yearlyPrices"
    },
    "isMeasurementOf": {
      "@type": "Metric",
      "label": "Expected distribution frequency achieved"
    }
  },
  {
    "@context": "https://ekgf.github.io/dprod/dprod.jsonld",
    "id": "https://y.com/products/uk-bonds",
    "type": "DataProduct",
    "outputPort": {
      "type": "DataService",
      "endpointURL": "https://y.com/uk-bonds/quality-report",
      "isAccessServiceOf": {
        "type": "Distribution",
        "isDistributionOf": {
          "type": "Dataset",
          "conformsTo": "https://www.w3.org/TR/vocab-dqv/#dqv:QualityMeasurement"
        }
      }
    }
  }
]

Equity Trade

Example of a data product for Equity Trades.

The Equity Trades Data Product provides two datasets to its consumers: one for trades on the London Stock Exchange (LSEG) and one for trades on Euronext.

{
  "@context": "https://ekgf.github.io/dprod/dprod.jsonld",
  "id": "equity-trade-xxx",
  "@id": "https://y.com/products/equity-trade-xxx",
  "@type": "DataProduct",
  "title": "Equity Trade XXX",
  "description": "Trade data defining the outcome of equity trades between parties in different stock markets, where the terms are primarily reflected in the tradable product. Additionally, Trade includes attributes such as the trade date, transacting parties, and settlement terms. Some attributes, such as the parties, are already defined in the Party Product and are simply referenced in Trade",
  "dataProductOwner": "https://www.schema.xxx/person/AnnTaylor",
  "lifecycle": "Consume",
  "outputPort": [
    {
      "@type": "dcat:DataService",
      "id": "equity-trade-euronext-xxx-tabular-adls-prod",
      "@id": "https://y.com/service/equity-trade-euronext-xxx-adls-prod-1",
      "dcat:endpointURL": "abfss://datasetsv1@demo.dfs.core.windows.net/demo/full/trade-euronext",
      "dcat:endpointDescription": "Details for accessing storage account",
      "isAccessServiceOf": {
        "@id": "https://y.com/service/equity-trade-euronext-xxx-tabular",
        "id": "equity-trade-euronext-xxx-tabular",
        "@type": "dcat:Distribution",
        "dcterms:format": "application/parquet",
        "dcterms:conformsTo": "https://cdm.finos.org/docs/event-model",
        "isDistributionOf": {
          "@type": "dcat:Dataset",
          "@id": "https://y.com/dataset/equity-trade-euronext-paris",
          "datasetOwner": "https://www.schema.xxx/person/JohnBarks",
          "title": "Equity Trade Euronext Paris XXX",
          "id": "equity-trade-euronext-paris-xxx",
          "dcterms:conformsTo": "https://cdm.finos.org/docs/event-model"
        }
      }
    },
    {
      "@type": "dcat:DataService",
      "id": "equity-trade-lseg-xxx-tabular-adls-prod",
      "@id": "https://y.com/service/equity-trade-lseg-xxx-adls-prod-1",
      "dcat:endpointURL": "abfss://datasetsv1@demo.dfs.core.windows.net/demo/full/trade-lseg",
      "dcat:endpointDescription": "Details for accessing storage account",
      "isAccessServiceOf": {
        "@id": "https://y.com/service/equity-trade-lseg-xxx-tabular",
        "id": "equity-trade-lseg-xxx-tabular",
        "@type": "dcat:Distribution",
        "dcterms:format": "application/parquet",
        "dcterms:conformsTo": "https://cdm.finos.org/docs/event-model",
        "isDistributionOf": {
          "@type": "dcat:Dataset",
          "@id": "https://y.com/dataset/equity-trade-lseg-xxx",
          "title": "Equity Trade LSEG XXX",
          "id": "equity-trade-lseg-xxx",
          "dcterms:conformsTo": "https://cdm.finos.org/docs/event-model"
        }
      }
    }
  ]
}

Data Rights

ODRL is a W3C standard to describe rights and entitlements. Based on ODRL, data product and dataset publishers can describe policies in a consistent, standard and machine-readable manner. Policies contain permissions and prohibitions on specific actions that are required to be met by stakeholders.

In addition, policies may be limited by constraints (e.g. temporal or geographical constraints) and duties (e.g. payments) that may be imposed on the permissions.

Policies and their permitted or prohibited actions can be described at different levels, e.g. a policy can target a data product, a dataset, a data service or even a column.

Sophisticated engines should interpret and enforce the ODRL policies at the appropriate level, e.g.:

The following example policy grants permission to read the data only from within specific geographic regions:

{
  "@type": "Policy",
  "id": "56456df-dfg-34535345-5545",
  "assigner": "https://schema.org/person/AdamSmith",
  "target": "https://data.org/data-product/equity-trade-xxx",
  "permission": [
    {
      "action": "odrl:read",
      "constraint": [
        {
          "@type": "Constraint",
          "leftOperand": "spatial",
          "operator": "odrl:isAnyOf",
          "rightOperand": ["region:EMEA", "region:APAC"],
          "description": "Permission to read all the datasets of the product if user is working inside EMEA or APAC"
        }
      ]
    }
  ]
}
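
To illustrate what enforcing such a policy could involve, the following Python sketch evaluates the spatial constraint against a request context. It is an assumption-laden illustration, not a conformant ODRL evaluator: the functions `constraint_holds` and `is_permitted` and the shape of the `context` dictionary are hypothetical, and only the odrl:isAnyOf operator is handled.

```python
# Policy from the example above, reduced to the parts the check needs.
policy = {
    "@type": "Policy",
    "target": "https://data.org/data-product/equity-trade-xxx",
    "permission": [
        {
            "action": "odrl:read",
            "constraint": [
                {
                    "leftOperand": "spatial",
                    "operator": "odrl:isAnyOf",
                    "rightOperand": ["region:EMEA", "region:APAC"],
                }
            ],
        }
    ],
}


def constraint_holds(constraint: dict, context: dict) -> bool:
    """Evaluate one constraint against the request context (only odrl:isAnyOf is handled)."""
    if constraint["operator"] != "odrl:isAnyOf":
        raise NotImplementedError(constraint["operator"])
    actual = context.get(constraint["leftOperand"])
    return actual in constraint["rightOperand"]


def is_permitted(policy: dict, target: str, action: str, context: dict) -> bool:
    """True if some permission covers the action and all of its constraints hold."""
    if policy["target"] != target:
        return False
    for permission in policy.get("permission", []):
        if permission["action"] != action:
            continue
        if all(constraint_holds(c, context) for c in permission.get("constraint", [])):
            return True
    return False


product = "https://data.org/data-product/equity-trade-xxx"
print(is_permitted(policy, product, "odrl:read", {"spatial": "region:EMEA"}))  # True
print(is_permitted(policy, product, "odrl:read", {"spatial": "region:US"}))    # False
```

The sketch denies by default: a request is allowed only when a matching permission exists and every constraint on it is satisfied. Prohibitions and duties, which ODRL also supports, are left out.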
        

Observability Ports

An Observability Port is a designated interface or endpoint in a system or application specifically used for monitoring and diagnostic purposes. It allows external tools or services to collect and analyze data related to the system's performance, health, and behaviour. By exposing metrics, logs, and traces through this port, administrators and developers can gain insights into the system's state, troubleshoot issues, and ensure it operates efficiently and reliably.

Defining Observability Ports in DPROD

DPROD has a schema-first design, so the first step is to define a schema for the logging information. The schema could be based on OpenTelemetry; this example uses RLOG, a semantic ontology for logging.

To find the Observability Port, query the ports to identify the ones that return an rlog:Entry:

outputPort >> isAccessServiceOf >> isDistributionOf >> conformsTo >> rlog:Entry

Example Data Product with Observability Port

One can see that the example data product has two ports, one with the data and one with the logging. This query will return the URI of the port that returns logging data: https://y.com/uk-bonds/observability-port.

Here is an example of a data product with an observability port:

Given that the schema defines the class for an observation, it can be used to find all observability ports on a data product like this:

[https://y.com/data-product/uk-bonds/port/2024-observability] >> isAccessServiceOf >> isDistributionOf >> conformsTo >> https://y.com/schema/ObservabilityLog

In Linked Data a SPARQL query would do that:

This query will return the URI of the port that provides logging data: https://y.com/data-product/uk-bonds/port/2024-observability.
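
The same path (outputPort >> isAccessServiceOf >> isDistributionOf >> conformsTo) can be walked over a plain JSON representation. The Python sketch below does this over hypothetical data: the observability port IRI and the schema IRI are the ones mentioned in the text, while the second port, its schema IRI, and the helper `ports_conforming_to` are assumptions made for illustration.

```python
# Hypothetical data product with two output ports. Only the observability
# port IRI and the ObservabilityLog schema IRI come from the text above.
OBSERVABILITY_SCHEMA = "https://y.com/schema/ObservabilityLog"

uk_bonds = {
    "id": "https://y.com/data-product/uk-bonds",
    "type": "DataProduct",
    "outputPort": [
        {
            "id": "https://y.com/data-product/uk-bonds/port/2024-data",  # assumed
            "type": "DataService",
            "isAccessServiceOf": {
                "type": "Distribution",
                "isDistributionOf": {
                    "type": "Dataset",
                    "conformsTo": "https://y.com/schema/Bond",  # assumed
                },
            },
        },
        {
            "id": "https://y.com/data-product/uk-bonds/port/2024-observability",
            "type": "DataService",
            "isAccessServiceOf": {
                "type": "Distribution",
                "isDistributionOf": {
                    "type": "Dataset",
                    "conformsTo": OBSERVABILITY_SCHEMA,
                },
            },
        },
    ],
}


def ports_conforming_to(product: dict, schema_iri: str) -> list:
    """Return ids of output ports whose dataset conforms to schema_iri."""
    matches = []
    for port in product.get("outputPort", []):
        dataset = port.get("isAccessServiceOf", {}).get("isDistributionOf", {})
        if dataset.get("conformsTo") == schema_iri:
            matches.append(port["id"])
    return matches


observability_ports = ports_conforming_to(uk_bonds, OBSERVABILITY_SCHEMA)
print(observability_ports)
```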

Data Lineage

It is important to be able to trace the lineage of data. Within DPROD, this can be done in two ways: at a high level from one data product to another and, if desired, at the more detailed level of the underlying datasets.

High Level Lineage: Between Data Products

Data products have input and output ports, and one data product’s input port will point to another data product’s output port.

This allows a user to query the lineage. All data products have URLs as identifiers, and their properties link them together, so a query can walk from one data product to the upstream data products that feed it.

One can follow the path that leads from one data product to another like this:

Data Product >> inputPort >> isAccessServiceOf >> isDistributionOf >> Input Data Product 

The following example data has three data products that connect to each other through their input and output ports:

Given this example data, starting at the data product https://y.com/data-product/company-finance, one could walk the relationships to find the input data products that feed it:

https://y.com/data-product/company-finance >> 
    :inputPort >> 
    :isAccessServiceOf >> 
    :isDistributionOf >> [
        https://y.com/data-product/company-sales, 
        https://y.com/data-product/company-hr
    ]
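
The walk above can be sketched in plain Python. This sketch assumes the simplest reading of "one data product's input port will point to another data product's output port", namely that both products reference the same port IRI. The port IRIs, the `catalogue` structure, and the helper `upstream_products` are hypothetical; only the three data product names come from the text.

```python
# Hypothetical catalogue of three data products. Ports are referenced by IRI,
# and company-finance lists the other products' output ports as its inputs.
BASE = "https://y.com/data-product/"

catalogue = {
    BASE + "company-sales": {
        "inputPort": [],
        "outputPort": [BASE + "company-sales/port/sales"],
    },
    BASE + "company-hr": {
        "inputPort": [],
        "outputPort": [BASE + "company-hr/port/hr"],
    },
    BASE + "company-finance": {
        "inputPort": [BASE + "company-sales/port/sales", BASE + "company-hr/port/hr"],
        "outputPort": [BASE + "company-finance/port/finance"],
    },
}


def upstream_products(product_iri: str, products: dict) -> list:
    """Return the products whose output ports this product uses as input ports."""
    wanted = set(products[product_iri]["inputPort"])
    return sorted(
        iri
        for iri, product in products.items()
        if iri != product_iri and wanted.intersection(product["outputPort"])
    )


feeds = upstream_products(BASE + "company-finance", catalogue)
print(feeds)
```

Applying `upstream_products` repeatedly to each result would extend the walk beyond direct inputs to the full upstream lineage.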

In Linked Data, this would use a query such as:

Detailed Level: Between Datasets

To track lineage at a more granular level, one can also use PROV (https://www.w3.org/TR/prov-o/) at the dataset level.

See: https://www.w3.org/TR/vocab-dcat-3/#examples-dataset-provenance.

Acknowledgements

The editors gratefully acknowledge the feedback and contributions made by individuals who have participated in EDM Council's CDMC team.