Data Product Vocabulary (DPROD)

The Data Product (DPROD) specification is a profile of the Data Catalog (DCAT) Vocabulary, designed to describe Data Products. This document defines the schema and provides examples for its use.

DPROD extends DCAT to enable publishers to describe Data Products and data services in a decentralized way. By using a standard model and vocabulary, DPROD facilitates the consumption and aggregation of metadata from multiple Data Marketplaces. This approach increases the discoverability of products and services, supports decentralized data publishing, and enables federated search across multiple sites using a uniform query mechanism and structure.

The namespace for DPROD terms is https://ekgf.github.io/data-product-spec/dprod

The suggested prefix for the DPROD namespace is dprod


DPROD follows two basic principles:

🔵 Decentralize Data Ownership: To make data integration more efficient, the work should be shared among multiple teams. DCAT helps by offering a standard way to publish datasets in a decentralized manner.

🔵 Harmonize Data Schemas: Using shared schemas helps unify different data formats. For instance, the DPROD specification provides a common set of rules for defining a Data Product, which you can extend as needed.


The DPROD specification builds on DCAT by connecting DCAT Data Services to DPROD Data Products through input and output ports. These ports are used to publish and consume data from a Data Product. DPROD treats ports as DCAT Data Services, so the data exchanged can be described using DCAT's highly expressive metadata for distributions and datasets. This approach also lets you supply your own descriptions of the data you share: the dcterms:conformsTo property, which DCAT reuses from Dublin Core, links your data to your own schema or set of rules.

The DPROD specification has four main aims:

🔵 To provide unambiguous and sharable semantics to answer the question: 'What is a data product?'

🔵 To be simple for anyone to use, yet expressive enough to power large data marketplaces

🔵 To allow organisations to reuse their existing data catalogues and dataset infrastructure

🔵 To share common semantics across different Data Products and promote harmonisation

Status of this document

The current version is a DRAFT. Feedback and comments are welcome via GitHub Issues.

Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this Profile are non-normative. Everything else in this Profile is normative.

The keywords MAY, MUST, MUST NOT, RECOMMENDED, SHOULD, and SHOULD NOT are to be interpreted as described in [[!RFC2119]].

Normative namespaces

Namespaces and prefixes used in normative parts of this Profile are shown in the following table.

Prefix   Namespace IRI                   Source
dcat     http://www.w3.org/ns/dcat#      [[VOCAB-DCAT-3]]
dct      http://purl.org/dc/terms/       [[DCTERMS]]
odrl     http://www.w3.org/ns/odrl/2/    [[ODRL-VOCAB]]
sdo      https://schema.org/             [[SCHEMA-ORG]]

Data Product (DPROD) Model

Data Mesh architectures use input and output ports to manage how data enters and leaves a Data Product. These ports can handle different formats, schemas, and protocols. Input ports bring data in, while output ports send data to other Data Products for aggregation, reuse, analysis, reporting and similar purposes.

In the Data Catalog Vocabulary (DCAT) framework, a Data Service is a way to describe services that provide access to data. Data Services give standardized, machine-readable descriptions of how to access one or more datasets or data processing functions.

Data Services specify how to access and download the data. In DPROD, a Data Service is connected to a Distribution by the isAccessServiceOf property. On the Distribution you can specify formats (such as CSV or JSON) and provide metadata about the "physical model" of the data. Each Distribution links to a Dataset, and DCAT has a very rich vocabulary for describing every aspect of a dataset. Finally, Datasets use the conformsTo property to link to the "logical model", where you can specify rich semantic metadata of your own.

By linking Data Product ports to DCAT Data Services, DPROD describes Data Products in a machine-readable way across the organization. This makes it easier for data teams to build and manage their own data products independently, while still working well with the rest of the organization's data.

Using standards like DCAT helps create a strong and clear way to define Data Products. It ensures that as data becomes more complex, the methods for describing, sharing, and using data stay consistent and reliable. It also allows different organizations to share data securely and in a standardized way.

Information model for the Profile
Overview of DPROD model and its relationship with DCAT classes

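The Profile consists of the following classes: DataProduct, DataService, Distribution, Dataset, SecuritySchemaType, InformationSensitivityClassification, DataProductLifecycleStatus and Protocol. Each is defined, with its properties, in the sections below.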

As DCAT Data Services, DPROD input and output ports can specify connection details. They have distributions that define formats, and those distributions link to datasets that conform to shared schemas. In this example, the UK Bonds Data Product has an output port that is a RESTful API. The API delivers JSON data conforming to the shared FIBO specification for callable bonds.

  {
    "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
    "id": "https://y.com/products/uk-bonds",
    "type": "DataProduct",
    "title": "UK Bonds",
    "description": "UK Bonds is your one-stop-shop for all your bonds!",
    "dataProductOwner": "https://www.linkedin.com/in/tonyseale/",
    "lifecycleStatus" : "https://ekgf.github.io/data-product-spec/dprod/data/lifecycle-status/Consume",
    "outputPort": {
      "type": "DataService",
      "endpointURL": "https://y.com/uk-10-year-bonds",
      "isAccessServiceOf": {
        "type": "Distribution",
        "format": "https://www.iana.org/assignments/media-types/application/json",
        "isDistributionOf": {
          "type": "Dataset",
          "id": "https://y.com/products/uk-bonds/datasets/10-year",
          "conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/Bonds/CallableBond"
        }
      }
    }
  }
  

The examples in this document map the types of the classes above to @type in their JSON-LD serialisations. JSON-LD extends the familiar JSON syntax with the shared semantics defined by DCAT and DPROD.
You can copy the JSON above into the JSON-LD Playground (https://json-ld.org/playground) to check that the context resolves.
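Because the example is compacted JSON-LD, a consumer that only needs the port, distribution and dataset chain can read it as ordinary JSON. The Python sketch below (standard library only) assumes the document keeps exactly the compacted shape and short keys shown above; a general-purpose consumer would instead expand the document with a JSON-LD processor, since other publishers may use different keys or nesting. The helper names `as_list` and `describe_output_ports` are illustrative, not part of DPROD.

```python
import json

# Trimmed compacted JSON-LD copy of the UK Bonds example (owner, description and
# lifecycle fields omitted). Assumption: the short keys stay exactly as shown.
UK_BONDS = json.loads("""
{
  "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
  "id": "https://y.com/products/uk-bonds",
  "type": "DataProduct",
  "title": "UK Bonds",
  "outputPort": {
    "type": "DataService",
    "endpointURL": "https://y.com/uk-10-year-bonds",
    "isAccessServiceOf": {
      "type": "Distribution",
      "format": "https://www.iana.org/assignments/media-types/application/json",
      "isDistributionOf": {
        "type": "Dataset",
        "id": "https://y.com/products/uk-bonds/datasets/10-year",
        "conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/Bonds/CallableBond"
      }
    }
  }
}
""")


def as_list(value):
    """JSON-LD allows a single object or an array; normalise both to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def describe_output_ports(product):
    """Follow outputPort -> isAccessServiceOf -> isDistributionOf for every port."""
    rows = []
    for port in as_list(product.get("outputPort")):
        for dist in as_list(port.get("isAccessServiceOf")):
            for dataset in as_list(dist.get("isDistributionOf")):
                rows.append({
                    "endpoint": port.get("endpointURL"),
                    "format": dist.get("format"),
                    "dataset": dataset.get("id"),
                    "schema": dataset.get("conformsTo"),
                })
    return rows


if __name__ == "__main__":
    for row in describe_output_ports(UK_BONDS):
        print(row["endpoint"], "->", row["schema"])
```

Running it prints the port's endpoint followed by the FIBO schema its dataset conforms to. `as_list` is needed because JSON-LD allows a property such as `outputPort` to hold either a single object or an array.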

DataProduct

A data product is a rational, managed, and governed collection of data, with purpose, value and ownership, meeting consumer needs over a planned life-cycle.

label

The name given to the Data Product
Identifier:rdfs:label
Domain:dprod:DataProduct
Range:xsd:string

description

A free text description of the Data Product
Identifier:dcterms:description
Domain:dprod:DataProduct
Range:xsd:string

dataProductOwner

The Agent that is overall accountable for the data product. This includes managing the data product along its lifecycle (creation, usage, versioning, deletion).
Identifier:dprod:dataProductOwner
Label:dataProductOwner
Domain:dprod:DataProduct
Range:foaf:Agent

domain

The business or information area supported by the data product.
Identifier:dprod:domain
Comment:The domain is intended to be a resource in its own right. This specification does not constrain the class to be used.
Domain:dprod:DataProduct
Range:

inputPort

An input port describes a set of services exposed by a data product to collect its source data and make it available for further internal transformation. An input port can receive data from one or more upstream sources in push mode (i.e. asynchronous subscription) or pull mode (i.e. synchronous query). Each data product may have one or more input ports.
Identifier:dprod:inputPort
Label:inputPort
Domain:dprod:DataProduct
Range:dcat:DataService

outputPort

An output port describes a set of services exposed by a data product to share the generated data in a way that can be understood and trusted.
Identifier:dprod:outputPort
Label:outputPort
Domain:dprod:DataProduct
Range:dcat:DataService

inputDataset

The source data made available to the data product through input data services. Depending on the lifecycle of the data product, this may be a stated or inferred relationship aligned with the input ports.
Identifier:dprod:inputDataset
Label:input Dataset
Domain:dprod:DataProduct
Range:dcat:Dataset

outputDataset

The data that is exposed by the data product through output data services in a way that can be understood and trusted. Depending on the lifecycle of the data product, this may be a stated or inferred relationship aligned with the output ports.
Identifier:dprod:outputDataset
Label:output Dataset
Domain:dprod:DataProduct
Range:dcat:Dataset

purpose

A description of the objectives and intended usage of the data product.
Identifier:dprod:purpose
Domain:dprod:DataProduct
Range:xsd:string

hasPolicy

An ODRL-conformant policy expressing the rights associated with the data product. This is an inferred relationship based on the rights expressed on the individual datasets of the data product.
Identifier:odrl:hasPolicy
Domain:dprod:DataProduct
Range:odrl:Policy

lifecycleStatus

The lifecycle status of the Data Product, taken from a controlled list (Ideation, Design, Build, Deploy, Consume).
Identifier:dprod:lifecycleStatus
Label:lifecycleStatus
Domain:dprod:DataProduct
Range:dprod:DataProductLifecycleStatus

DataService

A collection of operations that provides access to one or more datasets or data processing functions.

isAccessServiceOf

The dataset distribution that is being offered through this Data Service
Identifier:dprod:isAccessServiceOf
Label:is Access Service Of
Domain:dcat:DataService
Range:dcat:Distribution

protocol

A protocol (possibly one of many options) used to communicate with this Data Service
Identifier:dprod:protocol
Domain:dcat:DataService
Range:dprod:Protocol

securitySchemaType

The security schema type used for authentication and communication with this Data Service
Identifier:dprod:securitySchemaType
Domain:dcat:DataService
Range:dprod:SecuritySchemaType

endpointURL

The root location or primary endpoint of the service
Identifier:dcat:endpointURL
Domain:dcat:DataService
Range:

endpointDescription

A description of the services available via the end-points, including their operations, parameters etc
Identifier:dcat:endpointDescription
Domain:dcat:DataService
Range:

Distribution

A specific representation of a dataset. A dataset might be available in multiple serializations that may differ in various ways, including natural language, media-type or format, schematic organization, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above).

accessService

A data service that gives access to the distribution of the dataset
Identifier:dcat:accessService
Domain:dcat:Distribution
Range:dcat:DataService

conformsTo

The format- and technology-dependent schema that the distribution conforms to
Identifier:dcterms:conformsTo
Domain:dcat:Distribution
Range:

isDistributionOf

The dataset that this distribution makes available
Identifier:dprod:isDistributionOf
Label:isDistributionOf
Domain:dcat:Distribution
Range:dcat:Dataset

format

The file format of the distribution
Identifier:dcterms:format
Domain:dcat:Distribution
Range:

Dataset

A collection of data, published or curated by a single agent, and available for access or download in one or more representations.

label

The name given to the dataset
Identifier:rdfs:label
Domain:dcat:Dataset
Range:xsd:string

description

Free text description of the dataset
Identifier:dcterms:description
Domain:dcat:Dataset
Range:xsd:string

type

The type or genre of the Dataset
Identifier:dcterms:type
Domain:dcat:Dataset
Range:

distribution

An available distribution of the dataset
Identifier:dcat:distribution
Domain:dcat:Dataset
Range:dcat:Distribution

conformsTo

A model, schema, ontology, view or profile that the dataset conforms to
Identifier:dcterms:conformsTo
Domain:dcat:Dataset
Range:

hasPolicy

An ODRL conformant policy expressing the rights associated with the resource
Identifier:odrl:hasPolicy
Domain:dcat:Dataset
Range:

informationSensitivityClassification

The relationship to a taxonomy that defines the different levels of control and protection that must be applied to the dataset. This is a more granular relationship of the classification of a dataset that includes other classification concepts
Identifier:dprod:informationSensitivityClassification
Label:information Sensitivity Classification
Domain:dcat:Dataset
Range:dprod:InformationSensitivityClassification

SecuritySchemaType

A security schema type used for authentication and communication.

InformationSensitivityClassification

The shape of Information Sensitivity Classification as defined in the dprod schema

DataProductLifecycleStatus

The shape of Data Product Lifecycle Status

Protocol

A protocol, possibly including a specific version, used for communicating with a service
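To make the class chain concrete, here is a hypothetical Python model of the core classes (DataProduct, DataService, Distribution, Dataset) and the lifecycle controlled list, with a `to_jsonld` method that emits the compacted shape used in this document's examples. The class and field names are illustrative, only a subset of properties is modelled, and the IRIs for statuses other than Consume are assumed to follow the pattern of the Consume IRI used in the UK Bonds example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

# Assumption: every status IRI follows the pattern of the "Consume" IRI used in
# the UK Bonds example; only that one IRI appears in this document.
LIFECYCLE_BASE = "https://ekgf.github.io/data-product-spec/dprod/data/lifecycle-status/"


class LifecycleStatus(Enum):
    """Controlled list named in the lifecycleStatus definition."""
    IDEATION = "Ideation"
    DESIGN = "Design"
    BUILD = "Build"
    DEPLOY = "Deploy"
    CONSUME = "Consume"

    @property
    def iri(self) -> str:
        return LIFECYCLE_BASE + self.value


@dataclass
class Dataset:
    id: str
    conforms_to: Optional[str] = None      # dcterms:conformsTo (logical model)


@dataclass
class Distribution:
    dataset: Dataset                       # dprod:isDistributionOf
    format: Optional[str] = None           # dcterms:format


@dataclass
class DataService:
    endpoint_url: str                      # dcat:endpointURL
    distribution: Distribution             # dprod:isAccessServiceOf


@dataclass
class DataProduct:
    id: str
    label: str                             # rdfs:label
    owner: str                             # dprod:dataProductOwner
    lifecycle_status: LifecycleStatus      # dprod:lifecycleStatus
    output_ports: List[DataService] = field(default_factory=list)

    def to_jsonld(self) -> dict:
        """Serialise to the compacted JSON-LD shape used in this document's examples."""
        ports = []
        for port in self.output_ports:
            dist = port.distribution
            ports.append({
                "type": "DataService",
                "endpointURL": port.endpoint_url,
                "isAccessServiceOf": {
                    "type": "Distribution",
                    "format": dist.format,
                    "isDistributionOf": {
                        "type": "Dataset",
                        "id": dist.dataset.id,
                        "conformsTo": dist.dataset.conforms_to,
                    },
                },
            })
        return {
            "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
            "id": self.id,
            "type": "DataProduct",
            "label": self.label,
            "dataProductOwner": self.owner,
            "lifecycleStatus": self.lifecycle_status.iri,
            "outputPort": ports,
        }
```

Building the UK Bonds product with `LifecycleStatus.CONSUME` and one `DataService` reproduces the structure of the earlier example, with two differences: the name is emitted as `label`, following the property definition above, and `outputPort` is always an array (after JSON-LD expansion a single value and a one-element array are equivalent).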

Worked Examples

Here are some worked examples of how to use DPROD for common use cases.

Core Data Product Extensions

For real-world data products, the core data product details will be part of a wider set of metadata that allows the data and the data product to be used effectively. Below is an example of extending the DPROD data product, specifically by adding an agreement to it.

In this example, a Data Product Agreement is defined as a subclass of FIBO Agreement.

Definition of a simple Agreement based on FIBO:

                        
{
  "@context": [
    "https://ekgf.github.io/data-product-spec/dprod.jsonld",
    {
      "fibo": "http://spec.edmcouncil.org/fibo/ontology/FND/Agreements/MetadataFNDAgreements/#",
      "ex": "http://example.org/dp#"
    }
  ],
  "@graph": [
    {
      "@id": "ex:isSubjectToAgreement",
      "@type": "rdf:Property",
      "rdfs:label": "Data Product is Subject To FIBO Agreement",
      "rdfs:domain": {
        "@id": "DataProduct"
      },
      "rdfs:range": {
        "@id": "ex:DataProductAgreement"
      }
    },
    {
      "@id": "ex:DataProductAgreement",
      "@type": "rdfs:Class",
      "rdfs:label": "DataProductAgreement",
      "rdfs:subClassOf": {
        "@id": "fibo:Agreement"
      }
    }
  ]
}


                    

A full definition of agreements for data products is likely to be more complex than a single class and may use other information models or their profiles (such as ODRL Policy) or create dedicated definitions.

Below is an example of a Data Product with an associated Data Product Agreement with an effective date.

Using the agreement:

                        
{
  "@context": [
    "https://ekgf.github.io/data-product-spec/dprod.jsonld",
    {
      "fibo": "http://spec.edmcouncil.org/fibo/ontology/FND/Agreements/MetadataFNDAgreements/#",
      "ex": "http://example.org/dp#"
    }
  ],

  "dataProducts": [
    {
      "id": "https://y.com/data-product/company-sales",
      "type": "DataProduct",
      "outputPort": {
        "id": "https://y.com/data-product/company-sales/port/2025-sales",
        "type": "DataService",
        "label": "Sales",
        "endpointURL": "https://y.com/data-product/company-sales/port/2025-sales",
        "isAccessServiceOf": {
          "type": "Distribution",
          "format": "https://www.iana.org/assignments/media-types/application/json",
          "isDistributionOf": {
            "type": "Dataset",
            "label": "Sales",
            "id": "https://y.com/data-product/company-sales/dataset/2025-sales",
            "conformsTo": "https://y.com/schema/Sale"
          }
        }
      },
      "ex:isSubjectToAgreement": {
        "@id": "ex:VVSimpleAgreement",
        "@type": "ex:DataProductAgreement"
      }
    }
  ],

  "agreements": [
    {
      "@id": "ex:VVSimpleAgreement",
      "@type": "ex:DataProductAgreement",
      "rdfs:label": "Very Simple Data Product Agreement",
      "fibo:hasEffectiveDate": {
        "@type": "xsd:date",
        "@value": "2024-08-31"
      }
    }
  ]
}
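Because the product refers to its agreement only by `@id`, a consumer has to join the two lists. The following Python sketch does that join on the example data, assuming the property is spelled `ex:isSubjectToAgreement` as declared in the definition above. It also treats an agreement as in force from its effective date onwards; that interpretation, and the helper names, are assumptions for illustration, since the example defines no end date.

```python
import json
from datetime import date

# Trimmed copy of the "Using the agreement" example: only the keys used below.
DOC = json.loads("""
{
  "dataProducts": [
    {
      "id": "https://y.com/data-product/company-sales",
      "type": "DataProduct",
      "ex:isSubjectToAgreement": {"@id": "ex:VVSimpleAgreement"}
    }
  ],
  "agreements": [
    {
      "@id": "ex:VVSimpleAgreement",
      "@type": "ex:DataProductAgreement",
      "rdfs:label": "Very Simple Data Product Agreement",
      "fibo:hasEffectiveDate": {"@type": "xsd:date", "@value": "2024-08-31"}
    }
  ]
}
""")


def agreements_for(doc, product_id):
    """Resolve a product's ex:isSubjectToAgreement references against doc["agreements"] by @id."""
    by_id = {a["@id"]: a for a in doc.get("agreements", [])}
    for product in doc.get("dataProducts", []):
        if product.get("id") != product_id:
            continue
        refs = product.get("ex:isSubjectToAgreement", [])
        refs = refs if isinstance(refs, list) else [refs]
        return [by_id[r["@id"]] for r in refs if r["@id"] in by_id]
    return []


def is_effective(agreement, on):
    """Assumed rule: the agreement applies on and after its effective date."""
    value = agreement["fibo:hasEffectiveDate"]["@value"]
    return date.fromisoformat(value) <= on
```

For example, `agreements_for(DOC, "https://y.com/data-product/company-sales")` returns the single "Very Simple Data Product Agreement", which `is_effective` reports as in force from 31 August 2024.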


                    

     

Data Lineage


It is important to be able to trace the lineage of data. Within DPROD, this can be done in two ways: at a high level from one data product to another and, if you want, at a more detailed level of the underlying datasets.

High Level Lineage: Between Data Products

Data products have input and output ports, and one data product's input port is the same Data Service, with the same identifier, as another data product's output port.

This allows a user to query the lineage. Because every data product, port and dataset has a URL as its identifier and the properties link them together, you can walk from a data product to the upstream data products that feed it.

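You can follow the path from a data product to the data products and datasets that feed it like this (<< follows a property in reverse):

Data Product >> inputPort << outputPort << Input Data Product
Data Product >> inputPort >> isAccessServiceOf >> isDistributionOf >> Input Dataset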


                    

Let's look at some example data with three data products that connect to each other through their input and output ports:

                        
{
  "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
  "dataProducts": [
    {
      "id": "https://y.com/data-product/company-finance",
      "type": "DataProduct",
      "inputPort": [
        {
          "id": "https://y.com/data-product/company-sales/port/2025-sales",
          "type": "DataService"
        },
        {
          "id": "https://y.com/data-product/company-hr/port/2025-payroll",
          "type": "DataService"
        }
      ],
      "outputPort": {
        "id": "https://y.com/data-product/company-finance/port/2025-balance-sheet",
        "type": "DataService",
        "label": "Balance Sheet",
        "endpointURL": "https://y.com/data-product/company-sales/port/2025-c",
        "isAccessServiceOf": {
          "type": "Distribution",
          "format": "https://www.iana.org/assignments/media-types/application/json",
          "isDistributionOf": {
            "type": "Dataset",
            "id": "https://y.com/data-product/company-finance/dataset/2025-balance-sheet",
            "conformsTo": "https://y.com/schema/BalanceSheet"
          }
        }
      }
    },
    {
      "id": "https://y.com/data-product/company-sales",
      "type": "DataProduct",
      "outputPort": {
        "id": "https://y.com/data-product/company-sales/port/2025-sales",
        "type": "DataService",
        "label": "Sales",
        "endpointURL": "https://y.com/data-product/company-sales/port/2025-sales",
        "isAccessServiceOf": {
          "type": "Distribution",
          "format": "https://www.iana.org/assignments/media-types/application/json",
          "isDistributionOf": {
            "type": "Dataset",
            "label": "Sales",
            "id": "https://y.com/data-product/company-sales/dataset/2025-sales",
            "conformsTo": "https://y.com/schema/Sale"
          }
        }
      }
    },
    {
      "id": "https://y.com/data-product/company-hr",
      "type": "DataProduct",
      "outputPort": {
        "id": "https://y.com/data-product/company-hr/port/2025-payroll",
        "type": "DataService",
        "label": "Payroll",
        "endpointURL": "https://y.com/data-product/company-hr/port/2025-payroll",
        "isAccessServiceOf": {
          "type": "Distribution",
          "format": "https://www.iana.org/assignments/media-types/text/csv",
          "isDistributionOf": {
            "type": "Dataset",
            "label": "Payroll",
            "id": "https://y.com/data-product/company-hr/dataset/2025-payroll",
            "conformsTo": "https://y.com/schema/Payroll"
          }
        }
      }
    }
  ]
}


                    

Given this example data, if we started at the data product https://y.com/data-product/company-finance, we could walk the relationships to find the input data products that feed it:

                        
https://y.com/data-product/company-finance >> inputPort << outputPort << [https://y.com/data-product/company-sales , https://y.com/data-product/company-hr]


                    

In Linked Data, we would express this walk as a SPARQL query like this:

                        
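PREFIX dprod: <https://ekgf.github.io/data-product-spec/dprod/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <https://y.com/data-product/>

SELECT DISTINCT ?inputProduct ?inputDataset ?label
WHERE
{
  :company-finance dprod:inputPort ?inputPort .
  ?inputProduct dprod:outputPort ?inputPort .
  ?inputPort dprod:isAccessServiceOf/dprod:isDistributionOf ?inputDataset .
  OPTIONAL { ?inputDataset rdfs:label ?label . }
}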
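The same walk can be done directly on the published JSON. This Python sketch (standard library only) indexes every output port by the product that exposes it and then walks input ports breadth first, so it also finds indirect upstream products. It assumes, as in the example, that an input port reuses the identifier of the upstream product's output port; the data is a trimmed copy of the example with only identifiers kept, and the function names are illustrative.

```python
from collections import deque

# Trimmed copy of the lineage example: only product and port identifiers.
# Assumption: an input port has the same id as the upstream output port.
PRODUCTS = [
    {"id": "https://y.com/data-product/company-finance",
     "inputPort": [{"id": "https://y.com/data-product/company-sales/port/2025-sales"},
                   {"id": "https://y.com/data-product/company-hr/port/2025-payroll"}],
     "outputPort": {"id": "https://y.com/data-product/company-finance/port/2025-balance-sheet"}},
    {"id": "https://y.com/data-product/company-sales",
     "outputPort": {"id": "https://y.com/data-product/company-sales/port/2025-sales"}},
    {"id": "https://y.com/data-product/company-hr",
     "outputPort": {"id": "https://y.com/data-product/company-hr/port/2025-payroll"}},
]


def as_list(value):
    """JSON-LD allows a single object or an array; normalise both to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def upstream(products, start):
    """Breadth-first walk from `start` to every product that feeds it, directly or indirectly."""
    # Index: output-port id -> id of the product exposing that port.
    producer = {port["id"]: p["id"]
                for p in products for port in as_list(p.get("outputPort"))}
    # Index: product id -> ids of its input ports.
    inputs = {p["id"]: [port["id"] for port in as_list(p.get("inputPort"))]
              for p in products}
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for port_id in inputs.get(current, []):
            source = producer.get(port_id)
            if source is not None and source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order
```

Starting from `https://y.com/data-product/company-finance`, the walk returns the sales and HR data products. The `seen` set stops it from looping if an input ever resolves back to a product already visited.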


                    

Detailed Level: Between Datasets

If you wish to track lineage at a more granular level, you can also use PROV (https://www.w3.org/TR/prov-o/) at the dataset level.

                        
dap:atnf-P366-2003SEPT
  rdf:type dcat:Dataset ;
  dcterms:bibliographicCitation "Burgay, M; McLaughlin, M; Kramer, M; Lyne, A; Joshi, B; Pearce, G; D'Amico, N; Possenti, A; Manchester, R; Camilo, F (2017): Parkes observations for project P366 semester 2003SEPT. v1. CSIRO. Data Collection. https://doi.org/10.4225/08/598dc08d07bb7" ;
  dcterms:title "Parkes observations for project P366 semester 2003SEPT"@en ;
  dcat:landingPage  ;
  prov:wasGeneratedBy dap:P366 ;
  .

dap:P366
  rdf:type prov:Activity ;
  dcterms:type  ;
  prov:startedAtTime "2000-11-01"^^xsd:date ;
  prov:used dap:Parkes-radio-telescope ;
  prov:wasInformedBy dap:ATNF ;
  rdfs:label "P366 - Parkes multibeam high-latitude pulsar survey"@en ;
  rdfs:seeAlso  ;
  .


                    

See: https://www.w3.org/TR/vocab-dcat-3/#examples-dataset-provenance.
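A dataset's provenance can then be followed through `prov:wasGeneratedBy` to the generating activity, and on to what that activity used. The sketch below keeps the example's non-elided statements as plain Python tuples purely to show the traversal; in practice the Turtle would be loaded into an RDF store and queried with SPARQL. The function names are illustrative.

```python
# Non-elided statements from the PROV example above, as (subject, predicate, object).
TRIPLES = [
    ("dap:atnf-P366-2003SEPT", "rdf:type", "dcat:Dataset"),
    ("dap:atnf-P366-2003SEPT", "prov:wasGeneratedBy", "dap:P366"),
    ("dap:P366", "rdf:type", "prov:Activity"),
    ("dap:P366", "prov:used", "dap:Parkes-radio-telescope"),
    ("dap:P366", "prov:wasInformedBy", "dap:ATNF"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def dataset_provenance(triples, dataset):
    """For each activity that generated `dataset`, list what it used and what informed it."""
    result = {}
    for activity in objects(triples, dataset, "prov:wasGeneratedBy"):
        result[activity] = {
            "used": objects(triples, activity, "prov:used"),
            "informedBy": objects(triples, activity, "prov:wasInformedBy"),
        }
    return result
```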


     

Data Quality

The example below uses terms modelled on the W3C Data Quality Vocabulary (DQV): a quality measurement computed on a Data Product, a quality measurement computed on a Dataset, and a Data Product that publishes its quality report through an output port whose dataset conforms to the DQV QualityMeasurement definition.

[
  {
    "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
    "id": "https://y.com/derived-quality-measurementA",
    "@type": "QualityMeasurement",
    "value": 1,
    "computedOn": {
      "@type": "DataProduct",
      "@id": "https://y.com/products/uk-bonds"
    },
    "isMeasurementOf": {
      "@type": "Metric",
      "label": "Number of stale datasets"
    }
  },
  {
    "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
    "@id": "https://y.com/quality-measurement-B",
    "@type": "QualityMeasurement",
    "value": false,
    "computedOn": {
      "@type": "Dataset",
      "@id": "https://y.com/products/uk-bonds/yearlyPrices"
    },
    "isMeasurementOf": {
      "@type": "Metric",
      "label": "Expected distribution frequency achieved"
    }
  },
  {
    "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
    "id": "https://y.com/products/uk-bonds",
    "type": "DataProduct",
    "outputPort": {
      "type": "DataService",
      "endpointURL": "https://y.com/uk-bonds/quality-report",
      "isAccessServiceOf": {
        "type": "Distribution",
        "isDistributionOf": {
          "type": "Dataset",
          "conformsTo": "https://www.w3.org/TR/vocab-dqv/#dqv:QualityMeasurement"
        }
      }
    }
  }
]
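A catalogue that collects such measurements can roll them up per resource. The Python sketch below groups the two measurements from the example by the resource they were computed on. It assumes the compacted keys shown above (`computedOn`, `isMeasurementOf`, `value`) and writes the second value as the JSON boolean `false` (Python `False`); the function name is illustrative.

```python
from collections import defaultdict

# The two QualityMeasurement objects from the example (the quality-report
# data product is omitted because it carries no measurement itself).
MEASUREMENTS = [
    {"id": "https://y.com/derived-quality-measurementA", "@type": "QualityMeasurement",
     "value": 1,
     "computedOn": {"@type": "DataProduct", "@id": "https://y.com/products/uk-bonds"},
     "isMeasurementOf": {"@type": "Metric", "label": "Number of stale datasets"}},
    {"@id": "https://y.com/quality-measurement-B", "@type": "QualityMeasurement",
     "value": False,
     "computedOn": {"@type": "Dataset", "@id": "https://y.com/products/uk-bonds/yearlyPrices"},
     "isMeasurementOf": {"@type": "Metric", "label": "Expected distribution frequency achieved"}},
]


def quality_report(measurements):
    """Group measurements as {resource id: {metric label: value}}."""
    report = defaultdict(dict)
    for m in measurements:
        target = m["computedOn"]["@id"]
        metric = m["isMeasurementOf"]["label"]
        report[target][metric] = m["value"]
    return dict(report)
```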

Data Rights

ODRL is a W3C standard for describing rights and entitlements.

Using ODRL, data product and dataset publishers can describe their policies in a consistent, standard and machine-readable manner. Policies contain permissions and prohibitions on specific actions, which stakeholders are required to respect.

In addition, permissions may be limited by constraints (e.g. temporal or geographical constraints) and may carry duties (e.g. payments).

Policies and their permitted or prohibited actions can be described at different levels: for example, a Policy can target a Data Product, a Dataset, a Data Service or even a Column.

Policy engines should interpret and enforce ODRL policies at the appropriate level, e.g.:

                        
examplePolicyA odrl:target exampleProduct:ProductA .
examplePolicyB odrl:target exampleDataset:DatasetA1 .


                    

The following policy grants permission to distribute the data only inside a specific region:

                        
{
  "@id": "examplePolicyA",
  "permission": [
    {
      "action": "odrl:distribute",
      "constraint": [
        {
          "leftOperand": "region",
          "operator": "eq",
          "rightOperand": "region:EMEA"
        }
      ]
    }
  ]
}


                    
A complete policy on a data product, granting permission to read all of its datasets only to users working in EMEA or APAC:

{
  "@type": "Policy",
  "id": "56456df-dfg-34535345-5545",

  "assigner": "https://schema.org/person/AdamSmith",
  "target": "https://data.org/data-product/equity-trade-xxx",

  "permission": [
    {
      "action": "odrl:read",
      "constraint": [
        {
          "@type": "Constraint",
          "leftOperand": "workingCountry",
          "operator": "odrl:isAnyOf",
          "rightOperand": ["region:EMEA", "region:APAC"],
          "description": "Permission to read all the datasets of the product if the user is working inside EMEA or APAC"
        }
      ]
    }
  ]
}
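To show how a policy engine might apply such a constraint, here is a deliberately tiny Python evaluator. It is an illustrative sketch, not an ODRL implementation: it supports only the `eq`, `odrl:eq` and `odrl:isAnyOf` operators, ignores prohibitions and duties, and assumes the caller supplies the user's context (for example a `workingCountry` value) as a dictionary. The policy is a trimmed copy of the example above; `region:AMER` in the usage note is a hypothetical value.

```python
# Trimmed copy of the read policy above.
POLICY = {
    "@type": "Policy",
    "target": "https://data.org/data-product/equity-trade-xxx",
    "permission": [{
        "action": "odrl:read",
        "constraint": [{
            "leftOperand": "workingCountry",
            "operator": "odrl:isAnyOf",
            "rightOperand": ["region:EMEA", "region:APAC"],
        }],
    }],
}

# Only the operators this sketch understands; anything else raises KeyError.
OPERATORS = {
    "eq": lambda actual, expected: actual == expected,
    "odrl:eq": lambda actual, expected: actual == expected,
    "odrl:isAnyOf": lambda actual, expected: actual in expected,
}


def is_permitted(policy, target, action, context):
    """True if a permission on `target` allows `action` and all its constraints hold in `context`."""
    if policy.get("target") != target:
        return False
    for permission in policy.get("permission", []):
        if permission.get("action") != action:
            continue
        constraints = permission.get("constraint", [])
        if all(OPERATORS[c["operator"]](context.get(c["leftOperand"]), c["rightOperand"])
               for c in constraints):
            return True
    return False
```

With this policy, a reader whose `workingCountry` is `region:EMEA` may read the product, while one in `region:AMER` may not, and no one is granted `odrl:distribute` because the policy only defines a read permission.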

Data Schema

The Data Product provides its consumers with datasets described using DCAT (dprod:outputDataset). Datasets should be linked (dcterms:conformsTo) to logical models. Logical models describe business entities and their properties (attributes and relationships) using consistent business terms, and they are technology independent. Ideally, logical models are based on existing standards, e.g. FIBO or CDM. If no logical model exists to describe the dataset, the dataset publisher can create one, preferably using the SHACL modelling language:

Example of a Dataset conforming to a SHACL Schema

                        
exampleDataset dct:conformsTo exampleSchema:DatasetLogicalSchema .
exampleSchema:DatasetLogicalSchema a owl:Ontology, dct:Standard.


                    

In SHACL, every entity in the dataset is described as a Node Shape (1). The attributes of an entity are described as Property Shapes with sh:datatype (2). Relationships are also defined as Property Shapes, with sh:class giving the target class of the relationship (3).

                        
example:Account a sh:NodeShape ;                  # definition of the entity as a Node Shape (1)
  rdfs:label "Account"@en ;                       # human-readable name of the entity
  dct:description "An Account is..." ;            # description of the entity
  sh:property example:Account-AccountAge ;        # an account has the property shape Account Age, defined below (2)
  sh:property example:Account-AccountBranch ;     # an account has the property shape Account Branch, defined below (3)
  rdfs:isDefinedBy exampleSchema:DatasetLogicalSchema ;
.

example:Account-AccountAge a sh:PropertyShape ;   # (2) an account MUST have exactly one AccountAge attribute, of datatype integer
  sh:path example:AccountAge ;
  sh:datatype xsd:integer ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  rdfs:isDefinedBy exampleSchema:DatasetLogicalSchema ;
.

example:Account-AccountBranch a sh:PropertyShape ;   # (3) an account MUST have at least one AccountBranch, which is another entity
  sh:path example:AccountBranch ;
  sh:class example:Branch ;
  sh:minCount 1 ;
  rdfs:isDefinedBy exampleSchema:DatasetLogicalSchema ;
.

example:Branch a sh:NodeShape ;                   # definition of the entity Branch as a Node Shape (1)
  rdfs:label "Branch"@en ;                        # human-readable name of the entity
  dct:description "A Branch is.." ;
  rdfs:isDefinedBy exampleSchema:DatasetLogicalSchema ;
.
# ...
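As an informal reading aid, the following Python sketch hand-codes the constraints of the `example:Account` shape (cardinality, datatype and target class) and reports violations for a record held as a dictionary. It is not a SHACL processor; real validation would run the shapes through a SHACL engine. The `type` key used to test class membership, the dictionary encoding of the shape and the function names are assumptions for illustration.

```python
# Dictionary encoding of the example:Account shape (assumed, for illustration).
ACCOUNT_SHAPE = {
    "example:AccountAge": {"datatype": int, "minCount": 1, "maxCount": 1},
    "example:AccountBranch": {"class": "example:Branch", "minCount": 1},
}


def violations(record, shape):
    """Return human-readable violations of `shape` by `record`; an empty list means it conforms."""
    problems = []
    for path, rule in shape.items():
        values = record.get(path, [])
        values = values if isinstance(values, list) else [values]
        if len(values) < rule.get("minCount", 0):
            problems.append(f"{path}: fewer than {rule['minCount']} value(s)")
        if "maxCount" in rule and len(values) > rule["maxCount"]:
            problems.append(f"{path}: more than {rule['maxCount']} value(s)")
        for v in values:
            # Exact type check, so that True is not accepted as an integer.
            if "datatype" in rule and type(v) is not rule["datatype"]:
                problems.append(f"{path}: {v!r} is not {rule['datatype'].__name__}")
            if "class" in rule and not (isinstance(v, dict) and v.get("type") == rule["class"]):
                problems.append(f"{path}: value is not a {rule['class']}")
    return problems
```

An account with one integer age and a branch typed `example:Branch` passes; one with two ages and no branch yields two violations, mirroring sh:maxCount 1 and sh:minCount 1 in the shapes above.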


                    
The following Data Product, for example, exposes a dataset that conforms to the FIBO SecuritiesTrade logical model:

{
  "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
  "id": "https://y.com/products/equity-trade-xxx",
  "@type": "DataProduct",
  "title": "Equity Trade XXX",
  "description": "Trade data defining the outcome of equity trades between parties in different stock markets, where the terms are primarily reflected in the tradable product. Additionally, Trade includes attributes such as the trade date, transacting parties, and settlement terms. Some attributes, such as the parties, are already defined in the Party Product and are simply referenced in Trade",
  "outputPort": {
    "@type": "DataService",
    "endpointURL": "abfss://datasetsv1@demo.dfs.core.windows.net/demo/full/trade-euronext",
    "isAccessServiceOf": {
      "@type": "Distribution",
      "format": "application/parquet",
      "isDistributionOf": {
        "@type": "Dataset",
        "@id": "https://y.com/dataset/equity-trade-euronext-paris",
        "title": "Equity Trade Euronext Paris XXX",
        "conformsTo": "https://spec.edmcouncil.org/fibo/ontology/BP/Process/FinancialContextAndProcess/SecuritiesTrade"
      }
    }
  }
}

Equity Trade

Example of a Data Product with Equity Trades.

The Equity Trade Data Product provides consumers with two datasets: one for trades on LSEG and one for trades on Euronext.

{
  "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
  "@id": "https://y.com/products/equity-trade-xxx",
  "dcterms:identifier": "equity-trade-xxx",
  "@type": "DataProduct",
  "title": "Equity Trade XXX",
  "description": "Trade data defining the outcome of equity trades between parties in different stock markets, where the terms are primarily reflected in the tradable product. Additionally, Trade includes attributes such as the trade date, transacting parties, and settlement terms. Some attributes, such as the parties, are already defined in the Party Product and are simply referenced in Trade",
  "dataProductOwner": "https://www.schema.xxx/person/AnnTaylor",
  "lifecycleStatus": "https://ekgf.github.io/data-product-spec/dprod/data/lifecycle-status/Consume",
  "outputPort": [
    {
      "@type": "dcat:DataService",
      "@id": "https://y.com/service/equity-trade-euronext-xxx-adls-prod-1",
      "dcterms:identifier": "equity-trade-euronext-xxx-tabular-adls-prod",
      "dcat:endpointURL": "abfss://datasetsv1@demo.dfs.core.windows.net/demo/full/trade-euronext",
      "dcat:endpointDescription": "Details for accessing storage account",
      "isAccessServiceOf": {
        "@type": "dcat:Distribution",
        "@id": "https://y.com/service/equity-trade-euronext-xxx-tabular",
        "dcterms:identifier": "equity-trade-euronext-xxx-tabular",
        "dcterms:format": "application/parquet",
        "conformsTo": "https://cdm.finos.org/docs/event-model",
        "isDistributionOf": {
          "@type": "dcat:Dataset",
          "@id": "https://y.com/dataset/equity-trade-euronext-paris",
          "dcterms:identifier": "equity-trade-euronext-paris-xxx",
          "datasetOwner": "https://www.schema.xxx/person/JohnBarks",
          "title": "Equity Trade Euronext Paris XXX",
          "conformsTo": "https://cdm.finos.org/docs/event-model"
        }
      }
    },
    {
      "@type": "dcat:DataService",
      "@id": "https://y.com/service/equity-trade-lseg-xxx-adls-prod-1",
      "dcterms:identifier": "equity-trade-lseg-xxx-tabular-adls-prod",
      "dcat:endpointURL": "abfss://datasetsv1@demo.dfs.core.windows.net/demo/full/trade-lseg",
      "dcat:endpointDescription": "Details for accessing storage account",
      "isAccessServiceOf": {
        "@type": "dcat:Distribution",
        "@id": "https://y.com/service/equity-trade-lseg-xxx-tabular",
        "dcterms:identifier": "equity-trade-lseg-xxx-tabular",
        "dcterms:format": "application/parquet",
        "conformsTo": "https://cdm.finos.org/docs/event-model",
        "isDistributionOf": {
          "@type": "dcat:Dataset",
          "@id": "https://y.com/dataset/equity-trade-lseg-xxx",
          "dcterms:identifier": "equity-trade-lseg-xxx",
          "title": "Equity Trade LSEG XXX",
          "conformsTo": "https://cdm.finos.org/docs/event-model"
        }
      }
    }
  ]
}

Observability

Observability Ports

An Observability Port is a designated interface or endpoint in a system or application specifically used for monitoring and diagnostic purposes. It allows external tools or services to collect and analyze data related to the system's performance, health, and behaviour. By exposing metrics, logs, and traces through this port, administrators and developers can gain insights into the system's state, troubleshoot issues, and ensure it operates efficiently and reliably.

Defining Observability Ports in DPROD

DPROD has a schema-first design. The first thing you would need to do is define a schema for your logging information. It could be a schema based on OpenTelemetry, but in this example, we use RLOG (which is a semantic ontology for logging).

To find the Observability Port, query the ports to identify those whose dataset conforms to rlog:Entry:

                        
  outputPort >> isAccessServiceOf >> isDistributionOf >> conformsTo  >> rlog:Entry


                    
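As a minimal sketch of what this path means, the Python below follows it over a data product parsed as plain JSON with the standard json module, without JSON-LD expansion. It assumes the compact keys used in this document's examples (outputPort, isAccessServiceOf, isDistributionOf, and conformsTo or dcat:conformsTo), allows each step to hold a single object or a list, and compares schema IRIs as plain strings. The helper names and the trimmed uk-bonds product, whose observability dataset is assumed here to conform to rlog:Entry, are hypothetical.

```python
import json
from typing import Any


def _as_list(value: Any) -> list:
    """Normalise a JSON value: missing -> [], single object -> [object], list -> list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _conforms_to(node: dict) -> list:
    """Collect conformsTo values, whether written plain or as dcat:conformsTo."""
    return _as_list(node.get("conformsTo")) + _as_list(node.get("dcat:conformsTo"))


def find_ports_conforming_to(product: dict, schema_iri: str) -> list:
    """Hypothetical helper: follow
    outputPort >> isAccessServiceOf >> isDistributionOf >> conformsTo
    and return the ids of the output ports that reach schema_iri.
    IRIs are compared as plain strings; no JSON-LD expansion is done."""
    def reaches_schema(port: dict) -> bool:
        return any(
            schema_iri in _conforms_to(dataset)
            for distribution in _as_list(port.get("isAccessServiceOf"))
            for dataset in _as_list(distribution.get("isDistributionOf"))
        )

    return [port.get("id", port.get("@id"))
            for port in _as_list(product.get("outputPort"))
            if reaches_schema(port)]


# Trimmed, hypothetical data product modelled on the uk-bonds example.
product = json.loads("""
{
  "id": "https://y.com/data-product/uk-bonds",
  "outputPort": [
    {"id": "https://y.com/data-product/uk-bonds/port/2024-observability",
     "isAccessServiceOf": {"isDistributionOf": {"conformsTo": "rlog:Entry"}}},
    {"id": "https://y.com/data-product/uk-bonds/port/2024-data",
     "isAccessServiceOf": {"isDistributionOf": {"conformsTo": "https://y.com/schema/Data"}}}
  ]
}
""")
print(find_ports_conforming_to(product, "rlog:Entry"))
# ['https://y.com/data-product/uk-bonds/port/2024-observability']
```

Because the comparison is a plain string match, the schema has to be written the same way in the query and in the data; a JSON-LD processor would expand both to full IRIs first.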

Example Data Product with Observability Port

Here is an example of a data product with an observability port. It has two output ports: one serves the data and the other serves the logs.

{
  "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
  "dataProducts": [
    {
      "id": "https://y.com/data-product/uk-bonds",
      "type": "DataProduct",
      "inputPort": [
        {
          "id": "https://y.com/data-product/uk-bonds/port/2024-data",
          "type": "DataService"
        }
      ],
      "outputPort": [
        {
          "id": "https://y.com/data-product/uk-bonds/port/2024-observability",
          "type": "DataService",
          "label": "Observability Port",
          "endpointURL": "https://y.com/data-product/uk-bonds/port/2024-observability",
          "isAccessServiceOf": {
            "type": "Distribution",
            "format": "https://www.iana.org/assignments/media-types/application/json",
            "isDistributionOf": {
              "type": "Dataset",
              "id": "https://y.com/data-product/uk-bonds/dataset/2024-observability",
              "conformsTo": "https://y.com/schema/ObservabilityLog"
            }
          }
        },
        {
          "id": "https://y.com/data-product/uk-bonds/port/2024-data",
          "type": "DataService",
          "label": "Data Port",
          "endpointURL": "https://y.com/data-product/uk-bonds/port/2024-data",
          "isAccessServiceOf": {
            "type": "Distribution",
            "format": "https://www.iana.org/assignments/media-types/application/json",
            "isDistributionOf": {
              "type": "Dataset",
              "id": "https://y.com/data-product/uk-bonds/dataset/2024-data",
              "conformsTo": "https://y.com/schema/Data"
            }
          }
        }
      ]
    }
  ]
}

Given that our schema defines the class for an observation, we can use it to find all observability ports on a data product like this:

[https://y.com/data-product/uk-bonds/port/2024-observability] >> isAccessServiceOf >> isDistributionOf >> conformsTo >> https://y.com/schema/ObservabilityLog

In Linked Data, you would express this as a SPARQL query:

SELECT ?port
WHERE
{
  ?port a dcat:DataService .
  ?port dprod:isAccessServiceOf/dprod:isDistributionOf/dcat:conformsTo <https://y.com/schema/ObservabilityLog> .
}
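To show what the sequence property path in this query does, the following Python sketch evaluates the same pattern over an in-memory list of (subject, predicate, object) triples. The triples are a hand-written, hypothetical rendering of the example data product, with prefixed names standing in for full IRIs and blank nodes for the distributions, which have no id in the example; a real deployment would run the SPARQL query against a triple store instead.

```python
# Hypothetical triples: prefixed names stand in for full IRIs.
Triple = tuple[str, str, str]
RDF_TYPE = "rdf:type"
PATH = ("dprod:isAccessServiceOf", "dprod:isDistributionOf", "dcat:conformsTo")


def step(triples: list, subjects: set, predicate: str) -> set:
    """Follow one predicate: the objects reachable from any of the subjects."""
    return {o for s, p, o in triples if s in subjects and p == predicate}


def ports_conforming_to(triples: list, schema: str) -> set:
    """Evaluate  ?port a dcat:DataService .
                 ?port dprod:isAccessServiceOf/dprod:isDistributionOf/dcat:conformsTo <schema>"""
    services = {s for s, p, o in triples if p == RDF_TYPE and o == "dcat:DataService"}
    result = set()
    for port in services:
        reached = {port}
        for predicate in PATH:  # a sequence path is a chain of single steps
            reached = step(triples, reached, predicate)
        if schema in reached:
            result.add(port)
    return result


PORT = "https://y.com/data-product/uk-bonds/port/"
DATASET = "https://y.com/data-product/uk-bonds/dataset/"
triples: list = [
    (PORT + "2024-observability", RDF_TYPE, "dcat:DataService"),
    (PORT + "2024-observability", "dprod:isAccessServiceOf", "_:dist1"),
    ("_:dist1", "dprod:isDistributionOf", DATASET + "2024-observability"),
    (DATASET + "2024-observability", "dcat:conformsTo", "https://y.com/schema/ObservabilityLog"),
    (PORT + "2024-data", RDF_TYPE, "dcat:DataService"),
    (PORT + "2024-data", "dprod:isAccessServiceOf", "_:dist2"),
    ("_:dist2", "dprod:isDistributionOf", DATASET + "2024-data"),
    (DATASET + "2024-data", "dcat:conformsTo", "https://y.com/schema/Data"),
]
print(ports_conforming_to(triples, "https://y.com/schema/ObservabilityLog"))
# {'https://y.com/data-product/uk-bonds/port/2024-observability'}
```

Each pass of the inner loop corresponds to one `/` in the property path, which is why the path needs no parentheses.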

The SPARQL query above returns the URI of the port that provides logging data: https://y.com/data-product/uk-bonds/port/2024-observability.

SBA Pool Rates

Rates for SBA pools that are mortgage-backed securities. The data product is provided through three ports: the first provides all SBA pool rates through a database query, the second provides only EMEA rates through an API, and the third provides only US rates through a Kafka topic.

{
  "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
  "id":  "sba-pool-rates",
  "@id": "https://y.com/products/sba-pool-rates",
  "@type": "DataProduct",
  "title": "SBA Pool Rates",
  "description": "Rates for SBA pools that are mortgage-backed securities. The data product is provided through three ports: one provides all SBA pool rates through a database query, another provides EMEA-only rates through an API, and a third provides US-only rates through a Kafka topic",
  "dataProductOwner": "https://www.schema.xxx/person/johnSmith",
  "lifecycle" : "Consume",

  "outputPort": [{
    "@type": "dcat:DataService",
    "id":  "sba-pool-rate-tabular-prod1",
    "environment": "PROD",
    "@id": "https://y.com/service/sba-pool-rate-tabular-prod1",
    "dcat:endpointURL": "jdbc:oracle:thin:@sd656-5656-6745.ldn.organiation.com:43534/PGPERG.WORLD",
    "sql": "select * from ...",
    "isAccessServiceOf": {
      "@id": "https://y.com/distribution/sba-pool-rate-tabular",
      "id":  "sba-pool-rate-tabular",
      "@type": "dcat:Distribution",
      "dcterms:format": "https://www.iana.org/assignments/media-types/application/sql",
      "isDistributionOf": {
        "@type": "dcat:Dataset",
        "@id": "https://y.com/dataset/sba-pool-rate",
        "id":  "sba-pool-rate",
        "dcat:conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/MortgageBackedSecurities/SBA-Pool"
      }
    }
  },
  {
    "@type": "dcat:DataService",
    "id":  "sba-pool-rate-emea-api-prod1",
    "environment": "PROD",
    "@id": "https://y.com/service/sba-pool-rate-emea-api-prod1",
    "dcat:endpointURL": "https://example.org/mbs/SBA-Pool-location-emea",
    "dcat:conformsTo": "../resources/users.yaml",
    "isAccessServiceOf": {
      "@id": "https://y.com/distribution/sba-pool-rate-emea-json1",
      "id":  "sba-pool-rate-emea-json1",
      "@type": "dcat:Distribution",
      "dcterms:format": "https://www.iana.org/assignments/media-types/application/json",
      "isDistributionOf": {
        "@type": "dcat:Dataset",
        "@id": "https://y.com/dataset/sba-pool-rate-emea",
        "description": "SBA pool data covering EMEA, accessed through an API",
        "geographicalCoverage" :"https://y.com/country/EMEA",
        "id":  "sba-pool-rate-emea",
        "dcat:conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/MortgageBackedSecurities/SBA-Pool"
      }
    }
  },
  {
    "@type": "dcat:DataService",
    "id":  "sba-pool-rate-json-prod1",
    "@id": "https://y.com/service/sba-pool-rate-json-prod1",
    "dcat:endpointURL": "q1.debt.mbs.dataset.us",
    "isAccessServiceOf": {
      "@id": "https://y.com/distribution/sba-pool-rate-json",
      "id":  "sba-pool-rate-json",
      "@type": "dcat:Distribution",
      "dcterms:format": "https://www.iana.org/assignments/media-types/application/json",
      "conformsTo": "http://confluent-registry-y/rates-json-schema.json",
      "schemaCompatibility": "backwards compatible",
      "isDistributionOf": {
        "@type": "dcat:Dataset",
        "@id": "https://y.com/dataset/sba-pool-rate-us",
        "id":  "sba-pool-rate-us",
        "geographicalCoverage": "https://y.com/country/US",
        "dcat:conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/MortgageBackedSecurities/SBA-Pool"
      }
    }
  }
  ]
}
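A consumer that only needs US rates could select a port by the geographicalCoverage of the dataset it serves. The following Python sketch assumes the product has been parsed with the standard json module into the compact form shown above; only the keys it reads are kept in the trimmed copy, and the helper name ports_covering is hypothetical.

```python
import json
from typing import Any


def _as_list(value: Any) -> list:
    """Normalise a JSON value: missing -> [], single object -> [object], list -> list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ports_covering(product: dict, region_iri: str) -> list:
    """Hypothetical helper: return (port id, endpoint URL) pairs for output ports
    whose dataset declares geographicalCoverage equal to region_iri."""
    found = []
    for port in _as_list(product.get("outputPort")):
        datasets = [
            dataset
            for distribution in _as_list(port.get("isAccessServiceOf"))
            for dataset in _as_list(distribution.get("isDistributionOf"))
        ]
        if any(region_iri in _as_list(d.get("geographicalCoverage")) for d in datasets):
            found.append((port.get("id"), port.get("dcat:endpointURL")))
    return found


# Trimmed copy of the SBA Pool Rates example: only the keys used above are kept.
product = json.loads("""
{
  "outputPort": [
    {"id": "sba-pool-rate-tabular-prod1",
     "isAccessServiceOf": {"isDistributionOf": {"id": "sba-pool-rate"}}},
    {"id": "sba-pool-rate-emea-api-prod1",
     "dcat:endpointURL": "https://example.org/mbs/SBA-Pool-location-emea",
     "isAccessServiceOf": {"isDistributionOf": {
       "id": "sba-pool-rate-emea",
       "geographicalCoverage": "https://y.com/country/EMEA"}}},
    {"id": "sba-pool-rate-json-prod1",
     "dcat:endpointURL": "q1.debt.mbs.dataset.us",
     "isAccessServiceOf": {"isDistributionOf": {
       "id": "sba-pool-rate-us",
       "geographicalCoverage": "https://y.com/country/US"}}}
  ]
}
""")
print(ports_covering(product, "https://y.com/country/US"))
# [('sba-pool-rate-json-prod1', 'q1.debt.mbs.dataset.us')]
```

The tabular port is not returned for either region because its dataset declares no geographicalCoverage, even though it serves all rates; a catalogue that wants to treat such datasets as global would need an explicit rule for that case.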

Acknowledgements

The editors gratefully acknowledge the feedback and contributions made to this document by: