The Data Product (DPROD) specification is a profile of the Data Catalog (DCAT) Vocabulary, designed to describe Data Products. This document defines the schema and provides examples for its use.
DPROD extends DCAT to enable publishers to describe Data Products and data services in a decentralized way. By using a standard model and vocabulary, DPROD facilitates the consumption and aggregation of metadata from multiple Data Marketplaces. This approach increases the discoverability of products and services, supports decentralized data publishing, and enables federated search across multiple sites using a uniform query mechanism and structure.
The namespace for DPROD terms is https://ekgf.github.io/data-product-spec/dprod
The suggested prefix for the DPROD namespace is dprod
DPROD follows two basic principles:
🔵 Decentralize Data Ownership: To make data integration more efficient, tasks should be shared among multiple teams. DCAT helps by offering a standard way to publish datasets in a decentralized manner.
🔵 Harmonize Data Schemas: Using shared schemas helps unify different data formats. For instance, the DPROD specification provides a common set of rules for defining a Data Product. You can extend this schema as needed.
The DPROD specification has four main aims:
🔵 Provide unambiguous and sharable semantics to answer the question: 'What is a data product?'
🔵 Be simple for anyone to use, but expressive enough to power large data marketplaces
🔵 Allow organisations to reuse their existing data catalogues and dataset infrastructure
🔵 Share common semantics across different Data Products and promote harmonisation
The current version is DRAFT. Feedback and comments are welcome via the GitHub Issues feature.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this Profile are non-normative. Everything else in this Profile is normative.
The keywords MAY, MUST, MUST NOT, RECOMMENDED, SHOULD, and SHOULD NOT are to be interpreted as described in [[!RFC2119]].
Namespaces and prefixes used in normative parts of this Profile are shown in the following table.
Prefix | Namespace IRI | Source |
---|---|---|
dcat | http://www.w3.org/ns/dcat# | [[VOCAB-DCAT-3]] |
dct | http://purl.org/dc/terms/ | [[DCTERMS]] |
odrl | http://www.w3.org/ns/odrl/2/ | [[ODRL-VOCAB]] |
sdo | https://schema.org | [[SCHEMA-ORG]] |
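As a minimal sketch of how the prefixes in the table are used, the following Python snippet expands a prefixed name into a full IRI. The `PREFIXES` dictionary and the `expand` helper are illustrative names, not part of DPROD or DCAT.

```python
# Prefix map copied from the table above (illustrative helper, not part of DPROD).
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "odrl": "http://www.w3.org/ns/odrl/2/",
    "sdo": "https://schema.org",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dcat:Dataset' to a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand("dcat:DataService"))
print(expand("dct:conformsTo"))
```

In practice a JSON-LD or RDF library would perform this expansion from the published context; the sketch only shows what the table means.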
Data Mesh Architectures use input and output ports to manage how data enters and leaves a Data Product. These ports can handle different formats, schemas, and protocols. Input ports bring in data, while output ports send data to other Data Products for aggregation, reuse, analysis or reporting etc.
In the Data Catalog Vocabulary (DCAT) framework, a Data Service is a way to describe services that provide access to data. Data Services give standardized, machine-readable descriptions of how to access one or more datasets or data processing functions.
Data Services specify how to access and download the data. In DPROD, Data Services are connected to Distributions by a property called isAccessServiceOf. On the Distribution, you can specify formats (such as CSV or JSON) and provide metadata about the "physical model" of the data. Distributions link to Datasets, and DCAT has a rich vocabulary for describing every aspect of your dataset. Finally, Datasets use the conformsTo property to link to the "logical model", where you can specify rich semantic metadata of your own.
By linking Data Product ports to DCAT DataServices, DPROD can describe Data Products in a way that machines can read across the organization. This makes it easier for data teams to build and manage their own data products independently, while still working well with the rest of the organization's data.
Using standards like DCAT helps create a strong and clear way to define Data Products. It ensures that as data becomes more complex, the methods for describing, sharing, and using data stay consistent and reliable. It also allows different organizations to share data securely and in a standardized way.
The Profile consists of the following classes:
🔵 dcat:Catalog - The collection of Data Products
🔵 dprod:DataProduct - A data product may have input and output ports, code and metadata
🔵 dcat:DataService - A digital interface that provides access to a Dataset. This can be an HTTP URL, a database, a file share, etc.
🔵 dcat:Distribution - A specific representation of a dataset (CSV, JSON, ADLS etc.) which can conform to a physical model
🔵 dcat:Dataset - A collection of related data that can conform to a logical model
As DCAT Data Services, the DPROD input and output ports can specify connection details. They have distributions that define formats, and they link to datasets that conform to shared schemas. In this example, the UK Bonds Data Product includes an output port, which is a RESTful API. This API delivers JSON data conforming to the shared FIBO specification for callable bonds.
{
  "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
  "id": "https://y.com/products/uk-bonds",
  "type": "DataProduct",
  "title": "UK Bonds",
  "description": "UK Bonds is your one-stop-shop for all your bonds!",
  "dataProductOwner": "https://www.linkedin.com/in/tonyseale/",
  "lifecycleStatus": "https://ekgf.github.io/data-product-spec/dprod/data/lifecycle-status/Consume",
  "outputPort": {
    "type": "DataService",
    "endpointURL": "https://y.com/uk-10-year-bonds",
    "isAccessServiceOf": {
      "type": "Distribution",
      "format": "https://www.iana.org/assignments/media-types/application/json",
      "isDistributionOf": {
        "type": "Dataset",
        "id": "https://y.com/products/uk-bonds/datasets/10-year",
        "conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/Bonds/CallableBond"
      }
    }
  }
}
The examples map the type of the above classes to @type in the JSON-LD serialisations. You can use JSON-LD to extend the familiar JSON syntax with the shared semantics defined by DCAT and DPROD.
You can copy the JSON above and paste it into https://json-ld.org/playground to check that the schema resolves.
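The chain from port to distribution to dataset can also be followed in plain JSON, without an RDF library. The sketch below works on a trimmed copy of the UK Bonds example; the helper names `as_list` and `describe_ports` are illustrative and not part of DPROD.

```python
import json

# Trimmed copy of the UK Bonds example (the @context is omitted because
# this sketch treats the document as plain JSON).
doc = json.loads("""
{
  "id": "https://y.com/products/uk-bonds",
  "type": "DataProduct",
  "outputPort": {
    "type": "DataService",
    "endpointURL": "https://y.com/uk-10-year-bonds",
    "isAccessServiceOf": {
      "type": "Distribution",
      "format": "https://www.iana.org/assignments/media-types/application/json",
      "isDistributionOf": {
        "type": "Dataset",
        "id": "https://y.com/products/uk-bonds/datasets/10-year",
        "conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/Bonds/CallableBond"
      }
    }
  }
}
""")


def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def describe_ports(product):
    """Yield (endpoint, format, logical model) for each output port."""
    for port in as_list(product.get("outputPort")):
        dist = port.get("isAccessServiceOf", {})
        dataset = dist.get("isDistributionOf", {})
        yield port.get("endpointURL"), dist.get("format"), dataset.get("conformsTo")


ports = list(describe_ports(doc))
for endpoint, fmt, model in ports:
    print(endpoint, fmt, model)
```

Treating the document as plain JSON only works when the shape is known in advance; a JSON-LD processor is the more robust choice when documents from several publishers are aggregated.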
Identifier: | rdfs:label |
---|---|
Domain: | dprod:DataProduct |
Range: | xsd:string |
Identifier: | dcterms:description |
---|---|
Domain: | dprod:DataProduct |
Range: | xsd:string |
Identifier: | dprod:dataProductOwner |
---|---|
Label: | dataProductOwner |
Domain: | dprod:DataProduct |
Range: | foaf:Agent |
Identifier: | dprod:domain |
---|---|
Comment: | The domain is intended to be a resource in its own right. This specification does not constrain the class to be used. |
Domain: | dprod:DataProduct |
Range: |
Identifier: | dprod:inputPort |
---|---|
Label: | inputPort |
Domain: | dprod:DataProduct |
Range: | dcat:DataService |
Identifier: | dprod:outputPort |
---|---|
Label: | outputPort |
Domain: | dprod:DataProduct |
Range: | dcat:DataService |
Identifier: | dprod:inputDataset |
---|---|
Label: | input Dataset |
Domain: | dprod:DataProduct |
Range: | dcat:Dataset |
Identifier: | dprod:outputDataset |
---|---|
Label: | output Dataset |
Domain: | dprod:DataProduct |
Range: | dcat:Dataset |
Identifier: | dprod:purpose |
---|---|
Domain: | dprod:DataProduct |
Range: | xsd:string |
Identifier: | odrl:hasPolicy |
---|---|
Domain: | dprod:DataProduct |
Range: | odrl:Policy |
Identifier: | dprod:lifecycleStatus |
---|---|
Label: | lifecycleStatus |
Domain: | dprod:DataProduct |
Range: | dprod:DataProductLifecycleStatus |
Identifier: | dprod:isAccessServiceOf |
---|---|
Label: | is Access Service Of |
Domain: | dcat:DataService |
Range: | dcat:Distribution |
Identifier: | dprod:protocol |
---|---|
Domain: | dcat:DataService |
Range: | dprod:Protocol |
Identifier: | dprod:securitySchemaType |
---|---|
Domain: | dcat:DataService |
Range: | dprod:SecuritySchemaType |
Identifier: | dcat:endpointURL |
---|---|
Domain: | dcat:DataService |
Range: |
Identifier: | dcat:endpointDescription |
---|---|
Domain: | dcat:DataService |
Range: |
Identifier: | dcat:accessService |
---|---|
Domain: | dcat:Distribution |
Range: | dcat:DataService |
Identifier: | dcterms:conformsTo |
---|---|
Domain: | dcat:Distribution |
Range: |
Identifier: | dprod:isDistributionOf |
---|---|
Label: | isDistributionOf |
Domain: | dcat:Distribution |
Range: | dcat:Dataset |
Identifier: | dcterms:format |
---|---|
Domain: | dcat:Distribution |
Range: |
Identifier: | rdfs:label |
---|---|
Domain: | dcat:Dataset |
Range: | xsd:string |
Identifier: | dcterms:description |
---|---|
Domain: | dcat:Dataset |
Range: | xsd:string |
Identifier: | dcterms:type |
---|---|
Domain: | dcat:Dataset |
Range: |
Identifier: | dcat:distribution |
---|---|
Domain: | dcat:Dataset |
Range: | dcat:Distribution |
Identifier: | dcterms:conformsTo |
---|---|
Domain: | dcat:Dataset |
Range: |
Identifier: | odrl:hasPolicy |
---|---|
Domain: | dcat:Dataset |
Range: |
Identifier: | dprod:informationSensitivityClassification |
---|---|
Label: | information Sensitivity Classification |
Domain: | dcat:Dataset |
Range: | dprod:InformationSensitivityClassification |
Here are some worked examples of how to use DPROD for common use cases.
For real world data products, the core data product details will be part of a wider set of metadata that allows the data and data product to be used effectively. Below is an example of extending the DPROD data product, specifically by adding an agreement to a data product.
In this example, a Data Product Agreement is defined as a subclass of FIBO Agreement.
Definition of a simple Agreement based on FIBO:
{
  "@context": [
    "https://ekgf.github.io/data-product-spec/dprod.jsonld",
    {
      "fibo": "http://spec.edmcouncil.org/fibo/ontology/FND/Agreements/MetadataFNDAgreements/#",
      "ex": "http://example.org/dp#"
    }
  ],
  "@graph": [
    {
      "@id": "ex:isSubjectToAgreement",
      "@type": "rdf:Property",
      "rdfs:label": "Data Product is Subject To FIBO Agreement",
      "rdfs:domain": {
        "@id": "dprod:DataProduct"
      },
      "rdfs:range": {
        "@id": "ex:DataProductAgreement"
      }
    },
    {
      "@id": "ex:DataProductAgreement",
      "@type": "rdfs:Class",
      "rdfs:label": "DataProductAgreement",
      "rdfs:subClassOf": {
        "@id": "fibo:Agreement"
      }
    }
  ]
}
A full definition of agreements for data products is likely to be more complex than a single class and may use other information models or their profiles (such as ODRL Policy) or create dedicated definitions.
Below is an example of a Data Product with an associated Data Product Agreement with an effective date.
Using the agreement:
{
  "@context": [
    "https://ekgf.github.io/data-product-spec/dprod.jsonld",
    {
      "fibo": "http://spec.edmcouncil.org/fibo/ontology/FND/Agreements/MetadataFNDAgreements/#",
      "ex": "http://example.org/dp#"
    }
  ],
  "dataProducts": [
    {
      "id": "https://y.com/data-product/company-sales",
      "type": "DataProduct",
      "outputPort": {
        "id": "https://y.com/data-product/company-sales/port/2025-sales",
        "type": "DataService",
        "label": "Sales",
        "endpointURL": "https://y.com/data-product/company-sales/port/2025-sales",
        "isAccessServiceOf": {
          "type": "Distribution",
          "format": "https://www.iana.org/assignments/media-types/application/json",
          "isDistributionOf": {
            "type": "Dataset",
            "label": "Sales",
            "id": "https://y.com/data-product/company-sales/dataset/2025-sales",
            "conformsTo": "https://y.com/schema/Sale"
          }
        }
      },
      "ex:isSubjectToAgreement": {
        "@id": "ex:VVSimpleAgreement",
        "@type": "ex:DataProductAgreement"
      }
    }
  ],
  "agreements": [
    {
      "@id": "ex:VVSimpleAgreement",
      "@type": "ex:DataProductAgreement",
      "rdfs:label": "Very Simple Data Product Agreement",
      "fibo:hasEffectiveDate": {
        "@type": "xsd:date",
        "@value": "2024-08-31"
      }
    }
  ]
}
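A consumer of this metadata might want to look up the effective date of the agreement attached to each data product. The following is a minimal sketch under stated assumptions: it works on a condensed copy of the example as plain JSON, it assumes the linking property is spelled `ex:isSubjectToAgreement`, and `effective_dates` is an illustrative helper name.

```python
from datetime import date

# Condensed copy of the example above; only the fields used here are kept.
doc = {
    "dataProducts": [
        {
            "id": "https://y.com/data-product/company-sales",
            "type": "DataProduct",
            "ex:isSubjectToAgreement": {"@id": "ex:VVSimpleAgreement"},
        }
    ],
    "agreements": [
        {
            "@id": "ex:VVSimpleAgreement",
            "@type": "ex:DataProductAgreement",
            "rdfs:label": "Very Simple Data Product Agreement",
            "fibo:hasEffectiveDate": {"@type": "xsd:date", "@value": "2024-08-31"},
        }
    ],
}


def effective_dates(document):
    """Map each data product id to the effective date of its agreement."""
    agreements = {a["@id"]: a for a in document["agreements"]}
    result = {}
    for product in document["dataProducts"]:
        ref = product.get("ex:isSubjectToAgreement")
        if ref is None:
            continue  # product has no agreement attached
        agreement = agreements[ref["@id"]]
        raw = agreement["fibo:hasEffectiveDate"]["@value"]
        result[product["id"]] = date.fromisoformat(raw)
    return result


print(effective_dates(doc))
```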
It is important to be able to trace the lineage of data. Within DPROD, this can be done in two ways: at a high level from one data product to another and, if you want, at a more detailed level of the underlying datasets.
Data products have input and output ports, and one data product’s input port will point to another data product’s output port.
This allows a user to query the lineage. The data products all have URLs as identifiers, and their properties connect them to each other, so you can walk from one data product to the upstream data products that feed it.
You can follow the path that leads from one data product to another like this:
Data Product >> inputPort >> isAccessServiceOf >> isDistributionOf >> Input Data Product
Let's look at some example data with three data products that connect to each other through their input and output ports:
{
"@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
"dataProducts": [
{
"id": "https://y.com/data-product/company-finance",
"type": "DataProduct",
"inputPort": [
{
"id": "https://y.com/data-product/company-sales/port/2025-sales",
"type": "DataService"
},
{
"id": "https://y.com/data-product/company-hr/port/2025-payroll",
"type": "DataService"
}
],
"outputPort": {
"id": "https://y.com/data-product/company-finance/port/2025-balance-sheet",
"type": "DataService",
"label": "Balance Sheet",
"endpointURL": "https://y.com/data-product/company-sales/port/2025-c",
"isAccessServiceOf": {
"type": "Distribution",
"format": "https://www.iana.org/assignments/media-types/application/json",
"isDistributionOf": {
"type": "Dataset",
"id": "https://y.com/data-product/company-finance/dataset/2025-balance-sheet",
"conformsTo": "https://y.com/schema/BalanceSheet"
}
}
}
},
{
"id": "https://y.com/data-product/company-sales",
"type": "DataProduct",
"outputPort": {
"id": "https://y.com/data-product/company-sales/port/2025-sales",
"type": "DataService",
"label": "Sales",
"endpointURL": "https://y.com/data-product/company-sales/port/2025-sales",
"isAccessServiceOf": {
"type": "Distribution",
"format": "https://www.iana.org/assignments/media-types/application/json",
"isDistributionOf": {
"type": "Dataset",
"label": "Sales",
"id": "https://y.com/data-product/company-sales/dataset/2025-sales",
"conformsTo": "https://y.com/schema/Sale"
}
}
}
},
{
"id": "https://y.com/data-product/company-hr",
"type": "DataProduct",
"outputPort": {
"id": "https://y.com/data-product/company-hr/port/2025-payroll",
"type": "DataService",
"label": "Payroll",
"endpointURL": "https://y.com/data-product/company-hr/port/2025-payroll",
"isAccessServiceOf": {
"type": "Distribution",
"format": "https://www.iana.org/assignments/media-types/text/csv",
"isDistributionOf": {
"type": "Dataset",
"label": "Payroll",
"id": "https://y.com/data-product/company-hr/dataset/2025-payroll",
"conformsTo": "https://y.com/schema/Payroll"
}
}
}
}
]
}
Given this example data, if we started at the data product https://y.com/data-product/company-finance, we could walk the relationships to find the input data products that feed it:
https://y.com/data-product/company-finance >> :inputPort >> :isAccessServiceOf >> :isDistributionOf >> [https://y.com/data-product/company-sales , https://y.com/data-product/company-hr]
In Linked Data, we would actually do this with a query like this:
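PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dprod: <https://ekgf.github.io/data-product-spec/dprod/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <https://y.com/data-product/>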
SELECT DISTINCT ?input
WHERE
{
:company-finance dprod:inputPort ?inputPort.
?inputPort dprod:isAccessServiceOf/dprod:isDistributionOf/rdfs:label ?input.
}
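Product-to-product lineage can also be derived by matching port identifiers, since one data product's input port points to another data product's output port. The sketch below is an illustration only: `PRODUCTS` is a hand-condensed view of the example data that keeps just the port IRIs, it assumes each output port IRI is identical to the IRI referenced by the consuming product's input port, and `upstream` is a hypothetical helper name.

```python
BASE = "https://y.com/data-product/"

# Port identifiers only, condensed from the example data (assumed structure).
PRODUCTS = {
    BASE + "company-finance": {
        "inputPort": [
            BASE + "company-sales/port/2025-sales",
            BASE + "company-hr/port/2025-payroll",
        ],
        "outputPort": [BASE + "company-finance/port/2025-balance-sheet"],
    },
    BASE + "company-sales": {
        "inputPort": [],
        "outputPort": [BASE + "company-sales/port/2025-sales"],
    },
    BASE + "company-hr": {
        "inputPort": [],
        "outputPort": [BASE + "company-hr/port/2025-payroll"],
    },
}


def upstream(product_id):
    """Return the products whose output ports feed the given product."""
    wanted = set(PRODUCTS[product_id]["inputPort"])
    return sorted(
        other_id
        for other_id, other in PRODUCTS.items()
        if other_id != product_id and wanted & set(other["outputPort"])
    )


print(upstream(BASE + "company-finance"))
```

Matching on port IRIs keeps the lineage at the data product level; the dataset-level walk through isAccessServiceOf and isDistributionOf gives finer detail when it is needed.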
If you wish to track lineage at a more granular level, you can also use PROV (https://www.w3.org/TR/prov-o/) at the dataset level.
dap:atnf-P366-2003SEPT
rdf:type dcat:Dataset ;
dcterms:bibliographicCitation "Burgay, M; McLaughlin, M; Kramer, M; Lyne, A; Joshi, B; Pearce, G; D'Amico, N; Possenti, A; Manchester, R; Camilo, F (2017): Parkes observations for project P366 semester 2003SEPT. v1. CSIRO. Data Collection. https://doi.org/10.4225/08/598dc08d07bb7" ;
dcterms:title "Parkes observations for project P366 semester 2003SEPT"@en ;
dcat:landingPage ;
prov:wasGeneratedBy dap:P366 ;
.
dap:P366
rdf:type prov:Activity ;
dcterms:type ;
prov:startedAtTime "2000-11-01"^^xsd:date ;
prov:used dap:Parkes-radio-telescope ;
prov:wasInformedBy dap:ATNF ;
rdfs:label "P366 - Parkes multibeam high-latitude pulsar survey"@en ;
rdfs:seeAlso ;
.
See: https://www.w3.org/TR/vocab-dcat-3/#examples-dataset-provenance.
[
  {
    "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
    "id": "https://y.com/derived-quality-measurementA",
    "@type": "QualityMeasurement",
    "value": 1,
    "computedOn": {
      "@type": "DataProduct",
      "@id": "https://y.com/products/uk-bonds"
    },
    "isMeasurementOf": {
      "@type": "Metric",
      "label": "Number of stale datasets"
    }
  },
  {
    "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
    "@id": "https://y.com/quality-measurement-B",
    "@type": "QualityMeasurement",
    "value": "false",
    "computedOn": {
      "@type": "Dataset",
      "@id": "https://y.com/products/uk-bonds/yearlyPrices"
    },
    "isMeasurementOf": {
      "@type": "Metric",
      "label": "Expected distribution frequency achieved"
    }
  },
  {
    "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
    "id": "https://y.com/products/uk-bonds",
    "type": "DataProduct",
    "outputPort": {
      "type": "DataService",
      "endpointURL": "https://y.com/uk-bonds/quality-report",
      "isAccessServiceOf": {
        "type": "Distribution",
        "isDistributionOf": {
          "type": "Dataset",
          "conformsTo": "https://www.w3.org/TR/vocab-dqv/#dqv:QualityMeasurement"
        }
      }
    }
  }
]
ODRL is a W3C standard for describing rights and entitlements.
Using ODRL, data product and dataset publishers can describe policies in a consistent, standard and machine-readable manner. Policies contain permissions and prohibitions on specific actions, which stakeholders are required to observe.
In addition, policies may be limited by constraints (e.g. temporal or geographical constraints), and duties (e.g. payments) may be imposed on the permissions.
Policies and their permitted or prohibited actions can be described at different levels, e.g. a Policy can target a Data Product, a Dataset, a Data Service or even a Column.
Sophisticated engines should interpret and enforce the ODRL policies at the appropriate level, e.g.:
examplePolicyA odrl:target exampleProduct:ProductA .
examplePolicyB odrl:target exampleDataset:DatasetA1 .
The following example of a Policy describes permission to distribute the data only inside a specific region:
examplePolicyA odrl:permission
{
"action": "odrl:distribute",
"constraint": [
{
"leftOperand": "region",
"operator": "eq",
"rightOperand": "region:EMEA"
}
]
}
{
  "@type": "Policy",
  "id": "56456df-dfg-34535345-5545",
  "assigner": "https://schema.org/person/AdamSmith",
  "target": "https://data.org/data-product/equity-trade-xxx",
  "permission": [
    {
      "action": "odrl:read",
      "constraint": [
        {
          "@type": "Constraint",
          "leftOperand": "workingCountry",
          "operator": "odrl:isAnyOf",
          "rightOperand": ["region:EMEA", "region:APAC"],
          "description": "Permission to read all the datasets of the product if user is working inside EMEA or APAC"
        }
      ]
    }
  ]
}
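To make the semantics of the policy above concrete, here is a minimal sketch of how an engine might decide whether a request is permitted. It is not an ODRL implementation: it handles only the `eq` and `isAnyOf` operators, it ignores prohibitions and duties, and the `context` dictionary describing the requesting user is an assumption made for illustration.

```python
# Condensed copy of the policy above, as plain JSON.
policy = {
    "@type": "Policy",
    "target": "https://data.org/data-product/equity-trade-xxx",
    "permission": [
        {
            "action": "odrl:read",
            "constraint": [
                {
                    "leftOperand": "workingCountry",
                    "operator": "odrl:isAnyOf",
                    "rightOperand": ["region:EMEA", "region:APAC"],
                }
            ],
        }
    ],
}


def constraint_holds(constraint, context):
    """Evaluate one constraint against facts about the request (hypothetical context)."""
    actual = context.get(constraint["leftOperand"])
    expected = constraint["rightOperand"]
    operator = constraint["operator"]
    if operator in ("eq", "odrl:eq"):
        return actual == expected
    if operator in ("isAnyOf", "odrl:isAnyOf"):
        return actual in expected
    raise NotImplementedError(f"operator not covered by this sketch: {operator}")


def is_permitted(policy, action, target, context):
    """True if some permission for `action` on `target` has all constraints satisfied."""
    if policy["target"] != target:
        return False
    for permission in policy.get("permission", []):
        if permission["action"] != action:
            continue
        if all(constraint_holds(c, context) for c in permission.get("constraint", [])):
            return True
    return False


target = "https://data.org/data-product/equity-trade-xxx"
print(is_permitted(policy, "odrl:read", target, {"workingCountry": "region:EMEA"}))
print(is_permitted(policy, "odrl:read", target, {"workingCountry": "region:US"}))
```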
The Data Product provides datasets, defined using DCAT, to consumers (dprod:outputDataset). Datasets should be described with logical models (dct:conformsTo). Logical models describe business entities and their properties (attributes and relationships) using consistent business terms, and they are technology independent. Ideally, logical models are based on existing standards, e.g. FIBO or CDM. If no logical model exists to describe the dataset, the dataset publisher can create one, preferably using the SHACL modelling language:
Example of a Dataset conforming to a SHACL Schema
exampleDataset dct:conformsTo exampleSchema:DatasetLogicalSchema.
exampleSchema:DatasetLogicalSchema a owl:Ontology, dct:Standard.
In SHACL, all entities that exist in the dataset are Node Shapes (1). The attributes of the entities are described as Property Shapes with sh:datatype (2). The relationships are also defined as Property Shapes, with sh:class giving the target class of the relationship (3).
example:Account a sh:NodeShape; # definition of the entity as a Node Shape (1)
rdfs:label "Account"@en; # human readable name of the entity
dc:description "An Account is..."; # description of the entity
sh:property example:Account-AccountAge; # an account has a property shape Account Age. Definition of the property shape follows (2)
sh:property example:Account-AccountBranch; # an account has a property shape Account Branch. Definition of the property shape follows (3)
rdfs:isDefinedBy exampleSchema:DatasetLogicalSchema;
.
example:Account-AccountAge a sh:PropertyShape; # (2) Definition of the Account-AccountAge property shape describing that an account MUST have exactly one AccountAge attribute and its datatype is integer
sh:path example:AccountAge;
sh:datatype xsd:integer;
sh:minCount 1;
sh:maxCount 1;
rdfs:isDefinedBy exampleSchema:DatasetLogicalSchema;
.
example:Account-AccountBranch a sh:PropertyShape; # (3) Definition of the Account-AccountBranch property shape describing that an account must have at least one Account Branch, which is another entity
sh:path example:AccountBranch;
sh:class example:Branch;
sh:minCount 1;
rdfs:isDefinedBy exampleSchema:DatasetLogicalSchema;
.
example:Branch a sh:NodeShape; # definition of the entity Branch as a Node Shape (1)
rdfs:label "Branch"@en; # human readable name of the entity
dc:description "A Branch is..";
rdfs:isDefinedBy exampleSchema:DatasetLogicalSchema;
....
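The constraints expressed by the two property shapes can be illustrated with a small hand-written check. This is not a SHACL engine: `ACCOUNT_SHAPE` is a hypothetical Python mirror of the shapes above (cardinality, datatype and target class only), and `validate` is an illustrative helper. A real deployment would more likely use a SHACL validator over RDF data.

```python
# Hypothetical mirror of the Account property shapes above.
ACCOUNT_SHAPE = {
    "AccountAge": {"datatype": int, "min": 1, "max": 1},
    "AccountBranch": {"node_type": "Branch", "min": 1, "max": None},
}


def validate(record, shape):
    """Return a list of violation messages; an empty list means the record conforms."""
    violations = []
    for path, rule in shape.items():
        values = record.get(path, [])
        if not isinstance(values, list):
            values = [values]
        # Cardinality checks (sh:minCount / sh:maxCount).
        if len(values) < rule["min"]:
            violations.append(f"{path}: fewer than {rule['min']} value(s)")
        if rule["max"] is not None and len(values) > rule["max"]:
            violations.append(f"{path}: more than {rule['max']} value(s)")
        for value in values:
            # Datatype check (sh:datatype).
            if "datatype" in rule and not isinstance(value, rule["datatype"]):
                violations.append(f"{path}: {value!r} is not {rule['datatype'].__name__}")
            # Target class check (sh:class), using a 'type' key on nested records.
            if "node_type" in rule and not (
                isinstance(value, dict) and value.get("type") == rule["node_type"]
            ):
                violations.append(f"{path}: value is not a {rule['node_type']}")
    return violations


good = {"AccountAge": 7, "AccountBranch": [{"type": "Branch", "id": "b1"}]}
bad = {"AccountAge": "seven"}
print(validate(good, ACCOUNT_SHAPE))
print(validate(bad, ACCOUNT_SHAPE))
```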
{
  "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
  "id": "https://y.com/products/equity-trade-xxx",
  "@type": "DataProduct",
  "title": "Equity Trade XXX",
  "description": "Trade data defining the outcome of equity trades between parties in different stock markets, where the terms are primarily reflected in the tradable product. Additionally, Trade includes attributes such as the trade date, transacting parties, and settlement terms. Some attributes, such as the parties, are already defined in the Party Product and are simply referenced in Trade",
  "outputPort": {
    "@type": "DataService",
    "endpointURL": "abfss://datasetsv1@demo.dfs.core.windows.net/demo/full/trade-euronext",
    "isAccessServiceOf": {
      "@type": "Distribution",
      "format": "application/parquet",
      "isDistributionOf": {
        "@type": "Dataset",
        "@id": "https://y.com/dataset/equity-trade-euronext-paris",
        "title": "Equity Trade Euronext Paris XXX",
        "conformsTo": "https://spec.edmcouncil.org/fibo/ontology/BP/Process/FinancialContextAndProcess/SecuritiesTrade"
      }
    }
  }
}
Example of a Data Product with Equity Trades.
The Equity Trade Data Product provides consumers with two datasets: one for trades on LSEG and one for trades on Euronext.
{
  "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
  "id": "equity-trade-xxx",
  "@id": "https://y.com/products/equity-trade-xxx",
  "@type": "DataProduct",
  "title": "Equity Trade XXX",
  "description": "Trade data defining the outcome of equity trades between parties in different stock markets, where the terms are primarily reflected in the tradable product. Additionally, Trade includes attributes such as the trade date, transacting parties, and settlement terms. Some attributes, such as the parties, are already defined in the Party Product and are simply referenced in Trade",
  "dataProductOwner": "https://www.schema.xxx/person/AnnTaylor",
  "lifecycle": "Consume",
  "outputPort": [
    {
      "@type": "dcat:DataService",
      "id": "equity-trade-euronext-xxx-tabular-adls-prod",
      "@id": "https://y.com/service/equity-trade-euronext-xxx-adls-prod-1",
      "dcat:endpointURL": "abfss://datasetsv1@demo.dfs.core.windows.net/demo/full/trade-euronext",
      "dcat:endpointDescription": "Details for accessing storage account",
      "isAccessServiceOf": {
        "@id": "https://y.com/service/equity-trade-euronext-xxx-tabular",
        "id": "equity-trade-euronext-xxx-tabular",
        "@type": "dcat:Distribution",
        "dcterms:format": "application/parquet",
        "dcterms:conformsTo": "https://cdm.finos.org/docs/event-model",
        "isDistributionOf": {
          "@type": "dcat:Dataset",
          "@id": "https://y.com/dataset/equity-trade-euronext-paris",
          "datasetOwner": "https://www.schema.xxx/person/JohnBarks",
          "title": "Equity Trade Euronext Paris XXX",
          "id": "equity-trade-euronext-paris-xxx",
          "dcterms:conformsTo": "https://cdm.finos.org/docs/event-model"
        }
      }
    },
    {
      "@type": "dcat:DataService",
      "id": "equity-trade-lseg-xxx-tabular-adls-prod",
      "@id": "https://y.com/service/equity-trade-lseg-xxx-adls-prod-1",
      "dcat:endpointURL": "abfss://datasetsv1@demo.dfs.core.windows.net/demo/full/trade-lseg",
      "dcat:endpointDescription": "Details for accessing storage account",
      "isAccessServiceOf": {
        "@id": "https://y.com/service/equity-trade-lseg-xxx-tabular",
        "id": "equity-trade-lseg-xxx-tabular",
        "@type": "dcat:Distribution",
        "dcterms:format": "application/parquet",
        "dcterms:conformsTo": "https://cdm.finos.org/docs/event-model",
        "isDistributionOf": {
          "@type": "dcat:Dataset",
          "@id": "https://y.com/dataset/equity-trade-lseg-xxx",
          "title": "Equity Trade LSEG XXX",
          "id": "equity-trade-lseg-xxx",
          "dcterms:conformsTo": "https://cdm.finos.org/docs/event-model"
        }
      }
    }
  ]
}
An Observability Port is a designated interface or endpoint in a system or application specifically used for monitoring and diagnostic purposes. It allows external tools or services to collect and analyze data related to the system's performance, health, and behaviour. By exposing metrics, logs, and traces through this port, administrators and developers can gain insights into the system's state, troubleshoot issues, and ensure it operates efficiently and reliably.
DPROD has a schema-first design. The first thing you would need to do is define a schema for your logging information. It could be a schema based on OpenTelemetry, but in this example, we use RLOG (which is a semantic ontology for logging).
To find the Observability Port, you would query the ports to identify the ones that return an rlog:Entry:
outputPort >> isAccessServiceOf >> isDistributionOf >> conformsTo >> rlog:Entry
The example data product has two output ports, one with the data and one with the logging.
Here is an example of a data product with an observability port:
{
"@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
"dataProducts": [
{
"id": "https://y.com/data-product/uk-bonds",
"type": "DataProduct",
"inputPort": [
{
"id": "https://y.com/data-product/uk-bonds/port/2024-data",
"type": "DataService"
}
],
"outputPort": [
{
"id": "https://y.com/data-product/uk-bonds/port/2024-observability",
"type": "DataService",
"label": "Observability Port",
"endpointURL": "https://y.com/data-product/uk-bonds/port/2024-observability",
"isAccessServiceOf": {
"type": "Distribution",
"format": "https://www.iana.org/assignments/media-types/application/json",
"isDistributionOf": {
"type": "Dataset",
"id": "https://y.com/data-product/uk-bonds/dataset/2024-observability",
"conformsTo": "https://y.com/schema/ObservabilityLog"
}
}
},
{
"id": "https://y.com/data-product/uk-bonds/port/2024-data",
"type": "DataService",
"label": "Data Port",
"endpointURL": "https://y.com/data-product/uk-bonds/port/2024-data",
"isAccessServiceOf": {
"type": "Distribution",
"format": "https://www.iana.org/assignments/media-types/application/json",
"isDistributionOf": {
"type": "Dataset",
"id": "https://y.com/data-product/uk-bonds/dataset/2024-data",
"conformsTo": "https://y.com/schema/Data"
}
}
}
]
}
]
}
Given that our schema defines the class for an observation, we can use that to find all observability ports on a data product like this:
[https://y.com/data-product/uk-bonds/port/2024-observability] >> isAccessServiceOf >> isDistributionOf >> conformsTo >> https://y.com/schema/ObservabilityLog
In Linked Data we would use a SPARQL query to do that:
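PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dprod: <https://ekgf.github.io/data-product-spec/dprod/>
SELECT ?port
WHERE
{
?port a dcat:DataService .
?port dprod:isAccessServiceOf/dprod:isDistributionOf/dcterms:conformsTo <https://y.com/schema/ObservabilityLog> .
}
This query will return the URI of the port that provides logging data: https://y.com/data-product/uk-bonds/port/2024-observability.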
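The same lookup can be sketched over the JSON form of the example, treating it as plain JSON rather than as a graph. The `product` dictionary below is a condensed copy of the example data, and `ports_conforming_to` is an illustrative helper name, not part of DPROD.

```python
LOG_SCHEMA = "https://y.com/schema/ObservabilityLog"

# Condensed copy of the example data product; only the fields used here are kept.
product = {
    "id": "https://y.com/data-product/uk-bonds",
    "outputPort": [
        {
            "id": "https://y.com/data-product/uk-bonds/port/2024-observability",
            "isAccessServiceOf": {
                "isDistributionOf": {"conformsTo": "https://y.com/schema/ObservabilityLog"}
            },
        },
        {
            "id": "https://y.com/data-product/uk-bonds/port/2024-data",
            "isAccessServiceOf": {
                "isDistributionOf": {"conformsTo": "https://y.com/schema/Data"}
            },
        },
    ],
}


def ports_conforming_to(data_product, schema):
    """Return ids of output ports whose dataset conforms to the given schema."""
    return [
        port["id"]
        for port in data_product.get("outputPort", [])
        if port.get("isAccessServiceOf", {})
        .get("isDistributionOf", {})
        .get("conformsTo") == schema
    ]


print(ports_conforming_to(product, LOG_SCHEMA))
```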
Rates for SBA Pool that are Mortgage Backed Securities. The data product is provided through 3 ports:
🔵 1st port: providing all SBA pool rates through a query to a database
🔵 2nd port: providing only EMEA rates through an API
🔵 3rd port: providing only US rates through a Kafka topic
{
  "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
  "id": "sba-pool-rates",
  "@id": "https://y.com/products/sba-pool-rates",
  "@type": "DataProduct",
  "title": "SBA Pool Rates",
  "description": "Rates for SBA Pool that are Mortgage Backed Securities. The data product is provided through 3 ports, one of them providing all sba pool rates through a query to a database, another port providing EMEA only rates through an api and another one providing US only rates through a Kafka topic",
  "dataProductOwner": "https://www.schema.xxx/person/johnSmith",
  "lifecycle": "Consume",
  "outputPort": [
    {
      "@type": "dcat:DataService",
      "id": "sba-pool-rate-tabular-prod1",
      "environment": "PROD",
      "@id": "https://y.com/service/sba-pool-rate-tabular-prod1",
      "dcat:endpointURL": "jdbc:oracle:thin@sd656-5656-6745.ldn.organiation.com:43534/PGPERG.WORLD",
      "sql": "select * from ...",
      "isAccessServiceOf": {
        "@id": "https://y.com/distribution/sba-pool-rate-tabular",
        "id": "sba-pool-rate-tabular",
        "@type": "dcat:Distribution",
        "dcterms:format": "https://www.iana.org/assignments/media-types/application/sql",
        "isDistributionOf": {
          "@type": "dcat:Dataset",
          "@id": "https://y.com/dataset/sba-pool-rate",
          "id": "sba-pool-rate",
          "dcterms:conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/MortgageBackedSecurities/SBA-Pool"
        }
      }
    },
    {
      "@type": "dcat:DataService",
      "id": "sba-pool-rate-emea-api-prod1",
      "environment": "PROD",
      "@id": "https://y.com/service/sba-pool-rate-emea-api-prod1",
      "dcat:endpointURL": "https://example.org/mbs/SBA-Pool-location-emea",
      "dcterms:conformsTo": "../resources/users.yaml",
      "isAccessServiceOf": {
        "@id": "https://y.com/distribution/sba-pool-rate-emea-json1",
        "id": "sba-pool-rate-emea-json1",
        "@type": "dcat:Distribution",
        "dcterms:format": "https://www.iana.org/assignments/media-types/application/json",
        "isDistributionOf": {
          "@type": "dcat:Dataset",
          "@id": "https://y.com/dataset/sba-pool-rate-emea",
          "description": "sba pool data that cover EMEA accessed through an api",
          "geographicalCoverage": "https://y.com/country/EMEA",
          "id": "sba-pool-rate-emea",
          "dcterms:conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/MortgageBackedSecurities/SBA-Pool"
        }
      }
    },
    {
      "@type": "dcat:DataService",
      "id": "sba-pool-rate-json-prod1",
      "@id": "https://y.com/service/sba-pool-rate-json-prod1",
      "dcat:endpointURL": "q1.debt.mbs.dataset.us",
      "isAccessServiceOf": {
        "@id": "https://y.com/distribution/sba-pool-rate-json",
        "id": "sba-pool-rate-json",
        "@type": "dcat:Distribution",
        "dcterms:format": "https://www.iana.org/assignments/media-types/application/json",
        "conformsTo": "http://confluent-registry-y/rates-json-schema.json",
        "schemaCompatiblity": "backwards compatible",
        "isDistributionOf": {
          "@type": "dcat:Dataset",
          "@id": "https://y.com/dataset/sba-pool-rate-us",
          "id": "sba-pool-rate-us",
          "geographicalCoverage": "https://y.com/country/US",
          "dcterms:conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/MortgageBackedSecurities/SBA-Pool"
        }
      }
    }
  ]
}
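A consumer could use this metadata to choose a port that serves a given region. The sketch below is illustrative only: `PORTS` is a hand-condensed summary of the three output ports (service IRI and the dataset's geographicalCoverage), and it assumes that a port whose dataset states no coverage serves all regions. `ports_for` is a hypothetical helper name.

```python
# Hand-condensed summary of the three output ports above.
# Assumption: coverage None means the port serves all regions.
PORTS = [
    {
        "id": "https://y.com/service/sba-pool-rate-tabular-prod1",
        "coverage": None,
    },
    {
        "id": "https://y.com/service/sba-pool-rate-emea-api-prod1",
        "coverage": "https://y.com/country/EMEA",
    },
    {
        "id": "https://y.com/service/sba-pool-rate-json-prod1",
        "coverage": "https://y.com/country/US",
    },
]


def ports_for(region):
    """Return ids of ports whose dataset covers the region (or has no stated coverage)."""
    return [port["id"] for port in PORTS if port["coverage"] in (None, region)]


print(ports_for("https://y.com/country/US"))
```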
The editors gratefully acknowledge the feedback and contributions made to this document by: