RDF

udata has built-in RDF support, which allows it to both expose and harvest RDF metadata. It uses the Data Catalog Vocabulary (DCAT) as its base vocabulary.

Endpoints

udata exposes instance metadata through different RDF endpoints and tries to follow some best practices.

All relative URLs are relative to the udata instance root.

Content Negotiation

The following formats are supported:

Format      Extension       MIME type
RDF/XML     xml, rdf        application/rdf+xml, application/xml
Turtle      ttl             text/turtle, application/x-turtle
Notation3   n3              text/n3
JSON-LD     jsonld, json    application/ld+json, application/json
N-Triples   nt              application/n-triples
TriG        trig            application/trig

Each endpoint is available through a generic URL which performs content negotiation and redirects to a set of format specific URLs. The default format is JSON-LD.
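
The negotiation step described above can be sketched in a few lines of Python. This is only an illustration, not udata's actual implementation: the function names `negotiate` and `redirect_target` are hypothetical, the mapping keeps a single extension per format, and Turtle is listed under its registered media type `text/turtle`.

```python
# Extension -> accepted MIME types, based on the format table above.
FORMATS = {
    "xml": ["application/rdf+xml", "application/xml"],
    "ttl": ["text/turtle", "application/x-turtle"],
    "n3": ["text/n3"],
    "jsonld": ["application/ld+json", "application/json"],
    "nt": ["application/n-triples"],
    "trig": ["application/trig"],
}
DEFAULT_EXTENSION = "jsonld"  # the documented default format is JSON-LD


def negotiate(accept_header):
    """Return the format extension that best matches an Accept header."""
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        mime, quality = parts[0].lower(), 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by position in the header.
        candidates.append((-quality, position, mime))
    for negative_quality, _, mime in sorted(candidates):
        if negative_quality == 0:
            continue  # q=0 means "not acceptable"
        for extension, mimes in FORMATS.items():
            if mime in mimes:
                return extension
    return DEFAULT_EXTENSION


def redirect_target(generic_url, accept_header):
    """Build the format-specific URL a generic endpoint would redirect to."""
    return f"{generic_url}.{negotiate(accept_header)}"
```

For example, `redirect_target("/catalog", "text/turtle")` yields `/catalog.ttl`, and an Accept header that matches no supported format falls back to the JSON-LD URL.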

Organization

Organizations are available through the following URL:

/organization/{id}/catalog

where id is the organization’s identifier on the udata instance.

This URL performs content negotiation and redirects to:

/organization/{id}/catalog.{format}

It is exposed as a DCAT Catalog and a Hydra Collection, which allows pagination through the hydra:PartialCollectionView class.

The organization’s catalog embeds the organization’s datasets.
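
As a rough sketch of how a client might walk such a paginated catalog, the following Python follows "next" links from page to page. It rests on assumptions: that the JSON-LD serialization exposes the Hydra next link under a key named `next` or `hydra:next` (the exact key depends on the JSON-LD context in use), and the helper names `fetch_jsonld`, `find_next` and `iter_pages` are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def fetch_jsonld(url):
    """Download a URL, asking for JSON-LD, and return the parsed JSON."""
    request = Request(url, headers={"Accept": "application/ld+json"})
    with urlopen(request) as response:
        return json.load(response)


def find_next(page):
    """Return the URL of the next page, or None when there is none."""
    nodes = page.get("@graph", [page])
    if isinstance(nodes, dict):
        nodes = [nodes]
    for node in nodes:
        if not isinstance(node, dict):
            continue
        # Assumed key names for the Hydra "next" link.
        for key in ("next", "hydra:next"):
            value = node.get(key)
            if isinstance(value, dict):
                value = value.get("@id")
            if value:
                return value
    return None


def iter_pages(first_url, fetch=fetch_jsonld, max_pages=100):
    """Yield each page of a catalog by following next links."""
    url, seen = first_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch(url)
        yield page
        next_url = find_next(page)
        # Resolve relative links against the current page URL.
        url = urljoin(url, next_url) if next_url else None
```

The `fetch` parameter makes the traversal easy to exercise without network access, and the `seen` set together with `max_pages` guards against a catalog whose next links loop.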

Dataset

Datasets are available through the following URL:

/dataset/{id}/rdf

where id is the dataset’s identifier on the udata instance.

This URL performs content negotiation and redirects to:

/dataset/{id}/rdf.{format}

The dataset page serves as an identifier and performs content negotiation too, so the following URLs all redirect to the same RDF endpoint:

/dataset/{id}
/dataset/{slug}
/{lang}/dataset/{id}
/{lang}/dataset/{slug}
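
The URL patterns above can be assembled with a small helper. This is a minimal sketch: the function name `dataset_rdf_url` is hypothetical, and the instance root is whatever base URL the udata instance is served from.

```python
from urllib.parse import quote, urljoin


def dataset_rdf_url(instance_root, dataset_id, extension=None):
    """Build a dataset RDF URL, generic or format-specific.

    Without an extension, the generic /dataset/{id}/rdf URL is returned
    (the server then negotiates the format). With one, the
    /dataset/{id}/rdf.{format} URL is returned.
    """
    path = f"dataset/{quote(dataset_id, safe='')}/rdf"
    if extension:
        path = f"{path}.{extension}"
    # Ensure the root ends with a slash so urljoin appends to it.
    return urljoin(instance_root.rstrip("/") + "/", path)
```

Passing `extension="ttl"` targets the Turtle serialization directly, which avoids the extra redirect when the desired format is known in advance.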

A Dataset is exposed as a DCAT Dataset and a Resource as a DCAT Distribution. Fields are mapped as follows:

Dataset             dcat:Dataset               Notes
id                  dct:identifier
title               dct:title
description         dct:description
tags                dcat:keyword
created_at          dct:issued
last_modified       dct:modified
resources           dcat:distribution
temporal_coverage   dct:temporal               Uses schema.org startDate and endDate
frequency           dct:accrualPeriodicity     Frequencies without a Dublin Core equivalent are mapped to the closest one
license             dct:license + dct:rights   License URL in dct:license and license label in dct:rights

Resource            dcat:Distribution          Notes
id                  dct:identifier
title               dct:title
description         dct:description
url                 dcat:downloadURL           As URI reference
permanent url       dcat:accessURL             As URI reference
published           dct:issued
last_modified       dct:modified
format              dct:format
mime                dcat:mediaType
filesize            dcat:byteSize
checksum            spdx:checksum

TemporalCoverage    dct:PeriodOfTime
start               schema:startDate
end                 schema:endDate

Checksum            spdx:Checksum
type                spdx:algorithm
value               spdx:checksumValue
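
To make the mapping more concrete, here is a simplified Python sketch that turns a dataset, given as a plain dictionary, into (subject, predicate, object) tuples. It is not udata's serializer: it covers only a subset of the fields, uses prefixed names as plain strings instead of an RDF library, and the `#resource-N` URIs and the `to_triples` name are hypothetical.

```python
# Subset of the field mappings listed above.
DATASET_FIELDS = {
    "id": "dct:identifier",
    "title": "dct:title",
    "description": "dct:description",
    "created_at": "dct:issued",
    "last_modified": "dct:modified",
}
RESOURCE_FIELDS = {
    "id": "dct:identifier",
    "title": "dct:title",
    "description": "dct:description",
    "url": "dcat:downloadURL",
    "published": "dct:issued",
    "last_modified": "dct:modified",
    "format": "dct:format",
    "mime": "dcat:mediaType",
}


def to_triples(dataset, dataset_uri):
    """Map a dataset dictionary to a list of (subject, predicate, object)."""
    triples = [(dataset_uri, "rdf:type", "dcat:Dataset")]
    for field, predicate in DATASET_FIELDS.items():
        if dataset.get(field) is not None:
            triples.append((dataset_uri, predicate, dataset[field]))
    for index, resource in enumerate(dataset.get("resources", [])):
        # Hypothetical URI scheme, used here only to name the distribution.
        resource_uri = f"{dataset_uri}#resource-{index}"
        triples.append((dataset_uri, "dcat:distribution", resource_uri))
        triples.append((resource_uri, "rdf:type", "dcat:Distribution"))
        for field, predicate in RESOURCE_FIELDS.items():
            if resource.get(field) is not None:
                triples.append((resource_uri, predicate, resource[field]))
    return triples
```

Each resource becomes its own node linked from the dataset through dcat:distribution, which mirrors the Dataset/Distribution split in the tables above.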

Catalog

The site catalog is exposed through:

/catalog

and performs content negotiation, redirecting to:

/catalog.{format}

It is exposed as a DCAT Catalog and a Hydra Collection, which allows pagination through the hydra:PartialCollectionView class.

Dataportal

The Dataportal specification is still a work in progress, but because many sites already use this formalism, the catalog is also available (as a redirect) at the following URL:

/data.{format}

where format is one of the supported format extensions.

JSON-LD context

To reduce payload size and improve human readability, udata exposes a JSON-LD context and uses it in its serialization. This context is available at:

/context.jsonld
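
The following Python sketch shows why a context shortens the payload: short keys in the serialized document stand for full IRIs that the context defines once. The `SAMPLE_CONTEXT` below is a hypothetical excerpt written for illustration, not the content actually served by udata, and `expand_term` is a simplified helper rather than a full JSON-LD expansion algorithm.

```python
# Hypothetical context excerpt: two namespace prefixes and two terms.
SAMPLE_CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "title": "dct:title",
    "downloadURL": {"@id": "dcat:downloadURL", "@type": "@id"},
}


def expand_term(term, context):
    """Expand a short term or prefixed name to a full IRI."""
    definition = context.get(term, term)
    if isinstance(definition, dict):
        # Expanded term definitions carry the IRI under "@id".
        definition = definition.get("@id", term)
    prefix, separator, suffix = definition.partition(":")
    namespace = context.get(prefix)
    # "http://..." also contains a colon, so skip absolute IRIs.
    if separator and not suffix.startswith("//") and isinstance(namespace, str):
        return namespace + suffix
    return definition
```

With this context, a document can use the key `title` instead of repeating `http://purl.org/dc/terms/title` on every node.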

Harvester

udata can harvest other RDF/DCAT enabled data portals with the DCAT Harvester.

References

The udata RDF implementation and its harvester were created using these references:

The namespaces used are: