DCAT formatter improvements #8642

Draft: wants to merge 31 commits into base: main

josegar74
Member

@josegar74 josegar74 commented Feb 6, 2025

  • Improve the functions to obtain record / resource URIs to produce full URIs.

  • The original metadata can be ISO19115-3.2018 or ISO19139. For ISO19139, the extraction of languages was not defined.

  • Use iso2 language code for rdf localised elements and adms:status

  • Process all resource constraints.

  • Formatter / DCAT / EU / Fix conformsTo URI for DCAT-AP and add legislations.

  • Formatter / DCAT / Use relation API to populate links. Adds:

    • dcat:inSeries
    • dcat:seriesMember
    • dct:relation
    • dct:source
    • dct:references
    • dcat:servesDataset
  • Formatter / DCAT / Service / Endpoint description can also be a link with function information.

  • Formatter / DCAT / Distribution

    • Add support for DQ spec and report reference as foaf:page
    • Fix when function is not defined or empty
    • Add modified date based on resource revision
    • Add dcat:mediaType
    • Copy resource language
    • Case insensitive matching of format
  • Formatter / DCAT / Avoid generating URIs that do not start with http (which would be invalid RDF). Add missing match on online resource name and DQ citation link.

  • Formatter / DCAT / EU publication file type expects an http URI, not https. Also add additional IANA types.

  • Formatter / DCAT / Limit dcat:theme element to EU data themes. Keywords are encoded using dct:subject when they have an Anchor, and dcat:keyword for the label.

  • Formatter / DCAT / EU validator complains about this.

  • Formatter / DCAT / Add ADMS license type and EU access rights.

  • Based on license URL add corresponding ADMS type

  • Based on accessConstraint type add EU access type

Checklist

  • I have read the contribution guidelines
  • Pull request provided for main branch, backports managed with label
  • Good housekeeping of code, cleaning up comments, tests, and documentation
  • Clean commit history broken into understandable chunks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
  • Clean commit messages, longer verbose messages are encouraged
  • API Changes are identified in commit messages
  • Testing provided for features or enhancements using automatic tests
  • User documentation provided for new features or enhancements in manual
  • Build documentation provided for development instructions in README.md files
  • Library management using pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation

…Is to produce full URIs.

Previously, rdf:about attributes did not contain a full URI (for example rdf:about='353108a0-f0f8-4f8e-9d0e-60f18dfda169'), which caused validation issues in the DCAT-AP validator.
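A minimal sketch of the idea, written in Python rather than the formatter's XSLT. The `record_uri` helper, the example base URL and the `api/records/` path segment are assumptions for illustration and are not taken from this pull request.

```python
from urllib.parse import urlparse


def record_uri(base_url: str, identifier: str) -> str:
    """Return an absolute URI for a record identifier (hypothetical helper)."""
    # Identifiers that are already absolute http(s) URIs are kept unchanged.
    if urlparse(identifier).scheme in ("http", "https"):
        return identifier
    # Otherwise prefix the bare identifier with the (assumed) catalogue base URL.
    return base_url.rstrip("/") + "/api/records/" + identifier
```

With a bare UUID the result is a full URI that can be used in rdf:about; an identifier that is already an http(s) URI passes through untouched.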
@josegar74 josegar74 added this to the 4.4.7 milestone Feb 6, 2025
@josegar74 josegar74 requested a review from fxprunayre February 6, 2025 12:49
@CLAassistant

CLAassistant commented Feb 6, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ fxprunayre
❌ josegar74

josegar74 and others added 13 commits February 6, 2025 16:03
Previously, language codes were inconsistent: these types of elements got the iso3 language code (<dct:title xml:lang='dut'...), while elements from vocabularies, where the language is defined with the iso2 code, got the iso2 code (<skos:prefLabel xml:lang='nl'...).
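The normalisation can be sketched as a lookup from three-letter to two-letter codes. This is an illustration in Python, not the formatter's XSLT; the `ISO3_TO_ISO2` table is a small assumed subset and `to_iso2` is a hypothetical helper.

```python
# Small illustrative subset of ISO 639-2 to ISO 639-1 mappings
# (an assumption for this sketch, not the table used by the formatter).
ISO3_TO_ISO2 = {
    "dut": "nl", "nld": "nl",
    "eng": "en",
    "fre": "fr", "fra": "fr",
    "ger": "de", "deu": "de",
}


def to_iso2(code: str) -> str:
    """Return the two-letter code for a language code, if one is known."""
    code = code.strip().lower()
    # Two-letter codes are assumed to be iso2 already.
    if len(code) == 2:
        return code
    # Unknown three-letter codes are returned unchanged rather than guessed.
    return ISO3_TO_ISO2.get(code, code)
```

Applying the same function to every xml:lang value gives a consistent result, so <dct:title xml:lang='dut'...> and <skos:prefLabel xml:lang='nl'...> would both end up with 'nl'.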
Previously, the template for managing resource constraints in formatter/dcat/dcat-core-access-and-use.xsl did not correctly process the constraints when the first resource constraint had no access or use constraints. For example:

<mri:resourceConstraints>
    <mco:MD_Constraints>
        <mco:useLimitation>
            <gco:CharacterString>Geen gebruiksbeperkingen</gco:CharacterString>
        </mco:useLimitation>
    </mco:MD_Constraints>
</mri:resourceConstraints>
<mri:resourceConstraints>
    <mco:MD_LegalConstraints>
        <mco:accessConstraints>
            <mco:MD_RestrictionCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_RestrictionCode" codeListValue="otherRestrictions" codeSpace="ISOTC211/19115">otherRestrictions</mco:MD_RestrictionCode>
        </mco:accessConstraints>
        <mco:otherConstraints>
            <gco:CharacterString>Open data (publiek)|https://creativecommons.org/publicdomain/mark/1.0/deed.nl</gco:CharacterString>
        </mco:otherConstraints>
    </mco:MD_LegalConstraints>
</mri:resourceConstraints>
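The difference between looking only at the first constraint block and looking at all of them can be sketched as follows. This is a Python illustration, not the XSLT from the pull request; the namespace URIs, the `<root>` wrapper element and the `access_constraint_codes` helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for the ISO 19115-3 prefixes used in the example above.
NS = {
    "mri": "http://standards.iso.org/iso/19115/-3/mri/1.0",
    "mco": "http://standards.iso.org/iso/19115/-3/mco/1.0",
    "gco": "http://standards.iso.org/iso/19115/-3/gco/1.0",
}

# Trimmed copy of the example, wrapped in a hypothetical root element.
SAMPLE = """
<root xmlns:mri="http://standards.iso.org/iso/19115/-3/mri/1.0"
      xmlns:mco="http://standards.iso.org/iso/19115/-3/mco/1.0"
      xmlns:gco="http://standards.iso.org/iso/19115/-3/gco/1.0">
  <mri:resourceConstraints>
    <mco:MD_Constraints>
      <mco:useLimitation>
        <gco:CharacterString>Geen gebruiksbeperkingen</gco:CharacterString>
      </mco:useLimitation>
    </mco:MD_Constraints>
  </mri:resourceConstraints>
  <mri:resourceConstraints>
    <mco:MD_LegalConstraints>
      <mco:accessConstraints>
        <mco:MD_RestrictionCode codeListValue="otherRestrictions"/>
      </mco:accessConstraints>
    </mco:MD_LegalConstraints>
  </mri:resourceConstraints>
</root>
"""


def access_constraint_codes(xml_text: str) -> list:
    """Collect access constraint codes from every resourceConstraints block."""
    root = ET.fromstring(xml_text)
    codes = []
    # Iterate over all blocks instead of stopping at the first one.
    for block in root.findall("mri:resourceConstraints", NS):
        path = ".//mco:accessConstraints/mco:MD_RestrictionCode"
        for code in block.findall(path, NS):
            codes.append(code.get("codeListValue"))
    return codes
```

For `SAMPLE`, the first block contributes nothing and the second contributes `otherRestrictions`, which is the case the original template missed.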
* Based on license URL add corresponding ADMS type
* Based on accessConstraint type add EU access type

```xml
<dct:license>
 <dct:LicenseDocument rdf:about="https://creativecommons.org/licenses/by-sa/2.0">
    <dct:type>
       <skos:Concept rdf:about="http://purl.org/adms/licencetype/ViralEffect-ShareAlike">
          <skos:prefLabel xml:lang="en">Viral effect (a.k.a. Share-alike)</skos:prefLabel>
          <skos:notation>ViralEffect-ShareAlike</skos:notation>
          <skos:inScheme rdf:resource="http://purl.org/adms/licencetype/1.0"/>
       </skos:Concept>
    </dct:type>
    <dct:type>
       <skos:Concept rdf:about="http://purl.org/adms/licencetype/Attribution">
          <skos:notation>Attribution</skos:notation>
          <skos:prefLabel xml:lang="en">Attribution</skos:prefLabel>
          <skos:inScheme rdf:resource="http://purl.org/adms/licencetype/1.0"/>
       </skos:Concept>
    </dct:type>
 </dct:LicenseDocument>
</dct:license>

<dct:accessRights>
 <dct:RightsStatement rdf:about="http://publications.europa.eu/resource/authority/access-right/PUBLIC"/>
</dct:accessRights>
```
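One way such a mapping could work for Creative Commons URLs is to read the licence tokens from the URL path. The sketch below is in Python and is only an assumption about the approach: the `CC_TOKEN_TO_ADMS` table and the `adms_licence_types` helper are hypothetical, and the actual rules live in the formatter's XSLT.

```python
ADMS_BASE = "http://purl.org/adms/licencetype/"

# Hypothetical mapping from Creative Commons URL tokens to ADMS licence types.
CC_TOKEN_TO_ADMS = {
    "by": "Attribution",
    "sa": "ViralEffect-ShareAlike",
    "nc": "NonCommercialUseOnly",
    "nd": "NoDerivativeWork",
}


def adms_licence_types(licence_url: str) -> list:
    """Return ADMS licence type URIs derived from a Creative Commons licence URL."""
    marker = "creativecommons.org/licenses/"
    url = licence_url.lower()
    if marker not in url:
        # Not a recognised licence URL: no type is added.
        return []
    # "by-sa/2.0" -> "by-sa" -> ["by", "sa"]
    tokens = url.split(marker, 1)[1].split("/")[0].split("-")
    return [ADMS_BASE + CC_TOKEN_TO_ADMS[t] for t in tokens if t in CC_TOKEN_TO_ADMS]
```

For the by-sa/2.0 licence in the output above, this yields the Attribution and ViralEffect-ShareAlike concepts.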
…ds are encoded using dct:subject if with Anchor and dcat:keyword for label.
…h will be RDF invalid). Add missing match on online resource name and DQ citation link.
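The http check can be sketched as a simple filter applied before a value is written as an RDF resource reference. This is a Python illustration; `is_http_uri` and `resource_refs` are hypothetical names, not functions from the formatter.

```python
def is_http_uri(value: str) -> bool:
    """True when the value starts with an http or https scheme."""
    return value.strip().lower().startswith(("http://", "https://"))


def resource_refs(values: list) -> list:
    """Keep only values that are safe to emit as rdf:about / rdf:resource."""
    # Empty values and relative or file-like links are dropped.
    return [v.strip() for v in values if v and is_http_uri(v)]
```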
* Add support for DQ spec and report reference as foaf:page
* Fix when function is not defined or empty
* Add modified date based on resource revision
* Add dcat:mediaType
* Copy resource language
* Case insensitive matching of format
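Case-insensitive format matching combined with a media type lookup might look like the following Python sketch. The `FORMAT_TO_MEDIA_TYPE` entries and the `media_type_uri` helper are assumptions for illustration; the formatter's real mapping is not shown in this conversation.

```python
IANA_BASE = "https://www.iana.org/assignments/media-types/"

# Hypothetical format-label to media-type table (illustrative subset).
FORMAT_TO_MEDIA_TYPE = {
    "csv": "text/csv",
    "pdf": "application/pdf",
    "geojson": "application/geo+json",
    "gml": "application/gml+xml",
}


def media_type_uri(format_label: str):
    """Return an IANA media type URI for a format label, or None if unknown."""
    # casefold() makes "CSV", "Csv" and "csv" match the same key.
    key = format_label.strip().casefold()
    media_type = FORMAT_TO_MEDIA_TYPE.get(key)
    return IANA_BASE + media_type if media_type else None
```

Returning None for unknown labels lets the caller skip dcat:mediaType instead of emitting a guess.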
Adds:
* dcat:inSeries
* dcat:seriesMember
* dct:relation
* dct:source
* dct:references
* dcat:servesDataset
@josegar74 josegar74 force-pushed the 44-dcat-formatter-improvements branch from 8cf7aa3 to 63171aa Compare February 13, 2025 10:08
@josegar74 josegar74 force-pushed the 44-dcat-formatter-improvements branch from 679140c to 7c44f46 Compare February 18, 2025 09:20
fxprunayre and others added 2 commits February 19, 2025 14:15
If there is no match, data.europa.eu displays UNKNOWN.
Formatter / DCAT / Add some more media type mappings to EU vocabulary