This repository has been archived by the owner on Jan 25, 2018. It is now read-only.

Remove solr specific suffixes #80

Open
mejackreed opened this issue Jan 28, 2016 · 6 comments

Comments

@mejackreed
Member

No description provided.

@mejackreed mejackreed added this to the v1.0.0 milestone Jan 28, 2016
@drh-stanford

Below is an example of a JSON-LD format for the GeoBlacklight schema that abstracts out the Solr-specific details and makes a few other changes. Namely, this example uses @id in lieu of layer_slug_s, uses dc:identifier for a set of alternate identifiers, and drops uuid (#53). Note that dct:references becomes a proper JSON hash, and all derivative fields are dropped.

Ingesting the abstracted JSON-LD format into a Solr index would require a shim of harvesting code that derives the fields needed for the Solr implementation (such as solr_geom's ENVELOPE syntax from the georss:box field). This harvesting code could also provide a conversion utility between the current version of the JSON schema and the 1.0 abstracted JSON-LD version.
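
As a rough illustration of the kind of derivation such a shim would do, here is a minimal Python sketch that turns a georss:box value into Solr's ENVELOPE syntax. It assumes the box is ordered "south west north east" (latitude before longitude, as in GeoRSS) and that the target is ENVELOPE(minX, maxX, maxY, minY). The function name georss_box_to_envelope is hypothetical and not part of GeoBlacklight.

```python
def georss_box_to_envelope(box: str) -> str:
    """Convert a GeoRSS box string into Solr ENVELOPE syntax.

    Assumes the input is "south west north east" (lat lon lat lon).
    Hypothetical helper for illustration only.
    """
    parts = box.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(parts)}: {box!r}")

    # Validate that every token is numeric, but keep the original
    # strings so no precision is lost in the output.
    for part in parts:
        float(part)

    south, west, north, east = parts
    # Solr expects ENVELOPE(minX, maxX, maxY, minY), i.e. W, E, N, S.
    return f"ENVELOPE({west}, {east}, {north}, {south})"


envelope = georss_box_to_envelope("37.939061 -123.091039 38.098269 -122.892843")
print(envelope)
```

Keeping the coordinates as strings avoids float round-tripping; a real shim would probably also want to check coordinate ranges and handle boxes that cross the antimeridian.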

There are several other issues with individual fields, such as moving layer_id_s into dct:references (#77), but the example below is meant to illustrate the JSON-LD file format and its implications as an interchange format.

The example shows that the JSON-LD'ness is pretty straightforward: it uses @context for the prefixes and @id to identify the layer.

{
  "@context": {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "georss": "http://georss.org#",
    "layer": "http://geoblacklight.org/schema/1.0#",
    "stanford": "http://library.stanford.edu#"
  },
  "@id": "stanford-fr148tw1471",
  "dc:identifier": [
    "http://purl.stanford.edu/fr148tw1471"
  ],
  "dc:title": "Geology: Offshore of Point Reyes, California, 2010",
  "dc:description": "This polygon shapefile represents geologic features within the offshore region of Point Reyes, California...",
  "dc:rights": "Public",
  "dct:provenance": "Stanford",
  "dct:references": {
    "http://schema.org/url": "http://purl.stanford.edu/fr148tw1471",
    "http://schema.org/downloadUrl": "http://stacks.stanford.edu/file/druid:fr148tw1471/data.zip",
    "http://www.loc.gov/mods/v3": "http://purl.stanford.edu/fr148tw1471.mods",
    "http://www.isotc211.org/schemas/2005/gmd/": "http://opengeometadata.stanford.edu/metadata/edu.stanford.purl/druid:fr148tw1471/iso19139.xml",
    "http://www.w3.org/1999/xhtml": "http://opengeometadata.stanford.edu/metadata/edu.stanford.purl/druid:fr148tw1471/default.html",
    "http://www.opengis.net/def/serviceType/ogc/wfs": "https://geowebservices.stanford.edu/geoserver/wfs",
    "http://www.opengis.net/def/serviceType/ogc/wms": "https://geowebservices.stanford.edu/geoserver/wms"
  },
  "layer:id": "druid:fr148tw1471",
  "layer:geom_type": "Polygon",
  "layer:modified_dt": "2016-02-05T22:07:10Z",
  "dc:format": "Shapefile",
  "dc:language": "English",
  "dc:type": "Dataset",
  "dc:publisher": "Geological Survey (U.S.)",
  "dc:creator": [
    "Michael W. Manson",
    "Janet T. Watt",
    "H. Gary Greene",
    "Moss Landing Marine Laboratories",
    "Pacific Coastal and Marine Science Center",
    "Golden, Nadine E."
  ],
  "dc:subject": [
    "Geology",
    "Geomorphology",
    "Sediments (Geology)",
    "Marine sediments",
    "Ocean bottom",
    "Geoscientific Information",
    "Oceans"
  ],
  "dct:issued": "2014",
  "dct:temporal": [
    "2006-2010"
  ],
  "dct:spatial": [
    "California",
    "Marin County (Calif.)",
    "Drakes Bay (Calif.)",
    "Pacific Ocean"
  ],
  "dc:relation": [
    "http://sws.geonames.org/3687919/",
    "http://sws.geonames.org/5370468/",
    "http://sws.geonames.org/8411083/"
  ],
  "georss:box": "37.939061 -123.091039 38.098269 -122.892843",
  "stanford:rights_metadata": "<?xml version=\"1.0\"?>\n<rightsMetadata>\n  <access type=\"discover\">\n    <machine>\n      <world/>\n    </machine>\n  </access>\n  <access type=\"read\">\n    <machine>\n      <world/>\n    </machine>\n  </access>\n  <use>\n    <human type=\"useAndReproduction\">This item is in the public domain.  There are no restrictions on use.</human>\n    <human type=\"creativeCommons\"/>\n    <machine type=\"creativeCommons\"/>\n  </use>\n  <copyright>\n    <human>This work is in the Public Domain, meaning that it is not subject to copyright.</human>\n  </copyright>\n</rightsMetadata>\n"
}
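
To make the indexing implication concrete, here is a hedged Python sketch of how a record like the one above might be flattened back into Solr-style field names. The suffix rules are an assumption based on the usual Solr dynamic-field convention (_s for single strings, _sm for multi-valued strings), and serializing dct:references to a JSON string is likewise assumed. to_solr_doc is a hypothetical name, not existing GeoBlacklight code.

```python
import json


def to_solr_doc(record: dict) -> dict:
    """Flatten an abstracted JSON-LD record into Solr-suffixed fields.

    Illustrative only: the suffix rules below are assumptions, not
    the project's actual mapping.
    """
    doc = {}
    for key, value in record.items():
        if key == "@context":
            continue  # prefixes only matter to JSON-LD consumers
        if key == "@id":
            doc["layer_slug_s"] = value  # @id stands in for layer_slug_s
            continue

        name = key.replace(":", "_")
        if isinstance(value, dict):
            # Assumed: hashes are stored as a serialized JSON string.
            doc[name + "_s"] = json.dumps(value, sort_keys=True)
        elif isinstance(value, list):
            doc[name + "_sm"] = list(value)
        elif name.endswith("_dt"):
            doc[name] = value  # already carries a Solr date suffix
        else:
            doc[name + "_s"] = value
    return doc


sample = {
    "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
    "@id": "stanford-fr148tw1471",
    "dc:title": "Geology: Offshore of Point Reyes, California, 2010",
    "dc:subject": ["Geology", "Oceans"],
    "layer:modified_dt": "2016-02-05T22:07:10Z",
    "dct:references": {
        "http://schema.org/url": "http://purl.stanford.edu/fr148tw1471"
    },
}
solr_doc = to_solr_doc(sample)
print(json.dumps(solr_doc, indent=2))
```

A type-driven rule like this keeps the shim small, but it cannot know which fields the Solr schema expects to be single-valued, so a real converter would likely need an explicit per-field mapping.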

@eliotjordan
Member

I like seeing this as JSON-LD. Thanks for getting this up @drh-stanford!

@mejackreed
Member Author

Yes thanks, looks good! One quick concern I have is increasing the complexity of indexing documents from their native format. Maybe we can use something from here: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-TransformingandIndexingCustomJSON ?

It does seem like this might not fully meet our need, though. The XML approach seems more amenable, as you can provide custom XSLTs to transform your data. Sigh.

@mejackreed
Member Author

Also maybe the Data Import Handlers (DIH) are an option?

@drh-stanford

The layer:id should probably move into dct:references, since it's not really an "identifier" so much as a parameter to the WMS/WFS protocol.

@mejackreed
Member Author

Not to throw a wrench in things, but we should possibly talk about DCAT as an alternative too! https://project-open-data.cio.gov/v1.1/schema/
