The editor or API should prevent bad character encodings #867

Closed
mmo opened this issue Jul 5, 2022 · 1 comment · Fixed by #908
Labels
bug (Breaks something but is not blocking) · f: data (About data model, importation, transformation, exportation of data, specific for bibliographic data) · p-High (To set a high priority!)

Comments

@mmo
Collaborator

mmo commented Jul 5, 2022

How it works

When a document record contains character encoding problems (for instance, because the cataloguer entered an abstract or other metadata by copying and pasting from a PDF file), OAI-PMH breaks. Every PMH request that includes that record fails with a server error:

All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
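For context, XML 1.0 only allows tab, line feed, carriage return and the code points U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF. The message above appears to be the `ValueError` that lxml raises when it is given a string containing a NULL byte or another control character outside that set, so a single bad character anywhere in a record is enough to abort the whole response. A minimal detection sketch, assuming record fields are plain Python strings; `find_invalid_xml_chars` is a hypothetical helper, not part of SONAR:

```python
import re

# Negated XML 1.0 "Char" production: matches every character that
# is NOT tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD or U+10000-U+10FFFF.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text: str) -> list[tuple[int, str]]:
    """Return (position, "U+XXXX") for each character XML 1.0 cannot carry."""
    return [
        (match.start(), f"U+{ord(match.group()):04X}")
        for match in INVALID_XML_CHARS.finditer(text)
    ]


# Text pasted from a PDF may carry stray control characters,
# e.g. a form feed (U+000C) or a NULL byte (U+0000).
abstract = "Digital\x0clibraries and\x00 open access"
print(find_invalid_xml_chars(abstract))  # [(7, 'U+000C'), (21, 'U+0000')]
```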

Improvement suggestion

The editor should prevent users from submitting unauthorised characters or, ideally, correct them automatically.
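If the check runs on the server before a record is saved, it covers both the editor and the API. The sketch below walks every string in a record and rejects the whole record, reporting the offending field paths so the editor can point the cataloguer at them. It assumes records are JSON-like dicts and lists; `InvalidCharacterError`, `validate_record` and the field names in the example are hypothetical, not SONAR's actual API:

```python
import re
from typing import Any, Iterator

# Anything outside the XML 1.0 "Char" production.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


class InvalidCharacterError(ValueError):
    """Hypothetical error raised when a record contains XML-incompatible text."""

    def __init__(self, fields: list[str]):
        self.fields = fields
        super().__init__("Invalid characters in: " + ", ".join(fields))


def _string_fields(value: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, string) for every string nested in dicts and lists."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from _string_fields(child, f"{path}.{key}" if path else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from _string_fields(child, f"{path}[{index}]")


def validate_record(record: dict) -> None:
    """Raise InvalidCharacterError if any string field cannot be put in XML."""
    bad_fields = [
        path for path, text in _string_fields(record)
        if INVALID_XML_CHARS.search(text)
    ]
    if bad_fields:
        raise InvalidCharacterError(bad_fields)


record = {
    "title": "Open access",
    "abstracts": [{"language": "eng", "value": "Pasted\x02 from a PDF"}],
}
try:
    validate_record(record)
except InvalidCharacterError as err:
    print(err)  # Invalid characters in: abstracts[0].value
```

Rejecting the record keeps stored data clean without guessing what the cataloguer meant; the cost is that the editor must surface the error clearly.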

Alternative:

  • OAI-PMH requests should not fail because of character encoding problems in a single record. Records should be checked for such problems. Possible approaches, from worst (1) to best (4):
    1. During the OAI-PMH response: check each record for encoding problems and exclude it from the response if needed
    2. During the OAI-PMH response: check each record for encoding problems and automatically sanitize it if needed before including it in the response
    3. During record creation: automatically sanitize the record before saving
    4. During record creation: issue an error and prevent the record from being created (check the server-side/client-side implications)
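Approaches 2 and 3 both come down to stripping XML-incompatible characters; they differ only in when it happens (while building the OAI-PMH response, or before the record is saved). A minimal sketch, assuming JSON-like records; `sanitize` is a hypothetical helper, not SONAR's actual implementation:

```python
import re
from typing import Any

# Anything outside the XML 1.0 "Char" production.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize(value: Any) -> Any:
    """Return a copy of value with XML-incompatible characters removed
    from every string nested in dicts and lists; other values pass through."""
    if isinstance(value, str):
        return INVALID_XML_CHARS.sub("", value)
    if isinstance(value, dict):
        return {key: sanitize(child) for key, child in value.items()}
    if isinstance(value, list):
        return [sanitize(child) for child in value]
    return value


record = {
    "title": "Open\x00 access",
    "pages": 12,
    "abstracts": [{"value": "Line one\x0bLine two"}],
}
print(sanitize(record))
# {'title': 'Open access', 'pages': 12, 'abstracts': [{'value': 'Line oneLine two'}]}
```

Deleting is the simplest policy. As the `abstracts` example shows, it can glue words together when the stray character was a layout control such as a vertical tab or form feed, so substituting a space for those may suit some fields better. For approach 1, calling `INVALID_XML_CHARS.search()` on each string would let the response skip the record instead of rewriting it.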
@mmo mmo added the enhancement label (Enhancement of an existing feature) Jul 5, 2022
@pronguen pronguen added the bug, f: data and p-High labels and removed the enhancement label Jul 5, 2022
@pronguen pronguen changed the title from "Make OAI-PMH responses more robust against bad character encodings" to "The editor should prevent bad character encodings" Aug 8, 2022
@pronguen
Contributor

pronguen commented Aug 8, 2022

Similar to #861

@PascalRepond PascalRepond changed the title from "The editor should prevent bad character encodings" to "The editor or API should prevent bad character encodings" Aug 10, 2022
jma added a commit to jma/sonar that referenced this issue Nov 16, 2022
* Adds new `safety` exceptions.
* Removes control characters when the Dublin Core XML file is produced.
* Closes rero#867.

Co-Authored-by: Johnny Mariéthoz <[email protected]>
@jma jma closed this as completed in ac86d20 Nov 22, 2022