Provide vocabulary to specify purposes and permissions related to AI training #82
Comments
Hi. Thanks for the proposal. This is an interesting application that I/we hadn't foreseen. Of the existing vocabularies, I think ODRL would be a good option for specifying machine-readable licenses that indicate what is permitted and prohibited, and schema.org might suffice for specifying types of content (e.g. images, videos). What is then left is specifying purposes such as training an ML model, for which I don't think there are existing vocabularies. In DPVCG, we are interested in expanding the DPV to more regulations, such as the EU's AI Act, where such purposes are relevant. So I think this can function as a use case towards the development of AI-relevant vocabularies, including purposes. For example, https://w3id.org/AIRO#training is a concept from @DelaramGlp's work on AI-related risk management that refers to the training phase in the AI development lifecycle. In DPV, this can be a category of purpose. You and others are welcome to help with these efforts, provide such purposes, or contribute more use cases/examples.
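To make the ODRL suggestion above more concrete, here is a minimal sketch in Python that builds an ODRL policy as JSON-LD, prohibiting use of an asset for the purpose of AI training. The policy and asset IRIs are hypothetical placeholders, and using https://w3id.org/AIRO#training as the value of a purpose constraint is an assumption made for illustration, not an agreed DPV or ODRL modelling.

```python
import json

# Hypothetical identifiers, used only for illustration.
POLICY_ID = "https://example.org/policy/no-ai-training"
ASSET_ID = "https://example.org/portfolio/"
# Concept mentioned in the comment above; treating it as a purpose is an assumption.
AI_TRAINING = "https://w3id.org/AIRO#training"


def build_policy(policy_id, asset_id, purpose):
    """Return an ODRL Set policy that prohibits using an asset for a given purpose."""
    return {
        "@context": "http://www.w3.org/ns/odrl.jsonld",
        "@type": "Set",
        "uid": policy_id,
        "prohibition": [
            {
                "target": asset_id,
                "action": "use",
                # ODRL constraint: the prohibition applies when purpose == AI training.
                "constraint": [
                    {
                        "leftOperand": "purpose",
                        "operator": "eq",
                        "rightOperand": {"@id": purpose},
                    }
                ],
            }
        ],
    }


policy = build_policy(POLICY_ID, ASSET_ID, AI_TRAINING)
print(json.dumps(policy, indent=2))
```

The sketch only shows the shape such a policy might take; which purpose concepts to use is exactly the open question raised in this thread.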
We discussed this in Meeting FEB-06 and decided to include this in the scope of DPV by providing concepts to represent AI training, so that they can be used with vocabularies like ODRL to express policies and agreements that state permissions/prohibitions over AI training. This is a rather complex topic and we don't want to simply state Eventually, we may decide to include composite concepts such as
(using dpvbot:) This was discussed in Meeting 2025-02-13
I collected some more thoughts on this in a blog post. The summary of it is:
Example: A notice stating that Name (personal data) will be used for training to provide personalised recommendations on the basis of informed consent, that the training will take place on device, that the data will be stored only on device, and (optionally) a prohibition stating that the data will not be transferred outside the device.
ex:SomeNotice a dpv:AINotice ;
dpv:hasProcess [
a ai:AIProcess ;
ai:hasTrainingData pd:Name ;
dpv:hasPurpose dpv:ProvidePersonalisedRecommendations ;
dpv:hasLocation dpv:WithinDevice ;
dpv:hasProcessingCondition [
dpv:hasProcessing dpv:Store ;
dpv:hasLocation dpv:WithinDevice ;
] ;
dpv:hasProhibition [
a dpv:Prohibition ;
dpv:hasProcessing dpv:Transfer ;
dpv:hasLocation dpv:OutsideDevice ; # new concept
] ;
dpv:hasLegalBasis dpv:InformedConsent ;
] .
I’m not sure this is the correct place to file this issue, but I would love for some standardized way to disallow my content (writing, photography, code) from being used in AI training data.
I’m imagining an extension of robots.txt where I can explicitly disallow crawlers that search for AI training data.
Or some sort of standard way to indicate copyright permissions, including training usage, might also be helpful, but is probably more complicated.
Ultimately I want people to find my work, but I don’t want it to end up in an AI model for others to make things in the style of my work.
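As a rough illustration of the robots.txt idea described above, the sketch below uses Python's standard `urllib.robotparser` to evaluate a robots.txt that blocks one crawler while allowing everyone else. The user-agent name `ExampleAITrainingBot` and the example.org URLs are hypothetical, and robots.txt is only advisory: it works only for crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI training crawler, allow all others.
ROBOTS_TXT = """\
User-agent: ExampleAITrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.org/photos/1.jpg"
print(can_fetch("ExampleAITrainingBot", url))  # the blocked crawler
print(can_fetch("ExampleSearchBot", url))      # any other crawler
```

This keeps the work discoverable by other crawlers while signalling that one specific agent should stay away; it does not express licensing terms, which is where a vocabulary-based approach would come in.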