Discuss: Event Metadata #1

Open
wants to merge 1 commit into
base: discussions/event-metadata/base
@LukasKalbertodt commented Mar 20, 2025

This discussion PR covers only event metadata.

To add new comments, go into the "files changed" tab and comment on individual lines or sections of lines. Or you can comment on existing discussion threads here. If you want to start a discussion about a broader topic (rather than something mainly referencing a few lines), consider opening a new discussion here.

(🟦) Metadata fields marked with this symbol are *Opencast-managed*: they are read-only for users/external applications. All other fields can be freely changed, as long as validity checks pass.

#### General
- `id: ID` 🟦: unique among all events.

As discussed recently, this needs a clearer definition. Is the id unique within the system, within a tenancy, or even globally? I think the feeling was that id+organization should be unique, because the same MediaPackage can be loaded into different tenancies in the same system.

Do we need to add `organization: NonBlankString`?
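
One way to picture the id+organization proposal is a store keyed by the pair rather than by the ID alone. This is only a minimal sketch: the `EventStore` class, its in-memory dict, and the tenant names are hypothetical and not part of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class EventStore:
    """Hypothetical in-memory store; events are keyed by (organization, id)."""

    _events: dict[tuple[str, str], dict] = field(default_factory=dict)

    def add(self, organization: str, event_id: str, metadata: dict) -> None:
        key = (organization, event_id)
        if key in self._events:
            # The same ID inside the same tenant is a conflict.
            raise ValueError(f"event {event_id!r} already exists in {organization!r}")
        self._events[key] = metadata

    def get(self, organization: str, event_id: str) -> dict:
        return self._events[(organization, event_id)]


store = EventStore()
# The same MediaPackage ID loaded into two tenants of one system.
store.add("tenant-a", "abc-123", {"title": "Lecture 1"})
store.add("tenant-b", "abc-123", {"title": "Lecture 1 (copy)"})
```

With this keying, the ID alone is no longer enough to address an event; every lookup needs the organization as well.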

Member Author

Good point. I completely forgot about multi-tenancy in the whole doc. Need to think about it.

- `creators: NonBlankString[]`: The people mainly responsible for creating this video and/or presenting the talk which this video is a recording of. Should contain human-readable names and not usernames. Plain text. This is the main "who?"-information shown in the UIs; other fields in `extraMetadata` (e.g. `dct:contributor`) might be shown too, but less prominently.
- `language: LangCode?`: describes the (main) language of this event and its metadata. For example, the audio language and (if applicable) language of video content is more important than the language of available subtitles. Generally, assets can have their own language specified.
- `series: SeriesID?`: optional ID of the series this event belongs to. Must be a valid series ID of an existing series at all times.
- `owner: Username`: TODO figure out details

Should we allow owner to also be a group rather than just an individual?

Member

Personally, I would like to only ever have a single individual (one person to "speak" to), but I see that others might want to have multiple users.

- `startTime: DateTime?`: Actual real-life datetime when the video recording started or will start, with timezone. If this is not applicable, for example because it's a short movie, this should be undefined. UIs should use this as the primary date to show for a video and, if unset, fall back to `created`.
- `endTime: DateTime?`: Like `startTime`, but when the video recording stopped. Due to cutting, recording pauses, etc., the `duration` is not necessarily `end - start`.
- `duration: Milliseconds` 🟦: duration of the event. As specified in ["assets"](./assets), this needs to always match the duration of all non-internal tracks.
- `updated: Timestamp` 🟦: Timestamp of when anything about this event was last changed.
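
The date fallback described for `startTime` could look roughly like this in a UI. A minimal sketch: the function name `display_date` and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def display_date(start_time: Optional[datetime], created: datetime) -> datetime:
    """Return the primary date a UI would show: startTime if set, else created."""
    return start_time if start_time is not None else created


created = datetime(2025, 3, 20, 9, 0, tzinfo=timezone.utc)
recorded = datetime(2025, 3, 18, 14, 15, tzinfo=timezone.utc)

lecture_date = display_date(recorded, created)  # a recorded lecture
short_film_date = display_date(None, created)   # no meaningful startTime
```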

`modified`?
Again, this needs a clear definition of what counts as a change, e.g. whether modifying the series does.

Member

I also like modified better.

Member Author

Sounds fine to me. I just chose updated as that's the current name.

Will clarify when this timestamp is updated, but no, series changes shouldn't affect it. Series (will) have their own modified timestamp.

- `endTime: DateTime?`: Like `startTime`, but when the video recording stopped. Due to cutting, recording pauses, etc., the `duration` is not necessarily `end - start`.
- `duration: Milliseconds` 🟦: duration of the event. As specified in ["assets"](./assets), this needs to always match the duration of all non-internal tracks.
- `updated: Timestamp` 🟦: Timestamp of when anything about this event was last changed.
- `created: Timestamp` 🟦: Timestamp of when the event was created in Opencast. It is set once when the event is first stored in Opencast's DB, and never changed again. This also implies that a scheduled event's `created` date is when the scheduling took place, _not_ the time it is scheduled for (that would be `startTime`).

add immutable flag

Member Author

Can you elaborate? Like, add an immutable field to the event? What would that do exactly?

- `endTime: DateTime?`: Like `startTime`, but when the video recording stopped. Due to cutting, recording pauses, etc., the `duration` is not necessarily `end - start`.
- `duration: Milliseconds` 🟦: duration of the event. As specified in ["assets"](./assets), this needs to always match the duration of all non-internal tracks.
- `updated: Timestamp` 🟦: Timestamp of when anything about this event was last changed.
- `created: Timestamp` 🟦: Timestamp of when the event was created in Opencast. It is set once when the event is first stored in Opencast's DB, and never changed again. This also implies that a scheduled event's `created` date is when the scheduling took place, _not_ the time it is scheduled for (that would be `startTime`).

`available: DateTimeRange`: publication period during which the event is visible to its intended audience. Can be unset or open-ended.

Maybe this should be specific to each publication?

Member Author

> publication

No idea what you are talking about :D

But yeah I guess this available is related to the lifecycle management? Did not consider anything there. Will think about it and also talk to Arne.

Member

@mtneug left a comment

Is ETHZ still using translated metadata (e.g. metadata with a language code attached to it)? Do we want that?

(🟦) Metadata fields marked with this symbol are *Opencast-managed*: they are read-only for users/external applications. All other fields can be freely changed, as long as validity checks pass.

#### General
- `id: ID` 🟦: unique among all events.

Member

I strongly disagree with this field being read-only during ingest (afterward, sure, it should never be changed).

When migrating from other systems, you want embeddings to remain intact. Usually, this means that the system importing the videos will also import the IDs.

People give meaning to the IDs. E.g. adopters set the ID based on the ID of lecture systems etc.

Member Author

I don't think I mentioned anywhere that this cannot be chosen by the ingester? I know of the use cases for a user-defined ID and therefore I did not disallow it. Neither this field nor the type ID mentions anything. Maybe you read a previous version of my document? Or you confused this with an asset ID, where I said "assigned by Opencast"?

Member

Definition of 🟦 above.

Member Author

Oh yeah woops, should clarify this then.

- `description: string?`: user-specified, human-readable description, potentially quite long.
- TODO: Decide whether this is plain text, markdown or anything else. External apps displaying this need to know that. Some basic formatting options might be nice?
- `creators: NonBlankString[]`: The people mainly responsible for creating this video and/or presenting the talk which this video is a recording of. Should contain human-readable names and not usernames. Plain text. This is the main "who?"-information shown in the UIs; other fields in `extraMetadata` (e.g. `dct:contributor`) might be shown too, but less prominently.
- `language: LangCode?`: describes the (main) language of this event and its metadata. For example, the audio language and (if applicable) language of video content is more important than the language of available subtitles. Generally, assets can have their own language specified.

Member

So what if you have multiple audio tracks with different languages? If this can be specified at the asset level, why have this at the event level?

Member Author

Mhh interesting. So when e.g. Tobira would want to show a language for an event, it would be derived from the track properties... maybe? But how would that work for an uploader UI? I am assuming that we cannot always (or do not always want to) auto-detect the language, so it is useful for the user/video manager to specify a language when uploading, right? But what if a video file with multiple tracks is uploaded? Mh... Does the user then need to specify the lang per track? Or does an uploader only have a single "language" field, since multiple audio tracks are likely very rare?

- `creators: NonBlankString[]`: The people mainly responsible for creating this video and/or presenting the talk which this video is a recording of. Should contain human-readable names and not usernames. Plain text. This is the main "who?"-information shown in the UIs; other fields in `extraMetadata` (e.g. `dct:contributor`) might be shown too, but less prominently.
- `language: LangCode?`: describes the (main) language of this event and its metadata. For example, the audio language and (if applicable) language of video content is more important than the language of available subtitles. Generally, assets can have their own language specified.
- `series: SeriesID?`: optional ID of the series this event belongs to. Must be a valid series ID of an existing series at all times.
- `owner: Username`: TODO figure out details

Member

In the past, people disagreed on the definition of the term "owner": is this the person who legally owns this piece of media, or the person responsible for uploading it who has all the rights to manage the video? IMO it should be the latter, leaving copyright matters to other fields.

Note: ILIAS and Moodle encode ownership in the ACLs with a specific owner role (ROLE_OWNER_{username} by default). This makes sense if the owner should have all access rights. On the other hand, you could not include the write action, which doesn't make sense... On the other other hand, if access rights are controlled by metadata, this feels wrong.

This is not written here, but I would not require the username to exist. Uploading from Moodle / ILIAS is possible without Opencast knowing that the user exists. And I don't want Moodle / ILIAS to create user records.
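
To illustrate the `ROLE_OWNER_{username}` convention and the odd case of an owner without write access, here is a minimal sketch. The ACL shape (a list of `(role, action)` pairs), the helper names, and the user `jdoe` are assumptions for illustration only; real plugins may normalise usernames differently.

```python
OWNER_ROLE_PREFIX = "ROLE_OWNER_"  # default prefix mentioned above


def owner_role(username: str) -> str:
    """Build the owner role name; no username normalisation is attempted here."""
    return f"{OWNER_ROLE_PREFIX}{username}"


def owner_actions(acl: list[tuple[str, str]], username: str) -> set[str]:
    """Collect the actions an ACL grants to the owner role of `username`."""
    role = owner_role(username)
    return {action for entry_role, action in acl if entry_role == role}


# Hypothetical ACL as (role, action) pairs.
acl = [
    ("ROLE_OWNER_jdoe", "read"),
    ("ROLE_USER", "read"),
]

# This "owner" can read but not write, which is the odd case noted above.
jdoe_actions = owner_actions(acl, "jdoe")
```

Note that nothing here checks whether `jdoe` exists as a user record; the role is derived purely from the string.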

Member Author

As is obvious from the "TODO", I will still think about all of this, but here are a few points that are already clear in my head:

  • Yes, the copyright/legal stuff I would solve with other fields, most likely DC rightsHolder.
  • The owner field will not affect access rights, only the ACL does that.
  • Yes, the username does not need to exist. The check is simply not possible with the wide range of auth systems we want to support.

- `endTime: DateTime?`: Like `startTime`, but when the video recording stopped. Due to cutting, recording pauses, etc., the `duration` is not necessarily `end - start`.
- `duration: Milliseconds` 🟦: duration of the event. As specified in ["assets"](./assets), this needs to always match the duration of all non-internal tracks.
- `updated: Timestamp` 🟦: Timestamp of when anything about this event was last changed.
- `created: Timestamp` 🟦: Timestamp of when the event was created in Opencast. It is set once when the event is first stored in Opencast's DB, and never changed again. This also implies that a scheduled event's `created` date is when the scheduling took place, _not_ the time it is scheduled for (that would be `startTime`).

Member

created should automatically be reset for duplicated events.

Member Author

Mh event duplication, right... I guess I still want to understand the actual use cases for that and wonder if we can solve those in a different way rather than event duplication. But yes, if we have event duplication, then I agree with you. And then I should probably go through all the fields again and check if any of those should behave in a special way when duplicated.

- `created: Timestamp` 🟦: Timestamp of when the event was created in Opencast. It is set once when the event is first stored in Opencast's DB, and never changed again. This also implies that a scheduled event's `created` date is when the scheduling took place, _not_ the time it is scheduled for (that would be `startTime`).

#### Flags
- `explicitContent: bool`: specifies whether this event contains content that is considered "explicit", like swear words or whatnot. This is required for some integrations like iTunes.

Member

We never had this flag by default. Why add it now? Can this be extended metadata?

Member Author

Yeah I dislike it as well, it's just due to iTunes. Making it extraMetadata... well it depends. If we ever want Opencast to read and interpret that field, e.g. to create iTunes feeds, then I would say: no. I think everything in extraMetadata should be "ignored" by Opencast, i.e. never interpreted in any way, just copied around.

Member

I guess it depends on the definition of Opencast. I can see your point, but I guess the annotation tool or so can store additional stuff in there and it could be considered to be Opencast.

To the matter at hand: how do we support iTunes right now? I'm still not following.

#### Flags
- `explicitContent: bool`: specifies whether this event contains content that is considered "explicit", like swear words or whatnot. This is required for some integrations like iTunes.
- `isLive: bool` 🟦: TODO this is currently stored per track, figure out if that's useful
- `ingestUser: Username` 🟦: username of the user that created this event. Cannot be changed and is useful for tracking responsibility.

Member

What about API users doing stuff for other users?

Member Author

I would store the API user in there. For other usages we have owner or other fields. But yeah, we need to drill into that and understand use cases for this.

(Also not sure why I listed this under "flags", will move it...)

Comment on lines +30 to +31
- `downloadable: bool`: a flag indicating whether users are allowed to download this video (i.e. tracks attached to this event). This can inform external apps whether to show a download button or to enable anti-download protection. The exact effects of this flag are deliberately unspecified; this merely states an *intent*.
- `listed: bool`: specifies whether this event should be considered "listed", meaning that users can find it via search. If it is `false`, users have to know the ID of the event (e.g. via a series or playlist) in order to access it.
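
As a rough illustration of what `listed` could mean for an external app: search skips unlisted events, while a lookup by ID still finds them. This is a minimal sketch; the `Event` class and both helper functions are hypothetical, and access control (the ACL) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    id: str
    title: str
    listed: bool


def search(events: list[Event], query: str) -> list[Event]:
    """Title search: only listed events are ever returned."""
    needle = query.lower()
    return [e for e in events if e.listed and needle in e.title.lower()]


def get_by_id(events: list[Event], event_id: str) -> Optional[Event]:
    """Direct lookup: works for unlisted events too, if the ID is known."""
    return next((e for e in events if e.id == event_id), None)


events = [
    Event(id="ev-1", title="Intro to Algebra", listed=True),
    Event(id="ev-2", title="Intro to Algebra (rehearsal)", listed=False),
]

found = search(events, "algebra")   # contains only ev-1
direct = get_by_id(events, "ev-2")  # the unlisted event is still reachable
```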

Member

This feels specific to Tobira (sure, downloadable is also interesting for Moodle). Why not extended metadata? What if you want different settings per external system (e.g. allow in Moodle and not in Tobira)?

Member Author

I don't think it's that specific. During the summit, this topic came up and people basically asked why Tobira works the way it does regarding "listed": just because there is no flag in Opencast. That's why Tobira defines it implicitly via other means. And it seemed like people want an explicit flag for this, to make it user controllable.

But yeah, good point about the "different in different systems". Although wait, is there any other system out there where it matters? All LMS plugins only show the video attached to a course right? There is no global search through all videos, right? So then it's just Tobira and Opencast itself having this "search" functionality, where it matters.

Member

I also marked downloadable in the GitHub comment, and the same applies there. Moodle allows configurable downloads (configurable globally or per embedding, and stored in Moodle itself).

- The keys of this map consist of a _namespace_ and a _field name_, separated by `:`, i.e. `ns:name`. Both parts must consist of only `a-z`, `A-Z`, `0-9`, `-` and `_`.
- The namespace `dct` is special as it refers to the Dublin Core Terms specification, e.g. `dct:rightsHolder` refers to [the `rightsHolder` property](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/rightsHolder) of DC terms. Also see [the DC mapping section below](#dublin-core-mapping). It should be avoided to set fields that already have a mapping, like `dct:title`, which is mapped to the OC core metadata `title`.
- Unlike the "extended metadata" before, using `extraMetadata` does work out of the box and does not incur any relevant performance overhead. Therefore, applications are encouraged to add useful data here, e.g. `studip:course-id`, `oc-studio:version` or `ethz:room-number`.
- There should be a community resource for collecting used fields and best practices around `extraMetadata`. That way, common requirements are identified quickly and the community can converge towards a standard.
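
The key format above can be checked with a short regular expression. A minimal sketch: the names `KEY_PATTERN` and `split_key` are made up, and requiring both parts to be non-empty is an assumption the text does not state explicitly.

```python
import re

# Both parts may only contain a-z, A-Z, 0-9, "-" and "_", separated by one ":".
# Assumption: neither part may be empty.
KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]+:[A-Za-z0-9_-]+")


def split_key(key: str) -> tuple[str, str]:
    """Validate an extraMetadata key and return (namespace, field name)."""
    if KEY_PATTERN.fullmatch(key) is None:
        raise ValueError(f"invalid extraMetadata key: {key!r}")
    namespace, name = key.split(":")
    return namespace, name


dc_key = split_key("dct:rightsHolder")
course_key = split_key("studip:course-id")
```

Keys such as `title` (no namespace), `a:b:c` (two separators) or `ethz:room number` (a space) are rejected with a `ValueError`.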

Member

I can see this getting out of hand, but let's try.

3 participants