Blog Tags for Trainings #633

Open
stevepiercy opened this issue Aug 31, 2022 · 7 comments
Labels
04 type: enhancement, 14 prio: low, 99 use case for AI assistance (Generative AI is critical for content creation. Marks use cases for assistant technology.)

Comments

@stevepiercy
Contributor

Consider using blog-style tags for trainings for searching.

See an example implementation at: https://sphinx-tags.readthedocs.io/en/latest/quickstart.html

@acsr
Contributor

acsr commented Mar 14, 2025

@stevepiercy I second this approach. Can we simply add sphinx-tags as an extension to overcome the chicken-and-egg problem, and then add a more concrete example of how to use that option in the MyST Reference section? If you can suggest a different order of execution (e.g., testing in a branch), let me know. I think this cannot break anything, is easy to remove, and would help a lot. We can even hide tags using CSS until a substantial number of tags has been released.
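
For reference, enabling the extension could look roughly like the following `conf.py` excerpt. This is a minimal sketch, not the Training repository's actual configuration: the option names are those documented by sphinx-tags, while the values and the presence of `myst_parser` in the list are assumptions for illustration.

```python
# conf.py (excerpt): a sketch under assumptions, not the project's real settings.

# Any existing extensions would stay in this list. "myst_parser" is shown
# only because the trainings are written in MyST Markdown (assumption).
extensions = [
    "myst_parser",
    "sphinx_tags",
]

# Options documented by sphinx-tags; the values below are illustrative.
tags_create_tags = True        # generate tag pages during the build
tags_extension = ["md"]        # scan Markdown (MyST) sources for tags
tags_output_dir = "_tags"      # directory for the generated tag pages
tags_overview_title = "Tags"   # title of the tags overview page
```

A page would then declare its tags with the `tags` directive described in the sphinx-tags quickstart linked above. Removing the extension again would mean dropping these lines and the directives.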

@acsr added the "99 use case for AI assistance" label on Mar 14, 2025
@acsr
Contributor

acsr commented Mar 14, 2025

By the way: suggesting tags with some guidance is a typical task for a dumb LLM-based AI agent in the post-processing of content.

  1. Take this content, given as a list of Markdown URLs (nothing else).
  2. Take these existing tags as given: [List]
  3. Go through the listed content items/URLs one by one, using the rendered versions as well, and add tags in the Markdown source.
  4. Suggest missing tags and offer them for review as a second step.
  5. Add the suggested tags, except those on a given exclude list, to the Markdown as a second step.
  6. Create one pull request per task (needs some refinement).
  7. Write a proper change log per session for content editors.
  8. Write a news-style changelog for consumers of the trainings so they can catch up with the latest changes.
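
The tag-handling part of these steps (2 to 5) could be sketched as below. This is only an illustration under assumptions: the function names `triage_tags` and `add_tags_directive` are hypothetical, the suggestions would come from whatever agent is used, and the emitted directive assumes the MyST form of the sphinx-tags `tags` directive.

```python
def triage_tags(suggested, existing, excluded):
    """Split suggestions into known tags and new tags that await review.

    Tags on the exclude list are dropped; duplicates are ignored
    (comparison is case-insensitive).
    """
    existing_by_key = {tag.lower(): tag for tag in existing}
    excluded_keys = {tag.lower() for tag in excluded}
    known, for_review, seen = [], [], set()
    for tag in suggested:
        key = tag.strip().lower()
        if not key or key in excluded_keys or key in seen:
            continue
        seen.add(key)
        if key in existing_by_key:
            known.append(existing_by_key[key])  # reuse the existing spelling
        else:
            for_review.append(tag.strip())      # needs human approval first
    return known, for_review


def add_tags_directive(markdown, tags):
    """Append a MyST tags directive to a page's Markdown source."""
    if not tags:
        return markdown
    fence = "`" * 3
    directive = f"{fence}{{tags}} {', '.join(tags)}\n{fence}\n"
    return markdown.rstrip("\n") + "\n\n" + directive


known, for_review = triage_tags(
    suggested=["Volto", "theming", "draft", "volto"],
    existing=["volto"],
    excluded=["draft"],
)
page = add_tags_directive("# Example training page\n", known)
```

Here `known` holds tags that can be written immediately, while `for_review` holds the new suggestions that a human editor would approve before a second call to `add_tags_directive`.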

Note: Using AI here does not raise any copyright issues at all.

@stevepiercy
Contributor Author

I don't think tags add value, given the PLIP plone/Products.CMFPlone#4097.

As such, I personally have zero interest in doing the work to implement this extension, as it subverts the long-term goal. I think we should close this issue.

@acsr
Contributor

acsr commented Mar 14, 2025

I cannot see exactly where the PLIP plone/Products.CMFPlone#4097 addresses the purpose of tags.

Tags are part of SEO and information architecture in knowledge management, even with an upcoming Nuclia search engine.

Why? Tags (or keywords) come into consideration when the actual content lacks terms that reflect all the synonyms, relationships, or contexts the written content is relevant for. There are also use cases where a term belongs to a particular namespace, or where it would help to insert it into different taxonomies. Tagging avoids bloating the content itself. Some tags are synonyms of terms, and others are synthetic creations that take terms out of existing namespaces and make them unique.

If you can point me to the exact place in your PLIP that addresses this purpose before closing, this would help my understanding.

@stevepiercy
Contributor Author

You didn't mention SEO until now. In any case, we don't have a problem with SEO. https://duckduckgo.com/?t=ffab&q=plone+training&atb=v190-1&ia=web

We barely have enough interest from authors to contribute trainings, much less maintain them, and even less to create and maintain tags. It would fall to me, and I don't have the interest to do this. I have much, much higher priorities, like having Nuclia search results, as explained in the PLIP. A good search tool would obviate the need for tags. The problem is easily finding content, and a good search beats tags every time.

I also don't have the interest to mentor a first-timer through the process.

I have negative interest in bringing AI into the picture to "do it for us". It just sounds like more work, to place the responsibility of curation onto an AI. Its output would still require curation by a human.

If this is something that you feel strongly enough about to take on and maintain, or to mentor a first-timer through, please do. It's very low priority.

@acsr
Contributor

acsr commented Mar 14, 2025

> I have negative interest in bringing AI into the picture to "do it for us". It just sounds like more work, to place the responsibility of curation onto an AI. Its output would still require curation by a human.

I agree with your personal priorities.

Actually, a good search engine like Algolia, Typesense, or Nuclia needs to use machine learning (aka AI) to improve synonym and context resolution. To make this happen, you need to map synonyms not available in the content to their counterparts. In Algolia, this was a manual task years ago, similar to adding tags. Leaving this to a "good" search engine is uncurated AI as well. The difference when adding tags is that the assignment is visible and allows visiting aggregated similar content. That is much closer than current Sphinx results.
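
A toy illustration of this point, with a hypothetical page text and tag rather than real training content: a naive full-text match misses a page when the query term does not occur in its text, and a visible tag supplies the missing term.

```python
def matches(query, text, tags=()):
    """Naive match: every query word must occur in the text or in the tags."""
    haystack = set(text.lower().split()) | {tag.lower() for tag in tags}
    return all(word in haystack for word in query.lower().split())


# Hypothetical page: it is about theming but never uses that word.
page_text = "Customize the look of your site with a Volto add-on"
page_tags = ["theming"]

without_tags = matches("theming", page_text)          # False: term is absent
with_tags = matches("theming", page_text, page_tags)  # True: tag supplies it
```

A real search engine would do far more than exact word matching. The sketch only shows where a curated tag, or an equivalent synonym mapping, enters the picture.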

SEO in fact needs no special mention, since searching relates to finding (at least a bit).
Any AI or classic full-text search fails if the content lacks proper markup or content, or if the engine lacks access to context. Metadata is king. Tags are metadata. AI succeeds in semantic context resolution by using much longer distance vector chains.

> In any case, we don't have a problem with SEO. https://duckduckgo.com/?t=ffab&q=plone+training&atb=v190-1&ia=web

An example in which a search engine finds obvious content does not prove that it can find material those search engines fail to discover for lack of context.

Thanks for leaving this at low prio. We need to see how fast Nuclia comes into range and whether this vaporizes as you expect. I am open.

The increasing quality of results by LLMs creating Python code has only one origin: The excellent Sphinx documentation markup of the standard library.

@stevepiercy
Contributor Author

See the work that went into configuring Nuclia to do exactly that in the open pull requests in Training linked in the PLIP.

We use https://pypi.org/project/sphinxext-opengraph/ as well for SEO. Full list of extensions: https://6.docs.plone.org/contributing/documentation/themes-and-extensions.html#extensions
