Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User Guide chapter for address-based hashing #1294

Merged
merged 15 commits into from
Apr 1, 2025

Conversation

wks
Copy link
Collaborator

@wks wks commented Mar 25, 2025

This PR adds a User Guide chapter for address-based hashing.

We also extended the Glossary to introduce GC-safe points and related concepts.

@wks wks requested a review from qinsoon March 26, 2025 02:58
@@ -47,6 +48,153 @@ implementing memory-sensitive caches.

[java-soft-ref]: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/ref/SoftReference.html

## GC-safe Point
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, the description here is way too long for 'glossary'. Glossary is usuallly just one or few sentences that define a term, short and precise, rather than describing the term with a few paragraphs or sections.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I am aware that the "Glossary" in most books are concise. It is true for the GC Handbook. But I still think there are differences between our Glossary and that of others.

Firstly, our Glossary also has the responsibility to disambiguate terms that are used in the MMTk project from those used differently by other people, papers, books and/or software projects. So in addition to explaining those terms, we may also compare and contrast our terminology with the terms used by others, listing synonyms ("also known as ...") as well as seemingly related terms that are actually different.

Secondly, our audience are the users of MMTk, including GC developers and VM binding developers. Not all of them are GC experts, and they may not be familiar with important concepts such as GC-safe points, yieldpoints, etc. which we core developers usually take for granted. For such audience, simple descriptions in one or two sentences are not enough. I hope the Glossary can be a mini encyclopedia that covers enough details for GC and VM binding developers to make the right decision, and link to other resources (such as papers) to direct the readers to the right path.

Thirdly, books introduce important terms in the main text chapters, leaving the Glossary as a quick reference for the readers. The Glossary of the GC Handbook is only 13 pages in length, and each term is introduced in exactly one sentence. But if the reader is not satisfied by the short description, they can always use the index to locate the main text. On the contrary, the User Guide is not a comprehensive guide for GC like the GC Handbook. We don't always have other chapters to link to (although we do have some for Finalizer and Address-based Hashing which we introduce in this PR).

I think we can reconcile those several needs.

  • If we have another chapter that introduces a concept in greater detail, we can be brief in the Glossary, and link to that chapter.
  • If there is an open-access paper that introduces that concept and is sufficiently accurate and up to date, we can be brief, too, in the Glossary, and refer the readers to that paper. We can't simply link to the GC Handbook which is behind a paywall.
  • Otherwise we shall introduce the concept in greater details in the Glossary. But we may offload the content to another chapter if it is too long. If the concept fits well in the structure of the Tutorial or the Porting Guide, we can move the content there. Otherwise, a dedicated "special concerns" chapter may work as well.

As for the concept of GC-safe point, I think

  • It is an important concept that VM binding developers should know.
  • It could be part of the Porting Guide, but we currently don't have such a chapter. In theory, it could be part of the "next steps" when guiding the user to implement a stop-the-world MarkSweep GC.
  • There is no open-access material that provides a satisfying description of GC-safe points. This paper is about yieldpoint which is one kind of GC-safe point. The GC Handbook describes "GC-safe point" briefly as "a point in the mutator code at which it is safe for GC to occur", but does not generalize it to concurrent GC or native/runtime code. But it is usually the runtime code that brings trouble w.r.t. the safety of GC.

So I think we can leave this "GC-safe point" section as is in the Glossary for now, as long as the contents are accurate and complete. We can later add a MarkSweep chapter in the Porting Guide next to NoGC (which we should have added long ago) and move the detailed introduction of "GC-safe point" there.

@@ -360,11 +360,13 @@ pub trait ObjectModel<VM: VMBinding> {
/// * `from`: The address of the object to be copied.
/// * `semantics`: The copy semantic to use.
/// * `copy_context`: The `GCWorkerCopyContext` for the GC thread.
// ANCHOR: copy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest referring to the method by its API doc, such as https://docs.mmtk.io/api/mmtk/vm/object_model/trait.ObjectModel.html#tymethod.copy. We never had ANCHOR in the main repo (we only have those in tests and doc code).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to replace it with a link to API doc, but I still find that it is easier for the reader to understand if we include the signature of the ObjectModel::copy method, or at least its parameter names because the names (especially the from argument) are frequently mentioned in the text. I ended up changing the Address-based Hashing doc into this:

When using a non-delayed-copy collector, MMTk calls `ObjectModel::copy(from, semantics,
copy_context)` to copy the `from` object to a new location (See [API doc][object-model-copy-api]).

This is not as good as directly including part of the source code surrounded by ANCHOR: because (1) it always correctly reflects the actual signature of the function, and (2) when a developer modifies the signature of ObjectModel::copy in object_model.rs, the ANCHOR: can remind the developer that we may need to update the documentation.

This is the first time we add ANCHOR in the main source code, but I think it will be beneficial.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just embed the function declaration in the doc (with the risk of being outdated).

I am concerned that if we start using ANCHOR in the code and keep adding more (in function signatures, in type declarations, or inside function body), it will be disruptive for the readability -- it is hard to argue which ANCHOR is good and can be allowed, and what should not be allowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicating the function declaration in the doc with a link to the API should be sufficiently clear, even if it is outdated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the ANCHOR and copied the signature of ObjectModel::copy into the documentation.

- `ObjectModel::get_align_when_copied`
- `ObjectModel::get_align_offset_when_copied`

When using a non-delayed-copy collector, MMTk calls `ObjectgModel::copy` which is defined as:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo:

Suggested change
When using a non-delayed-copy collector, MMTk calls `ObjectgModel::copy` which is defined as:
When using a non-delayed-copy collector, MMTk calls `ObjectModel::copy` which is defined as:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@wks wks added this pull request to the merge queue Apr 1, 2025
Merged via the queue into mmtk:master with commit 302f611 Apr 1, 2025
33 checks passed
@wks wks deleted the feature/address-based-hashing-guide branch April 1, 2025 04:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants