Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate Sshable user and initial unix user in assemble_with_sshable #3008

Merged
merged 1 commit into from
Mar 19, 2025

Conversation

fdr
Copy link
Collaborator

@fdr fdr commented Mar 18, 2025

This patch is intended to be a semantic no-op as applied, but once
managed services have been adapted to install their rhizome programs
into the rhizome home directory and use the rhizome user to
execute them (like VmHosts), the call sites currently specifying
sshable_unix_user: "ubi" can stop specifying sshable_unix_user and
rely on the default value rhizome.

Sshable was originally designed exclusively for the rhizome user
to support running its programs. This changed in commit
51ee438 to support E2E test cases
requiring standard cached SSH access without needing rhizome
installation (initially introduced in
68f0c7a).

It was later adapted for GitHub Actions, which learned to bypass the
rhizome upload phase for performance gains and meeting specific unix
user naming requirements for parity with first-party GitHub Action
Runners. We saw firsthand that minor deviations here could lead to
confusion, including an inconsistency seen in the GitHub Actions
product: see e2329cc.

It so happens both of these use cases involve using a single,
bootstrapped user as the only user clover is aware of or needs to be
aware of. But, it was incorrectly extended to services with full
rhizome directories and upload steps. This was an oversight.
Despite creating a separate rhizome user, rhizome programs were
loaded into the home directory of ubi, and both automation and
personnel used the same unix user to log in and authorized_keys file
to log in, which made authorized_keys more awkward to update than
the arrangement seen on VM Hosts, which was intended to be the default
arrangement.

Although both Sshable and Vm have parameters and database fields named
unix_user, they originated independently for distinct purposes. I
think their shared naming, while typically a useful heuristic for
coupling in Clover's codebase, contributed to this repeated oversight.

Copy link
Contributor

@jeremyevans jeremyevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes seem fine to me, but I'm not sure if this separates the arguments. This looks like it just converts a required positional argument to an optional keyword argument. Maybe the backstory should be inserted into the commit message?

@fdr
Copy link
Collaborator Author

fdr commented Mar 18, 2025

The changes seem fine to me, but I'm not sure if this separates the arguments. This looks like it just converts a required positional argument to an optional keyword argument. Maybe the backstory should be inserted into the commit message?

there is an aliasing issue, in that both Vm has a unix_user (which is passed to cloud-init, and defaults to ubi on this platform), but so does Sshable, though Sshable began life being hard-coded to rhizome, but to the extent it grew the ability to be not-rhizome is due to a couple of particular carve-outs (a case in CI, as related in commit message said: 51ee438, and later github actions for other kind of sensitive/particular reasons). So it really began as two unrelated concepts that got merged.

@fdr fdr force-pushed the separate-users branch from e6b742c to d376bee Compare March 19, 2025 03:53
This patch is intended to be a semantic no-op as applied, but once
managed services have been adapted to install their `rhizome` programs
into the `rhizome` home directory and use the `rhizome` user to
execute them (like `VmHost`s), the call sites currently specifying
`sshable_unix_user: "ubi"` can stop specifying `sshable_unix_user` and
rely on the default value `rhizome`.

`Sshable` was originally designed exclusively for the `rhizome` user
to support running its programs. This changed in commit
51ee438 to support E2E test cases
requiring standard cached SSH access without needing rhizome
installation (initially introduced in
68f0c7a).

It was later adapted for GitHub Actions, which learned to bypass the
`rhizome` upload phase for performance gains and meeting specific unix
user naming requirements for parity with first-party GitHub Action
Runners.  We saw firsthand that minor deviations here could lead to
confusion, including an inconsistency seen in the GitHub Actions
product: see e2329cc.

It so happens both of these use cases involve using a single,
bootstrapped user as the only user clover is aware of or needs to be
aware of.  But, it was incorrectly extended to services with full
`rhizome` directories and upload steps.  This was an oversight.
Despite creating a separate `rhizome` user, `rhizome` programs were
loaded into the home directory of `ubi`, and both automation and
personnel used the same unix user to log in and `authorized_keys` file
to log in, which made `authorized_keys` more awkward to update than
the arrangement seen on VM Hosts, which was intended to be the default
arrangement.

Although both `Sshable` and `Vm` have parameters and database fields
named `unix_user`, they originated independently for distinct
purposes. I think their shared naming, while typically a useful
heuristic for coupling in Clover's codebase, contributed to this
repeated oversight.
@fdr fdr force-pushed the separate-users branch from d376bee to c3f0115 Compare March 19, 2025 03:53
@furkansahin
Copy link
Contributor

Why is everyone approving a draft pr?

@fdr fdr marked this pull request as ready for review March 19, 2025 17:08
@fdr
Copy link
Collaborator Author

fdr commented Mar 19, 2025

Why is everyone approving a draft pr?

image

I didn't unmark it as draft because I was waiting for CI, which is kinda slow these days compared to my historical norm, but I had to take off before it finished.

But also, unlike historical norms, I usually asked for reviews even before letting tests runs (and failing them), but this annoyed @pykello that one time because I wrote code and tagged him when it was quite broken, on the optimistic principle that no one was going to review my stuff that fast. My solution was this was to use draft until CI finished, which is now lengthy time wise.

Maybe I'll take a middle-ground where, if I actually bothered running the local test suite (for that fix that I broke and annoyed @pykello, I didn't) and am reasonbly sure it'll succeed, I'll take it out of draft.

@fdr fdr merged commit fb31111 into main Mar 19, 2025
8 checks passed
@fdr fdr deleted the separate-users branch March 19, 2025 17:13
@github-actions github-actions bot locked and limited conversation to collaborators Mar 19, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants