Add SNP-style virtual attestations, restoring code update tests #6770

Open
wants to merge 17 commits into
base: main

eddyashton
Member

We previously had a vestigial virtual attestation that reused some of the terminology and fields of SGX attestations. It didn't distinguish between nodes or apply any checks during node joining, so it wasn't usefully testing code upgrade flows.

This has been replaced with a new scheme based on SNP attestations. A virtual node now has a measurement (the sha256 of the enclave library, calculated by the host at startup) and a host data/security policy value (currently a single default string for the security policy, with host data being the sha256 of that string, as it is for SNP). This introduces many duplicated tables, and the associated duplicated governance, because we don't want to risk collisions across platforms.

The beneficial outcome is that we can now test code update flows (i.e. change the "permitted nodes" of a service at run-time, then confirm that old nodes can no longer join) close to how they run on SNP. We can also test some of the effects of fiddling with these tables (e.g. omitting security policies, setting invalid host data) outside of SNP, with the caveat that these all touch separate governance actions and tables.
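As a rough illustration of the code update flow described above, here is a toy model of a "permitted nodes" check. `ToyService` and its methods are hypothetical names for this sketch only; the real checks live in the node join path and the platform-specific governance tables.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ToyService:
    """Toy stand-in for a trusted-measurements table (hypothetical, not a CCF API)."""

    trusted_measurements: Set[str] = field(default_factory=set)

    def can_join(self, measurement: str) -> bool:
        # A joining node is admitted only if its measurement is currently trusted.
        return measurement in self.trusted_measurements


service = ToyService({"measurement-of-old-build"})
old_could_join_before = service.can_join("measurement-of-old-build")

# Code update: trust the new build first, then retire the old one.
service.trusted_measurements.add("measurement-of-new-build")
service.trusted_measurements.discard("measurement-of-old-build")

new_can_join_after = service.can_join("measurement-of-new-build")
old_can_join_after = service.can_join("measurement-of-old-build")
```

Adding the new measurement before removing the old one keeps at least one trusted measurement in the table throughout the transition.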

There are no endorsements for virtual attestations, to avoid creating/maintaining any fake hardware keys, but this means there are still join paths on SNP that virtual doesn't test. I've tried to avoid too many renames/refactors of existing fields, but the existing PAL is extremely porous and inconsistent, so some of the names/concepts are unclear (e.g. "host_data" is an SNP concept and "security_policy" is what we/ACI put there, but the names aren't consistently split and the digesting/decoding is haphazard).

I'll add some comments describing the changes I remember, when it's not last-thing-on-a-Friday.

eddyashton requested a review from a team as a code owner on January 17, 2025 16:34
Comment on lines -51 to -53
@reqs.description("Test quotes")
@reqs.supports_methods("/node/quotes/self", "/node/quotes")
def test_quote(network, args):

Member Author

This test is deleted. It's almost exactly the same as verify_quotes in code_update. The only other thing it does is check that the /node/quotes/self calls match entries from the single /node/quotes list. That's now added to verify_quotes.
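A minimal sketch of the extra consistency check described above, assuming each quote is a dict with a `node_id` field (the field names and the helper name `self_quote_matches_listing` are assumptions for illustration, not the actual `verify_quotes` code):

```python
from typing import Any, Dict, List


def self_quote_matches_listing(
    self_quote: Dict[str, Any], all_quotes: List[Dict[str, Any]]
) -> bool:
    """Check that a node's own quote appears, unchanged, exactly once in the full list."""
    # Collect the entries from the full listing that belong to the same node.
    matches = [q for q in all_quotes if q.get("node_id") == self_quote.get("node_id")]
    return len(matches) == 1 and matches[0] == self_quote


# Hypothetical responses from /node/quotes/self and /node/quotes.
self_quote = {"node_id": "n0", "raw": "aa"}
listing = [{"node_id": "n0", "raw": "aa"}, {"node_id": "n1", "raw": "bb"}]
consistent = self_quote_matches_listing(self_quote, listing)
stale = self_quote_matches_listing({"node_id": "n0", "raw": "cc"}, listing)
```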

Comment on lines -10 to +30
-    enclave_type, enclave_platform, oe_binary_dir, package, library_dir="."
-):
+def get_measurement(enclave_type, enclave_platform, package, library_dir="."):
     lib_path = infra.path.build_lib_path(
         package, enclave_type, enclave_platform, library_dir
     )

-    if enclave_platform == "sgx":
-        res = subprocess.run(
-            [os.path.join(oe_binary_dir, "oesign"), "dump", "-e", lib_path],
-            capture_output=True,
-            check=True,
-        )
-        lines = [
-            line
-            for line in res.stdout.decode().split(os.linesep)
-            if line.startswith("mrenclave=")
-        ]
+    if enclave_platform == "virtual":
+        hash = sha256(open(lib_path, "rb").read())
+        return hash.hexdigest()

-        return lines[0].split("=")[1]
     else:
-        # Virtual and SNP
-        return hashlib.sha256(lib_path.encode()).hexdigest()
+        raise ValueError(f"Cannot get measurement on {enclave_platform}")
+
+
+def get_host_data_and_security_policy(enclave_platform):
+    DEFAULT_VIRTUAL_SECURITY_POLICY = "Default CCF virtual security policy"
+    if enclave_platform == "snp":
+        security_policy = snp.get_container_group_security_policy()
+    elif enclave_platform == "virtual":
+        security_policy = DEFAULT_VIRTUAL_SECURITY_POLICY
+    else:
+        raise ValueError(f"Cannot get security policy on {enclave_platform}")
+    host_data = sha256(security_policy.encode()).hexdigest()
+    return host_data, security_policy

Member Author

Moving away from any ability to generate/check SGX attestations, updating terminology to "measurement" rather than "code_id". We get host data and security policies on the client for SNP (because we're actually the same box...), but don't yet get the measurement - I think we could now do that from Python too, and get closer to an SNP code update story, but it's outside the scope of this PR.
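The quoted snippet contains both a hash of the library path string and a hash of the library file's bytes. The sketch below shows why the second makes a more useful measurement for code update tests: rebuilding the library changes the content hash but not the path hash. The file name and helper names are made up for the example.

```python
import hashlib
import os
import tempfile


def measurement_from_contents(lib_path: str) -> str:
    # Hash the bytes of the library, so a different build gives a different value.
    with open(lib_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def measurement_from_path(lib_path: str) -> str:
    # Hash only the path string, which ignores what the file contains.
    return hashlib.sha256(lib_path.encode()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    lib_path = os.path.join(tmp, "libexample.virtual.so")  # hypothetical file name

    with open(lib_path, "wb") as f:
        f.write(b"build one")
    contents_v1 = measurement_from_contents(lib_path)
    path_v1 = measurement_from_path(lib_path)

    with open(lib_path, "wb") as f:
        f.write(b"build two")
    contents_v2 = measurement_from_contents(lib_path)
    path_v2 = measurement_from_path(lib_path)
```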

Comment on lines +579 to +597
+# Measurements
+test_measurements_tables(network, args)
+test_add_node_with_bad_code(network, args)
+
+# Host data/security policy
+test_host_data_tables(network, args)
+test_add_node_with_bad_host_data(network, args)
+test_add_node_with_stubbed_security_policy(network, args)
+
 if snp.IS_SNP:
-    test_snp_measurements_tables(network, args)
-    test_add_node_with_no_uvm_endorsements(network, args)
-    test_host_data_table(network, args)
     test_add_node_without_security_policy(network, args)
     test_add_node_remove_trusted_security_policy(network, args)
     test_start_node_with_mismatched_host_data(network, args)
-    test_add_node_with_bad_host_data(network, args)
-    test_add_node_with_bad_code(network, args)
-    # NB: Assumes the current nodes are still using args.package, so must run before test_proposal_invalidation
     test_add_node_with_bad_security_policy(network, args)
+
+# Endorsements
+if snp.IS_SNP:
+    test_endorsements_tables(network, args)
+    test_add_node_with_no_uvm_endorsements(network, args)
+
+# NB: Assumes the current nodes are still using args.package, so must run before test_update_all_nodes

Member Author

Lots of renamed tests, but hopefully the correspondence to the old ones is relatively clear. We now have fewer SNP-only tests, because they have a sane Virtual implementation.

Comment on lines 589 to +590
     test_add_node_without_security_policy(network, args)
     test_add_node_remove_trusted_security_policy(network, args)
     test_start_node_with_mismatched_host_data(network, args)
-    test_add_node_with_bad_host_data(network, args)
-    test_add_node_with_bad_code(network, args)
-    # NB: Assumes the current nodes are still using args.package, so must run before test_proposal_invalidation
     test_add_node_with_bad_security_policy(network, args)

Member Author

I'm humming and hawing about supporting these, but will likely look at it if the PR lives for a while. We don't currently have a way to dynamically set a different security policy for Virtual, and doing so would require some kind of plumbing (probably an env var or a file it reads, but that still needs to dive through several layers of our infra). The SNP model has all of this plumbing (with snp_-specific names), but ignores it for these tests and just fiddles with some files in the security-context directory. Should we duplicate that for virtual? I vote no.
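For reference, the env var option mentioned above might look roughly like this on the Python side. It is only a sketch: the variable name `EXAMPLE_VIRTUAL_SECURITY_POLICY` and the function name are invented here, and the node itself would need matching plumbing that this does not cover.

```python
import os
from hashlib import sha256

DEFAULT_VIRTUAL_SECURITY_POLICY = "Default CCF virtual security policy"
# Hypothetical variable name, not something the infra reads today.
POLICY_ENV_VAR = "EXAMPLE_VIRTUAL_SECURITY_POLICY"


def get_virtual_host_data_and_security_policy(environ=None):
    """Return (host_data, security_policy), letting an env var override the default policy."""
    environ = os.environ if environ is None else environ
    security_policy = environ.get(POLICY_ENV_VAR, DEFAULT_VIRTUAL_SECURITY_POLICY)
    # Host data is the sha256 of the policy text, mirroring the SNP convention.
    host_data = sha256(security_policy.encode()).hexdigest()
    return host_data, security_policy


default_host_data, default_policy = get_virtual_host_data_and_security_policy({})
custom_host_data, custom_policy = get_virtual_host_data_and_security_policy(
    {POLICY_ENV_VAR: "A different policy"}
)
```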

Comment on lines +735 to +743
def add_measurement(self, remote_node, platform, measurement):
    if platform == "sgx":
        return self.add_new_code(remote_node, measurement)
    elif platform == "virtual":
        return self.add_virtual_measurement(remote_node, measurement)
    elif platform == "snp":
        return self.add_snp_measurement(remote_node, measurement)
    else:
        raise ValueError(f"Unsupported platform {platform}")

Member Author

These wrapper functions could live in code_update.py, but are plausibly useful for other tests. In platform-agnostic test code you want to "update a measurement", but under the hood that has to call a specific governance function to write to a platform-specific table. Other options are available, and this is even more copy-paste code, but I think it's "fine".
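One of the other options would be a lookup table instead of an if/elif chain. The sketch below uses a stand-in class whose per-platform methods only record the call; the method names are taken from the snippet above, but `ToyConsortium` and the method bodies are placeholders.

```python
class ToyConsortium:
    """Stand-in for the consortium helper; the per-platform methods only record calls."""

    def __init__(self):
        self.calls = []

    def add_new_code(self, remote_node, measurement):
        self.calls.append(("sgx", remote_node, measurement))

    def add_virtual_measurement(self, remote_node, measurement):
        self.calls.append(("virtual", remote_node, measurement))

    def add_snp_measurement(self, remote_node, measurement):
        self.calls.append(("snp", remote_node, measurement))

    def add_measurement(self, remote_node, platform, measurement):
        # Map each platform to the governance call that writes its own table.
        handlers = {
            "sgx": self.add_new_code,
            "virtual": self.add_virtual_measurement,
            "snp": self.add_snp_measurement,
        }
        try:
            handler = handlers[platform]
        except KeyError:
            raise ValueError(f"Unsupported platform {platform}") from None
        return handler(remote_node, measurement)


consortium = ToyConsortium()
consortium.add_measurement("node0", "virtual", "abc123")
```

This keeps the platform list in one place per wrapper, at the cost of being slightly less obvious to read than the explicit branches.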

Comment on lines +356 to +364
if new_host_data is not None:
    old_host_data, old_security_policy = (
        infra.utils.get_host_data_and_security_policy(args.enclave_platform)
    )

    if old_host_data != new_host_data:
        network.consortium.remove_host_data(
            primary, args.enclave_platform, old_host_data
        )

Member Author

This tries to anticipate what will be necessary if/when a version transition includes a host-data update. Currently it doesn't, on either SNP or Virtual. But if it does, we will (probably?) want to cycle it like we cycle measurements (I hope?).
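A toy version of that cycling logic, using a plain set in place of the trusted host data table. `cycle_host_data` is a hypothetical helper, and adding the new value before removing the old one is an assumption about how the cycle would go, since the snippet above only shows the removal.

```python
from hashlib import sha256
from typing import Optional, Set


def cycle_host_data(
    trusted: Set[str], old_host_data: str, new_host_data: Optional[str]
) -> Set[str]:
    """Return the trusted set after a version transition (toy model, not CCF code)."""
    updated = set(trusted)
    if new_host_data is None:
        # The transition does not touch host data, so leave the table alone.
        return updated
    updated.add(new_host_data)  # assumed step: trust the new value first
    if old_host_data != new_host_data:
        # Only retire the old value when it really differs from the new one.
        updated.discard(old_host_data)
    return updated


old = sha256(b"old policy").hexdigest()
new = sha256(b"new policy").hexdigest()

unchanged = cycle_host_data({old}, old, None)
same_value = cycle_host_data({old}, old, old)
rotated = cycle_host_data({old}, old, new)
```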

});
virtual_policy["hostData"] = virtual_host_data;

response_body["virtual"] = virtual_policy;

Member Author

This is one of the first big additions to the new governance API since we dropped the TypeSpec, meaning it's currently undocumented. We could try manually patching the generated OpenAPI, but I think that's rubbish! We could restore auto-generated descriptions for these endpoints (currently all hidden), but we deliberately don't rely on the magic auto-serialisers for this, so it wouldn't really help. Gah.

1 participant