
Add Hugging Face Hub SaaS Account #7698

Open
danehans opened this issue Jan 20, 2025 · 10 comments
Labels
sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra.

Comments

@danehans
Contributor

The Gateway API Inference Extension project requires a Hugging Face Hub account to download LLMs such as meta-llama/Llama-2-7b-hf for running e2e tests in CI. An access token must be generated for the account and stored in the CI cluster as an environment variable, e.g. HUGGING_FACE_TOKEN.

cc: @robscott @ahg-g
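(Illustration only, not part of the issue: a minimal sketch of how a CI job might consume such a token, assuming it is injected as the HUGGING_FACE_TOKEN environment variable mentioned above. The `get_hf_token` helper name is hypothetical.)

```python
import os


def get_hf_token(env_var: str = "HUGGING_FACE_TOKEN") -> str:
    """Read the Hugging Face access token injected into the CI cluster.

    Fails fast with a clear message so a gated-model download does not
    surface as a confusing 401 deep inside the e2e test run.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; gated model downloads will fail")
    return token


if __name__ == "__main__":
    # Stand-in value for local runs; in CI the secret would already be set.
    os.environ.setdefault("HUGGING_FACE_TOKEN", "hf_example")
    print(get_hf_token())
```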

@danehans danehans added the sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. label Jan 20, 2025
@ameukam
Member

ameukam commented Jan 23, 2025

cc @kubernetes/sig-k8s-infra-leads

@BenTheElder
Member

Is this free?

If this is a typical user account we should probably manage the account in a SIG one-password vault and then populate the API key into one or more of the CI clusters.

We'll also have to figure out an email for the user, maybe one of the private [email protected] lists (assuming it may be sent password reset emails etc)

Currently we only have a few GitHub robot accounts like this, co-managed with SIG Testing + ContribEx; almost everything else is donated cloud SaaS where the CNCF is primary on the account, then SIG K8s Infra, and projects / sub-accounts are provisioned for Kubernetes projects.

@BenTheElder
Member

> account to download LLMs such as meta-llama/Llama-2-7b-hf for running e2e tests in CI.

This particular model appears to require signing a licensing agreement with Meta? Is this really the only way we can test our code?

@BenTheElder
Member

It seems like we should be able to e2e test the routing of serving requests without actually running any particular model? Just some trivial fake?

@danehans
Contributor Author

I’ve tried multiple open-source models that don’t require signing a license agreement (e.g., GPT-J, MPT, etc.), but each one either isn’t supported by vLLM or lacks LoRA compatibility. Additionally, the existing LoRA adapters are specifically trained for GPT-J or Llama 2, so they can’t be reused for other models. Since EPP (the reference inference extension) scrapes real metrics from vLLM to perform load balancing, substituting a fake model server won’t suffice for proper testing.

@ahg-g
Member

ahg-g commented Jan 24, 2025

@danehans can we use GPT-J then? I assume it doesn't require signing an agreement

@BenTheElder
Member

BenTheElder commented Jan 24, 2025

> Since EPP (the reference inference extension) scrapes real metrics from vLLM to perform load balancing, substituting a fake model server won’t suffice for proper testing.

We can't fake metrics for deterministic testing?

Independent of managing this account, ideally we don't have to spend $$$ running models where not necessary.
(This is why we build things like kind, kwok, etc. to make testing more sustainable)
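(Illustration only, not the project's actual test harness: a sketch of the "fake metrics" idea raised above. The gauge names follow vLLM's Prometheus naming convention but are assumptions here, and `render_fake_vllm_metrics` is a hypothetical helper; a stub server returning this payload could let the extension be tested deterministically without running a real model.)

```python
def render_fake_vllm_metrics(running: int, waiting: int, cache_usage: float) -> str:
    """Render a Prometheus text-format payload mimicking the vLLM gauges
    a load-balancing extension might scrape from a model server."""
    lines = [
        "# TYPE vllm:num_requests_running gauge",
        f"vllm:num_requests_running {running}",
        "# TYPE vllm:num_requests_waiting gauge",
        f"vllm:num_requests_waiting {waiting}",
        "# TYPE vllm:gpu_cache_usage_perc gauge",
        f"vllm:gpu_cache_usage_perc {cache_usage}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A stub HTTP server would return this body from its /metrics endpoint.
    print(render_fake_vllm_metrics(running=2, waiting=5, cache_usage=0.4))
```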

@BenTheElder
Member

I don't think any one of us can unilaterally sign up the Kubernetes organization to agree to some legal terms (as opposed to using software libraries under a CNCF approved license), nor should we personally agree and provide our personal account. cc @kubernetes/steering-committee

Annoyingly you can't even see the agreement terms without signing into an account.

For the other infra we do not have any agreements signed ourselves, the vendors have provided resources to the CNCF and the CNCF delegates resources to us.

Let's see what the others think though, cc @kubernetes/sig-k8s-infra-leads

@danehans
Contributor Author

@ahg-g GPT-J (EleutherAI/gpt-j-6B) does not support LoRA. From the vLLM logs:

ERROR 01-24 11:23:00 engine.py:366] AssertionError: GPTJForCausalLM does not support LoRA yet.

Do you agree with faking the model server for e2e testing?

@ahg-g
Member

ahg-g commented Jan 24, 2025

> @ahg-g GPT-J (EleutherAI/gpt-j-6B) does not support LoRA. From the vLLM logs:

@liu-cong mentioned that we can use Mistral.
