Add Hugging Face Hub SaaS Account #7698
Comments
cc @kubernetes/sig-k8s-infra-leads
Is this free? If this is a typical user account we should probably manage the account in a SIG 1Password vault and then populate the API key into one or more of the CI clusters. We'll also have to figure out an email for the user, maybe one of the private [email protected] lists (assuming it may receive password reset emails, etc.). Currently we only have a few GitHub robot accounts like this, co-managed with SIG Testing + ContribEx; most everything else is donated cloud SaaS where CNCF is primary on the account, then SIG K8s Infra, and projects / sub-accounts are provisioned for Kubernetes projects.
This particular model appears to require signing a licensing agreement with Meta? Is this really the only way we can test our code?
It seems like we should be able to e2e test routing/serving requests without actually running any particular model? Just some trivial fake?
I’ve tried multiple open-source models that don’t require signing a license agreement (e.g., GPT-J, MPT, etc.), but each one either isn’t supported by vLLM or lacks LoRA compatibility. Additionally, the existing LoRA adapters are specifically trained for GPT-J or Llama 2, so they can’t be reused for other models. Since EPP (the reference inference extension) scrapes real metrics from vLLM to perform load balancing, substituting a fake model server won’t suffice for proper testing.
@danehans can we use GPT-J then? I assume it doesn't require signing an agreement.
We can't fake metrics for deterministic testing? Independent of managing this account, ideally we don't have to spend $$$ running models where not necessary.
I don't think any one of us can unilaterally sign the Kubernetes organization up to some legal terms (as opposed to using software libraries under a CNCF-approved license), nor should we personally agree and provide our personal accounts. cc @kubernetes/steering-committee. Annoyingly, you can't even see the agreement terms without signing into an account. For the other infra we do not have any agreements signed ourselves; the vendors have provided resources to the CNCF, and the CNCF delegates resources to us. Let's see what the others think though, cc @kubernetes/sig-k8s-infra-leads
@ahg-g GPT-J (EleutherAI/gpt-j-6B) does not support LoRA. From the vLLM logs: ERROR 01-24 11:23:00 engine.py:366] AssertionError: GPTJForCausalLM does not support LoRA yet. Do you agree with faking the model server for e2e testing?
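To make the fake-model-server idea concrete, here is a minimal sketch of what such a stand-in could look like: a plain HTTP server exposing vLLM-style Prometheus metrics with fixed values, so the EPP has something deterministic to scrape without downloading a gated model or running on a GPU. The metric names follow vLLM's conventions, but the exact set the EPP consumes, the model_name label value, and the port are assumptions here, not a confirmed design.

```python
# A sketch of a fake model server for deterministic e2e tests (assumed design).
# It serves fixed vLLM-style Prometheus metrics so an endpoint picker can
# scrape predictable values without a real model or GPU.
from http.server import BaseHTTPRequestHandler, HTTPServer

FAKE_METRICS = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="fake-llama"} 2
# HELP vllm:num_requests_waiting Number of requests waiting in the queue.
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="fake-llama"} 0
# HELP vllm:gpu_cache_usage_perc Fraction of KV cache in use.
# TYPE vllm:gpu_cache_usage_perc gauge
vllm:gpu_cache_usage_perc{model_name="fake-llama"} 0.25
"""

class FakeVLLMHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = FAKE_METRICS.encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

if __name__ == "__main__":
    # Port 8000 matches vLLM's default HTTP port; adjust as needed.
    HTTPServer(("0.0.0.0", 8000), FakeVLLMHandler).serve_forever()
```

A CI job could deploy something like this in place of the real vLLM pod and assert on the EPP's routing decisions, independent of any particular model or license.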
The Gateway API Inference Extension project requires a Hugging Face Hub account to download LLMs such as meta-llama/Llama-2-7b-hf for running e2e tests in CI. The account must generate an access token and store the token in the CI cluster as an environment variable, e.g. HUGGING_FACE_TOKEN.

cc: @robscott @ahg-g
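For context on how that token would be consumed, here is a minimal sketch, assuming the CI job reads the HUGGING_FACE_TOKEN environment variable described above and uses the huggingface_hub client to fetch the gated model before the e2e suite runs. The repo id comes from this issue; the rest is illustrative rather than the project's actual CI setup.

```python
# A sketch of a CI step that downloads the gated model using the token
# stored in the cluster (assumed wiring; not the project's actual CI code).
import os

from huggingface_hub import snapshot_download

# The token would be injected into the CI pod as HUGGING_FACE_TOKEN.
token = os.environ["HUGGING_FACE_TOKEN"]

# meta-llama/Llama-2-7b-hf is gated; the download only succeeds if the
# account behind the token has been granted access.
local_dir = snapshot_download(repo_id="meta-llama/Llama-2-7b-hf", token=token)
print(f"Model downloaded to {local_dir}")
```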