Hi,

I am researching MoE layer memory optimizations and am using vLLM to do so. I have added custom logging code to the initializers/model code of the Mixtral model. When I load a quantized model, none of that logging code is executed; simple print statements in `MixtralModel.__init__` are never printed to the screen. Is this intentional? Where are the MoE kernels being executed?

For reference, I have tried https://huggingface.co/TheBloke/mixtral-8x7b-v0.1-AWQ, and I have also quantized my own models with AutoAWQ and bitsandbytes; the same behavior occurs in every case.

Thanks for any help, I have been stuck on this for a while.
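For reference, a minimal sketch of the load path I am using, assuming vLLM's public `LLM` entry point with `quantization="awq"` (the exact arguments are illustrative, not my full script):

```python
# Minimal load sketch (illustrative). The checkpoint is the AWQ build
# mentioned above; quantization="awq" tells vLLM to use its AWQ path.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/mixtral-8x7b-v0.1-AWQ",
    quantization="awq",
    dtype="half",
    # tensor_parallel_size=2,  # may be needed depending on available VRAM
)

# Any print/logging added to the Mixtral __init__ should appear while the
# LLM(...) above is constructed, before generation starts -- but it does not.
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```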
@robertgshaw2-neuralmagic would it be a good idea to submit a PR to rename the quantized Mixtral implementation? It wasn't clear to me that it uses the same name as the non-quantized implementation.
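To make the confusion concrete, here is a hypothetical stand-in (not vLLM's actual code): if the quantized and non-quantized Mixtral implementations are two separate classes that happen to share a name, instrumentation added to one of them never fires when the loader dispatches to the other.

```python
# Hypothetical stand-in, not vLLM's actual classes: a toy dispatch showing
# why a print added to one implementation's __init__ is never hit when a
# quantized checkpoint routes to the other implementation.
from typing import Optional

import torch.nn as nn


class MixtralModel(nn.Module):
    """Stand-in for the non-quantized implementation."""

    def __init__(self) -> None:
        super().__init__()
        print("non-quantized MixtralModel.__init__ called")


class QuantMixtralModel(nn.Module):
    """Stand-in for a separately defined quantized implementation."""

    def __init__(self) -> None:
        super().__init__()
        print("quantized MixtralModel.__init__ called")


def build_model(quantization: Optional[str]) -> nn.Module:
    # A quantized checkpoint selects the quantized class, so logging added
    # only to MixtralModel above is silently skipped.
    return QuantMixtralModel() if quantization else MixtralModel()


model = build_model(quantization="awq")  # prints the quantized message only
```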