Mixtral-8x7b-instruct support #3862

Open
scoute opened this issue Feb 17, 2025 · 2 comments
Labels: enhancement (New feature or request)


scoute commented Feb 17, 2025

Hi, guys!

First of all, I would like to express my gratitude: I have been using TabbyML with Qwen-2.5-Coder-32b-instruct for several months now, and it brings me joy and excitement every day!

Recently, I discovered another model, Mixtral-8x7b-instruct, which is roughly 3-4 times faster than Qwen in response speed! I can run Mixtral with KoboldCPP, but no matter how hard I tried, I could not get it to run with TabbyML.

I even tried passing Mixtral-8x7b-instruct off as Mistral-7b, but that did not work either.

Perhaps you could suggest a way to launch it through parameters in the config, or add support for this model. Although it is slightly larger than Qwen-32b, it runs significantly faster, even on CPU.

scoute added the enhancement (New feature or request) label on Feb 17, 2025
zwpaper (Member) commented Feb 18, 2025

Hi @scoute, thank you for the information. We are pleased that Tabby has been helpful.

KoboldCPP is a fork of llama.cpp, and llama.cpp itself has implemented Mixtral support; see ggml-org/llama.cpp#4406. I believe it should not be challenging to integrate, and we will investigate this further.
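
In the meantime, one possible workaround (untested, just a sketch) is to keep serving Mixtral from KoboldCPP and point Tabby's HTTP chat backend at its OpenAI-compatible endpoint via `~/.tabby/config.toml`. The model name and port below are illustrative and must match your KoboldCPP instance:

```toml
# ~/.tabby/config.toml -- untested sketch; adjust to your setup.
[model.chat.http]
kind = "openai/chat"                      # OpenAI-compatible chat backend
model_name = "mixtral-8x7b-instruct"      # illustrative; use the name your server reports
api_endpoint = "http://localhost:5001/v1" # KoboldCPP listens on port 5001 by default
api_key = ""                              # KoboldCPP requires no key by default
```

This would only cover the chat side, not code completion, but it may tide you over while we look into native support.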

Additionally, could you please share your hardware information for running Qwen 32B? We would also like to hear about your user experience and the specific scenario in which you are using Tabby.

scoute (Author) commented Feb 18, 2025

I just write code using Tabby's chat prompts.
My hardware: AMD Ryzen 9 7950X, 128 GB DDR5 RAM

KoboldCPP showed this difference:
Mixtral-8x7b-instruct (Q5, 28 GB): 213.0 ms/token = 4.69 tokens/s
Qwen-2.5-Coder-32b-instruct (Q8, 33 GB): 638.5 ms/token = 1.57 tokens/s
