Alibaba Qwen models Qwen2.5-72B-Instruct & Qwen2.5-Coder-32B-Instruct

under review

Samuel Jackson

With the release of Qwen's web interface, I had a chance to try various models they offered and was especially impressed with Qwen2.5-72B-Instruct and Qwen2.5-Coder-32B-Instruct. Would it be possible to have these models integrated into Merlin?

Thank you very much

January 11, 2025

endu

Merged in a post:

add Qwen2.5-Max

tag t

Alibaba's new model, Qwen2.5-Max, shows strong ability in coding and mathematics. I believe it can make us more efficient.

February 10, 2025

endu

updated the status to

under review

Samuel Jackson

Qwen2.5-Coder-32B-Instruct
  Price:
    Input tokens: $0.80 per 1M tokens.
    Output tokens: $0.80 per 1M tokens.
    Blended price (3:1 input-to-output ratio): $0.80 per 1M tokens.
  Context Window: Up to 131,072 tokens.
  Output Speed: Approximately 72 tokens per second.
  Latency (Time to First Token): Around 0.37–0.49 seconds depending on the provider.
  Rate Limits:
    Specific rate limits depend on the API provider (e.g., Fireworks, DeepInfra, etc.), but no explicit hard limit is mentioned in the sources.

Qwen2.5-72B-Instruct
  Price:
    Input tokens: $0.40 per 1M tokens.
    Output tokens: $0.75 per 1M tokens.
    Blended price (3:1 input-to-output ratio): $0.40–$0.90 per 1M tokens based on provider.
  Context Window: Up to 131,072 tokens.
  Output Speed: Approximately 67.9–77 tokens per second depending on the provider.
  Latency (Time to First Token): Around 0.31–0.82 seconds depending on the provider.
  Rate Limits:
    Similar to Qwen2.5-Coder, rate limits depend on the API provider, but no specific hard limits are provided in the sources.
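
For anyone who wants to sanity-check the numbers above, here is a rough sketch of how a 3:1 blended price and an approximate response time could be computed from the listed figures. The function names are my own (hypothetical), and I am assuming "blended price" means a simple weighted average of input and output prices, and that total response time is roughly time to first token plus output tokens divided by output speed. Actual provider billing and latency may differ.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for an input:output ratio.

    Assumption: "blended" is a plain weighted mean (3 parts input, 1 part output).
    """
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


def estimated_response_seconds(ttft_seconds: float, tokens_per_second: float,
                               output_tokens: int) -> float:
    """Rough estimate: time to first token plus steady-state generation time."""
    return ttft_seconds + output_tokens / tokens_per_second


if __name__ == "__main__":
    # Prices in USD per 1M tokens, taken from the figures listed in this post.
    coder = blended_price(0.80, 0.80)
    general = blended_price(0.40, 0.75)
    print(f"Qwen2.5-Coder-32B-Instruct blended: ${coder:.4f} per 1M tokens")
    print(f"Qwen2.5-72B-Instruct blended:       ${general:.4f} per 1M tokens")

    # Example: a 720-token answer at ~72 tokens/s with a 0.37 s time to first token.
    seconds = estimated_response_seconds(0.37, 72.0, 720)
    print(f"Estimated time for 720 output tokens: {seconds:.2f} s")
```

With the single input/output pair listed for Qwen2.5-72B-Instruct, this weighted average comes out inside the $0.40–$0.90 range quoted above; the range itself presumably reflects different providers charging different rates.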