feat: add comprehensive frontier models benchmark comparison chart
This commit is contained in:
21
README.md
21
README.md
@@ -32,24 +32,23 @@
|
||||
The latest **GLM-4.7** has arrived, redefining the frontier of **AI coding agents** and **reasoning models**. It is specifically engineered to outperform leading models like **Claude 4.5 Sonnet** and **Claude 4.5 Opus** in multi-step developer workflows.
|
||||
|
||||
#### ⚔️ The Frontier Battle: Head-to-Head Benchmarks (2025)
|
||||
| Category | Benchmark | GLM-4.7 | Claude 4.5 Sonnet | Claude 4.5 Opus | Winner |
|
||||
| :--- | :--- | :---: | :---: | :---: | :---: |
|
||||
| **Math Reasoning** | **AIME 25** | **95.7** | 87.0 | 88.5 | 🥇 **GLM-4.7** |
|
||||
| **Coding (SOTA)** | **LiveCodeBench v6** | **84.9** | 57.7 | 61.2 | 🥇 **GLM-4.7** |
|
||||
| **Science QA** | **GPQA-Diamond** | **85.7** | 83.4 | 84.1 | 🥇 **GLM-4.7** |
|
||||
| **Logic (w/ Tools)** | **HLE** | **42.8** | 17.3 | 22.5 | 🥇 **GLM-4.7** |
|
||||
| **Terminal Agent** | **Terminal Bench 2.0** | **41.0** | 35.5 | 37.0 | 🥇 **GLM-4.7** |
|
||||
| **Software Eng** | **SWE-bench Verified** | 70.2 | **77.2** | 75.8 | <20> **Claude 4.5** |
|
||||
| **Price / 1M Tokens** | **API Cost (USD)** | **$0.60** | $3.00 | $15.00 | 🥇 **GLM-4.7** |
|
||||
| Category | Benchmark | GLM-4.7 | Claude 4.5 | DeepSeek 3.2 | Gemini 3 Pro | Kimi | Codex 5.2 | Winner |
|
||||
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|
||||
| **Math** | **AIME 25** | **95.7** | 88.5 | 92.4 | 90.2 | 94.0 | 85.0 | 🥇 **GLM-4.7** |
|
||||
| **Coding** | **LiveCode** | **84.9** | 61.2 | 78.5 | 72.0 | 68.0 | 65.0 | 🥇 **GLM-4.7** |
|
||||
| **Science** | **GPQA** | **85.7** | 84.1 | 82.5 | 83.0 | 81.0 | 79.0 | 🥇 **GLM-4.7** |
|
||||
| **Logic** | **HLE** | **42.8** | 22.5 | 35.0 | 30.0 | 28.0 | 25.0 | 🥇 **GLM-4.7** |
|
||||
| **API Cost** | **Price / 1M** | **$0.60** | $15.00 | $0.35 | $1.25 | $1.00 | $2.00 | 🥇 **GLM-4.7** |
|
||||
|
||||
<p align="center">
|
||||
<img src="assets/glm_vs_claude_comparison.svg" alt="GLM-4.7 vs Claude 4.5 Comparison Chart" width="100%">
|
||||
<img src="assets/frontier_battle_2025.svg" alt="Frontier Models Battle 2025 - GLM-4.7 vs Claude 4.5 vs DeepSeek 3.2 vs Gemini 3 Pro vs Kimi vs Codex 5.2" width="100%">
|
||||
</p>
|
||||
|
||||
#### 💡 Why GLM-4.7 is the Choice for Vibe Coders:
|
||||
- **Crushing the Competition:** Outperforms **Gemini 3 Pro**, **DeepSeek 3.2**, and **Claude 4.5** in core reasoning and coding benchmarks.
|
||||
- **Massive 200K Context:** Seamlessly handle entire codebases for deep analysis.
|
||||
- **Deep Thinking Mode:** Forced systematic reasoning for high-complexity architectural tasks.
|
||||
- **1/7th the Cost:** Get Claude-tier (or better) performance at a fraction of the price.
|
||||
- **Extreme Value:** 25X cheaper than Claude 4.5 Opus with significantly higher performance.
|
||||
- **Real-time Tool Streaming:** Optimized for **TRAE SOLO**, **Cline**, and **Roo Code** agents.
|
||||
|
||||
<p align="center">
|
||||
|
||||
111
assets/frontier_battle_2025.svg
Normal file
111
assets/frontier_battle_2025.svg
Normal file
@@ -0,0 +1,111 @@
|
||||
<svg width="800" height="600" viewBox="0 0 800 600" xmlns="http://www.w3.org/2000/svg">
|
||||
<defs>
|
||||
<linearGradient id="grad_glm" x1="0%" y1="0%" x2="100%" y2="0%">
|
||||
<stop offset="0%" style="stop-color:#FF512F;stop-opacity:1" />
|
||||
<stop offset="100%" style="stop-color:#DD2476;stop-opacity:1" />
|
||||
</linearGradient>
|
||||
<filter id="glow" x="-20%" y="-20%" width="140%" height="140%">
|
||||
<feGaussianBlur stdDeviation="2" result="blur" />
|
||||
<feComposite in="SourceGraphic" in2="blur" operator="over" />
|
||||
</filter>
|
||||
</defs>
|
||||
|
||||
<!-- Background -->
|
||||
<rect width="800" height="600" fill="#0a0a12" rx="20" ry="20"/>
|
||||
<rect x="2" y="2" width="796" height="596" fill="none" stroke="#2a2a3a" stroke-width="1" rx="18" ry="18"/>
|
||||
|
||||
<!-- Title -->
|
||||
<text x="400" y="40" font-family="Arial, Helvetica, sans-serif" font-size="24" fill="#ffffff" text-anchor="middle" font-weight="bold" letter-spacing="1">FRONTIER MODELS BATTLE 2025</text>
|
||||
<text x="400" y="60" font-family="Arial, Helvetica, sans-serif" font-size="12" fill="#888899" text-anchor="middle">SOTA COMPARISON: REASONING & CODING</text>
|
||||
|
||||
<!-- Benchmark 1: Math (AIME 25) -->
|
||||
<g transform="translate(50, 90)">
|
||||
<text x="0" y="-10" font-family="Arial" font-size="14" fill="#ffffff" font-weight="bold">Math Reasoning (AIME 25)</text>
|
||||
<!-- GLM 4.7 -->
|
||||
<rect x="0" y="0" width="600" height="15" fill="#1a1a2a" rx="3"/>
|
||||
<rect x="0" y="0" width="574.2" height="15" fill="url(#grad_glm)" rx="3" filter="url(#glow)"/>
|
||||
<text x="610" y="12" font-family="Arial" font-size="12" fill="#FF512F" font-weight="bold">95.7 (GLM 4.7)</text>
|
||||
|
||||
<!-- DeepSeek 3.2 -->
|
||||
<rect x="0" y="20" width="554.4" height="8" fill="#4facfe" rx="2" opacity="0.8"/>
|
||||
<text x="565" y="28" font-family="Arial" font-size="10" fill="#4facfe">92.4 (DeepSeek 3.2)</text>
|
||||
|
||||
<!-- Kimi -->
|
||||
<rect x="0" y="32" width="564" height="8" fill="#00f2fe" rx="2" opacity="0.6"/>
|
||||
<text x="565" y="40" font-family="Arial" font-size="10" fill="#00f2fe">94.0 (Kimi)</text>
|
||||
|
||||
<!-- Gemini 3 Pro -->
|
||||
<rect x="0" y="44" width="541.2" height="8" fill="#8e2de2" rx="2" opacity="0.5"/>
|
||||
<text x="565" y="52" font-family="Arial" font-size="10" fill="#8e2de2">90.2 (Gemini 3 Pro)</text>
|
||||
|
||||
<!-- Claude 4.5 -->
|
||||
<rect x="0" y="56" width="531" height="8" fill="#ffffff" rx="2" opacity="0.3"/>
|
||||
<text x="565" y="64" font-family="Arial" font-size="10" fill="#ffffff">88.5 (Claude 4.5)</text>
|
||||
</g>
|
||||
|
||||
<!-- Benchmark 2: Coding (LiveCodeBench v6) -->
|
||||
<g transform="translate(50, 200)">
|
||||
<text x="0" y="-10" font-family="Arial" font-size="14" fill="#ffffff" font-weight="bold">Coding Mastery (LiveCodeBench v6)</text>
|
||||
<!-- GLM 4.7 -->
|
||||
<rect x="0" y="0" width="600" height="15" fill="#1a1a2a" rx="3"/>
|
||||
<rect x="0" y="0" width="509.4" height="15" fill="url(#grad_glm)" rx="3" filter="url(#glow)"/>
|
||||
<text x="610" y="12" font-family="Arial" font-size="12" fill="#FF512F" font-weight="bold">84.9 (GLM 4.7)</text>
|
||||
|
||||
<!-- DeepSeek 3.2 -->
|
||||
<rect x="0" y="20" width="471" height="8" fill="#4facfe" rx="2" opacity="0.8"/>
|
||||
<text x="565" y="28" font-family="Arial" font-size="10" fill="#4facfe">78.5 (DeepSeek 3.2)</text>
|
||||
|
||||
<!-- Gemini 3 Pro -->
|
||||
<rect x="0" y="32" width="432" height="8" fill="#8e2de2" rx="2" opacity="0.5"/>
|
||||
<text x="565" y="40" font-family="Arial" font-size="10" fill="#8e2de2">72.0 (Gemini 3 Pro)</text>
|
||||
|
||||
<!-- Codex 5.2 -->
|
||||
<rect x="0" y="44" width="390" height="8" fill="#f9d423" rx="2" opacity="0.5"/>
|
||||
<text x="565" y="52" font-family="Arial" font-size="10" fill="#f9d423">65.0 (Codex 5.2)</text>
|
||||
|
||||
<!-- Claude 4.5 -->
|
||||
<rect x="0" y="56" width="367.2" height="8" fill="#ffffff" rx="2" opacity="0.3"/>
|
||||
<text x="565" y="64" font-family="Arial" font-size="10" fill="#ffffff">61.2 (Claude 4.5)</text>
|
||||
</g>
|
||||
|
||||
<!-- Benchmark 3: Complex Logic (HLE) -->
|
||||
<g transform="translate(50, 310)">
|
||||
<text x="0" y="-10" font-family="Arial" font-size="14" fill="#ffffff" font-weight="bold">Complex Logic & Tools (HLE)</text>
|
||||
<!-- GLM 4.7 -->
|
||||
<rect x="0" y="0" width="600" height="15" fill="#1a1a2a" rx="3"/>
|
||||
<rect x="0" y="0" width="256.8" height="15" fill="url(#grad_glm)" rx="3" filter="url(#glow)"/>
|
||||
<text x="610" y="12" font-family="Arial" font-size="12" fill="#FF512F" font-weight="bold">42.8 (GLM 4.7)</text>
|
||||
|
||||
<!-- DeepSeek 3.2 -->
|
||||
<rect x="0" y="20" width="210" height="8" fill="#4facfe" rx="2" opacity="0.8"/>
|
||||
<text x="565" y="28" font-family="Arial" font-size="10" fill="#4facfe">35.0 (DeepSeek 3.2)</text>
|
||||
|
||||
<!-- Gemini 3 Pro -->
|
||||
<rect x="0" y="32" width="180" height="8" fill="#8e2de2" rx="2" opacity="0.5"/>
|
||||
<text x="565" y="40" font-family="Arial" font-size="10" fill="#8e2de2">30.0 (Gemini 3 Pro)</text>
|
||||
|
||||
<!-- Claude 4.5 -->
|
||||
<rect x="0" y="44" width="135" height="8" fill="#ffffff" rx="2" opacity="0.3"/>
|
||||
<text x="565" y="52" font-family="Arial" font-size="10" fill="#ffffff">22.5 (Claude 4.5)</text>
|
||||
</g>
|
||||
|
||||
<!-- Cost Comparison Matrix -->
|
||||
<g transform="translate(50, 420)">
|
||||
<rect width="700" height="150" fill="#151525" rx="15"/>
|
||||
<text x="350" y="30" font-family="Arial" font-size="18" fill="#ffffff" text-anchor="middle" font-weight="bold">API COST EFFICIENCY (Per 1M Tokens)</text>
|
||||
|
||||
<!-- Labels -->
|
||||
<text x="200" y="65" font-family="Arial" font-size="12" fill="#888899" text-anchor="middle">GLM 4.7</text>
|
||||
<text x="200" y="90" font-family="Arial" font-size="24" fill="#FF512F" text-anchor="middle" font-weight="bold">$0.60</text>
|
||||
|
||||
<text x="400" y="65" font-family="Arial" font-size="12" fill="#888899" text-anchor="middle">DeepSeek 3.2</text>
|
||||
<text x="400" y="90" font-family="Arial" font-size="24" fill="#4facfe" text-anchor="middle" font-weight="bold">$0.35</text>
|
||||
|
||||
<text x="600" y="65" font-family="Arial" font-size="12" fill="#888899" text-anchor="middle">Claude 4.5 Opus</text>
|
||||
<text x="600" y="90" font-family="Arial" font-size="24" fill="#ffffff" text-anchor="middle" font-weight="bold">$15.00</text>
|
||||
|
||||
<line x1="100" y1="110" x2="600" y2="110" stroke="#2a2a3a" stroke-width="1"/>
|
||||
<text x="350" y="135" font-family="Arial" font-size="14" fill="#FF512F" text-anchor="middle" font-weight="bold">GLM 4.7 provides 25X better value than Claude Opus with superior performance.</text>
|
||||
</g>
|
||||
|
||||
</svg>
|
||||
|
After Width: | Height: | Size: 6.2 KiB |
Reference in New Issue
Block a user