Global AI Large Model Leaderboard

🚀 Performance evaluation of the world's leading AI models

Total models: 247
Last updated: 2025-08-12

Columns (in order): Model | Context Length | AI Analysis Index | MMLU-Pro (Reasoning & Knowledge) | GPQA Diamond (Scientific Reasoning) | Humanity's Last Exam | LiveCodeBench | SciCode | HumanEval | MATH500 | AIME 2024 | Chatbot Arena | Price per 1M Tokens | Output Tokens per Second
Note: the Company column in the source table was empty and is omitted. Cells that were blank in the source are dropped, so shorter rows list their remaining scores in the column order above.
GPT-5 (high) | 400k | 69 | 87% | 85% | 27% | 67% | 43% | 73% | 94% | 76% | 96% | 99% | 99%
GPT-5 (medium) | 400k | 68 | 87% | 84% | 24% | 70% | 41% | 71% | 92% | 73% | 92% | 99% | 98%
Grok 4 | 256k | 68 | 87% | 88% | 24% | 82% | 46% | 54% | 93% | 68% | 94% | 99% | 98%
o3-pro | 200k | 68 | 85%
o3 | 200k | 67 | 85% | 83% | 20% | 78% | 41% | 71% | 88% | 69% | 90% | 99% | 99%
o4-mini (high) | 200k | 65 | 83% | 78% | 18% | 80% | 47% | 69% | 91% | 55% | 94% | 99% | 99%
Gemini 2.5 Pro | 1m | 65 | 86% | 84% | 21% | 80% | 43% | 49% | 88% | 66% | 89% | 97%
GPT-5 mini | 400k | 64 | 83% | 80% | 15% | 69% | 41% | 71% | 85% | 66%
Qwen3 235B 2507 (Reasoning) | 256k | 64 | 84% | 79% | 15% | 79% | 42% | 51% | 91% | 67% | 94% | 98% | 98%
GPT-5 (low) | 400k | 63 | 86% | 81% | 18% | 75% | 39% | 67% | 83% | 59% | 83% | 99% | 99%
Claude 4.1 Opus Thinking | 200k | 61
gpt-oss-120B (high) | 131k | 61 | 81% | 78% | 19% | 64% | 36% | 69% | 89% | 51%
Gemini 2.5 Pro (Mar '25) | 1m | 59 | 86% | 84% | 17% | 78% | 40% | 87% | 98% | 99%
Claude 4 Sonnet Thinking | 200k | 59 | 84% | 78% | 10% | 66% | 40% | 55% | 74% | 65% | 77% | 99%
DeepSeek R1 0528 | 128k | 59 | 85% | 81% | 15% | 77% | 40% | 40% | 76% | 56% | 89% | 98% | 97%
Gemini 2.5 Flash (Reasoning) | 1m | 58 | 83% | 79% | 11% | 70% | 39% | 50% | 73% | 62% | 82% | 98% | 96%
Grok 3 mini Reasoning (high) | 1m | 58 | 83% | 79% | 11% | 70% | 41% | 46% | 85% | 50% | 93% | 99% | 98%
Gemini 2.5 Pro (May '25) | 1m | 58 | 84% | 82% | 15% | 77% | 42% | 84% | 99% | 99%
GLM-4.5 | 128k | 56 | 84% | 78% | 12% | 74% | 35% | 44% | 74% | 48% | 87% | 98% | 98%
o3-mini (high) | 200k | 55 | 80% | 77% | 12% | 73% | 40% | 86% | 99%
Claude 4 Opus Thinking | 200k | 55 | 87% | 80% | 12% | 64% | 40% | 54% | 73% | 34% | 76% | 98%
GPT-5 nano | 400k | 54 | 77% | 67% | 8% | 60% | 34% | 66% | 78% | 40%
Qwen3 30B 2507 (Reasoning) | 33k | 53 | 81% | 71% | 10% | 71% | 33% | 91% | 98%
MiniMax M1 80k | 1m | 53 | 82% | 70% | 8% | 71% | 37% | 42% | 61% | 54% | 85% | 98%
o3-mini | 200k | 53 | 79% | 75% | 9% | 72% | 40% | 77% | 97% | 97%
Llama Nemotron Super 49B v1.5 (Reasoning) | 128k | 52 | 81% | 75% | 7% | 74% | 35% | 37% | 77% | 34% | 86% | 98% | 95%
o1 | 200k | 52 | 84% | 75% | 8% | 68% | 36% | 72% | 97% | 97%
MiniMax M1 40k | 1m | 51 | 81% | 68% | 8% | 66% | 38% | 81% | 97%
Qwen3 235B 2507 (Non-reasoning) | 256k | 51 | 83% | 75% | 11% | 52% | 36% | 46% | 72% | 31% | 72% | 98% | 96%
Sonar Reasoning Pro | 127k | 51 | 79% | 96%
EXAONE 4.0 32B (Reasoning) | 131k | 51 | 82% | 74% | 11% | 75% | 34% | 36% | 80% | 14% | 84% | 98% | 97%
Gemini 2.5 Flash (April '25) (Reasoning) | 1m | 50 | 80% | 70% | 12% | 51% | 36% | 84% | 98%
DeepSeek R1 (Jan '25) | 128k | 50 | 84% | 71% | 9% | 62% | 36% | 68% | 97% | 98%
GLM-4.5-Air | 128k | 49 | 82% | 73% | 7% | 68% | 31% | 67% | 97% | 93%
o1-preview | 128k | 49 | 92% | 96%
gpt-oss-20B (high) | 131k | 49 | 74% | 62% | 9% | 72% | 35% | 61% | 62% | 19%
Claude 4.1 Opus | 200k | 49
Kimi K2 | 128k | 49 | 82% | 77% | 7% | 56% | 18% | 42% | 57% | 51% | 69% | 97% | 93%
Qwen3 235B (Reasoning) | 33k | 48 | 83% | 70% | 12% | 62% | 40% | 39% | 82% | 0% | 84% | 93%
QwQ-32B | 131k | 48 | 76% | 59% | 8% | 63% | 36% | 78% | 96% | 98%
Gemini 2.5 Flash | 1m | 47 | 81% | 68% | 5% | 50% | 29% | 39% | 60% | 46% | 50% | 93% | 95%
Claude 3.7 Sonnet Thinking | 200k | 47 | 84% | 77% | 10% | 47% | 40% | 49% | 95% | 98%
GPT-4.1 | 1m | 47 | 81% | 67% | 5% | 46% | 38% | 43% | 35% | 61% | 44% | 91% | 96%
Claude 4 Opus | 200k | 47 | 86% | 70% | 6% | 54% | 41% | 43% | 36% | 36% | 56% | 94% | 97%
Llama Nemotron Ultra Reasoning | 128k | 46 | 83% | 73% | 8% | 64% | 35% | 38% | 64% | 7% | 75% | 95%
Qwen3 30B 2507 (Non-reasoning) | 33k | 46 | 78% | 66% | 7% | 52% | 30% | 73% | 98% | 94%
Claude 4 Sonnet | 200k | 46 | 84% | 68% | 4% | 45% | 37% | 45% | 38% | 44% | 41% | 93% | 97%
Grok 3 Reasoning Beta | 1m | 46
o1-pro | 200k | 46
Qwen3 14B (Reasoning) | 33k | 45 | 77% | 60% | 4% | 52% | 32% | 76% | 96% | 96%
Qwen3 Coder 480B | 262k | 45 | 79% | 62% | 4% | 59% | 36% | 41% | 39% | 42% | 48% | 94% | 97%
Gemini 2.5 Flash-Lite (Reasoning) | 1m | 44 | 76% | 63% | 6% | 59% | 19% | 70% | 97% | 97%
Qwen3 32B (Reasoning) | 33k | 44 | 80% | 67% | 8% | 55% | 35% | 36% | 73% | 0% | 81% | 96%
DeepSeek V3 0324 (Mar '25) | 128k | 44 | 82% | 66% | 5% | 41% | 36% | 41% | 41% | 41% | 52% | 94% | 92%
GPT-5 (minimal) | 400k | 44 | 81% | 67% | 5% | 56% | 39% | 46% | 32% | 25% | 37% | 86% | 95%
Solar Pro 2 (Reasoning) | 66k | 43 | 81% | 69% | 7% | 62% | 30% | 37% | 61% | 0% | 69% | 97% | 97%
o1-mini | 128k | 43 | 74% | 60% | 5% | 58% | 32% | 60% | 94% | 97%
GPT-4.5 (Preview) | 128k | 42
Qwen3 30B (Reasoning) | 33k | 42 | 78% | 62% | 7% | 51% | 28% | 42% | 72% | 0% | 75% | 96%
GPT-4.1 mini | 1m | 42 | 78% | 66% | 5% | 48% | 40% | 43% | 93% | 95%
Llama 4 Maverick | 1m | 42 | 81% | 67% | 5% | 40% | 33% | 43% | 19% | 46% | 39% | 89% | 88%
Gemini 2.0 Flash Thinking exp. (Jan '25) | 1m | 42 | 80% | 70% | 7% | 32% | 33% | 50% | 94%
DeepSeek R1 0528 Qwen3 8B | 33k | 42 | 74% | 61% | 6% | 51% | 20% | 65% | 93% | 91%
DeepSeek R1 Distill Qwen 32B | 128k | 41 | 74% | 62% | 6% | 27% | 38% | 69% | 94% | 95%
Qwen3 8B (Reasoning) | 33k | 41 | 74% | 59% | 4% | 41% | 23% | 75% | 90%
Llama 3.3 Nemotron Super 49B Reasoning | 128k | 40 | 79% | 64% | 7% | 28% | 28% | 58% | 96% | 96%
EXAONE 4.0 32B | 131k | 40 | 77% | 63% | 5% | 47% | 25% | 47% | 94% | 91%
Solar Pro 2 (Reasoning) | 64k | 40 | 77% | 58% | 6% | 46% | 16% | 66% | 90%
Grok 3 | 1m | 40 | 80% | 69% | 5% | 43% | 37% | 33% | 87% | 91%
GPT-4o (March 2025) | 128k | 40 | 80% | 66% | 5% | 43% | 37% | 26% | 33% | 89% | 96%
Mistral Medium 3 | 128k | 39 | 76% | 58% | 4% | 40% | 33% | 39% | 30% | 28% | 44% | 91% | 90%
Gemini 2.0 Pro Experimental | 2m | 38 | 81% | 62% | 7% | 35% | 31% | 36% | 92% | 95%
DeepSeek R1 Distill Qwen 14B | 128k | 38 | 74% | 48% | 4% | 38% | 24% | 67% | 95% | 93%
Sonar Reasoning | 127k | 38 | 62% | 77% | 92%
Gemini 2.5 Flash (April '25) | 1m | 38 | 78% | 59% | 5% | 41% | 23% | 43% | 93%
Gemini 2.0 Flash | 1m | 38 | 78% | 62% | 5% | 33% | 33% | 40% | 22% | 28% | 33% | 93% | 90%
Magistral Medium | 40k | 38 | 75% | 68% | 10% | 53% | 30% | 25% | 40% | 0% | 70% | 92%
DeepSeek R1 Distill Llama 70B | 128k | 37 | 80% | 40% | 6% | 27% | 31% | 67% | 94% | 97%
Claude 3.7 Sonnet | 200k | 37 | 80% | 66% | 5% | 39% | 38% | 21% | 22% | 85% | 95%
Qwen3 4B (Reasoning) | 32k | 36 | 70% | 52% | 5% | 47% | 4% | 66% | 93% | 91%
Reka Flash 3 | 128k | 36 | 67% | 53% | 5% | 44% | 27% | 51% | 89% | 95%
Magistral Small | 40k | 36 | 75% | 64% | 7% | 51% | 24% | 25% | 41% | 0% | 71% | 96% | 96%
Gemini 2.0 Flash (exp) | 1m | 36 | 78% | 64% | 5% | 21% | 34% | 30% | 91% | 91%
Nova Premier | 1m | 35 | 73% | 57% | 5% | 32% | 28% | 36% | 17% | 30% | 17% | 84% | 91%
Gemini 2.5 Flash-Lite | 1m | 35 | 72% | 47% | 4% | 40% | 18% | 50% | 93% | 93%
DeepSeek V3 (Dec '24) | 128k | 35 | 75% | 56% | 4% | 36% | 35% | 25% | 89% | 91%
Qwen2.5 Max | 32k | 34 | 76% | 59% | 5% | 36% | 34% | 23% | 84% | 93%
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | 128k | 34 | 56% | 41% | 5% | 49% | 10% | 71% | 95%
Gemini 1.5 Pro (Sep) | 2m | 34 | 75% | 59% | 5% | 32% | 30% | 23% | 88% | 90%
Solar Pro 2 | 64k | 34 | 73% | 54% | 4% | 39% | 27% | 30% | 87% | 88%
Claude 3.5 Sonnet (Oct) | 200k | 33 | 77% | 60% | 4% | 38% | 37% | 16% | 77% | 93%
Qwen3 Coder 30B | 262k | 33 | 71% | 52% | 4% | 40% | 28% | 30% | 89% | 92%
Qwen3 235B | 33k | 33 | 76% | 61% | 5% | 34% | 30% | 37% | 24% | 0% | 33% | 90%
Solar Pro 2 | 66k | 33 | 75% | 56% | 4% | 42% | 25% | 34% | 30% | 0% | 41% | 89% | 88%
Llama 4 Scout | 10m | 33 | 75% | 59% | 4% | 30% | 17% | 40% | 14% | 26% | 28% | 84% | 83%
Sonar | 127k | 32 | 69% | 47% | 7% | 30% | 23% | 49% | 82% | 82%
Mistral Small 3.2 | 128k | 32 | 68% | 51% | 4% | 28% | 26% | 34% | 27% | 17% | 32% | 88% | 85%
Sonar Pro | 200k | 32 | 76% | 58% | 8% | 28% | 23% | 29% | 75% | 85%
Command A | 256k | 32 | 71% | 53% | 5% | 29% | 28% | 37% | 13% | 18% | 10% | 82% | 82%
QwQ 32B-Preview | 33k | 32 | 65% | 56% | 5% | 34% | 4% | 45% | 91% | 87%
Devstral Medium | 256k | 31 | 71% | 49% | 4% | 34% | 29% | 30% | 5% | 29% | 7% | 71% | 94%
Llama 3.3 70B | 128k | 31 | 71% | 50% | 4% | 29% | 26% | 47% | 8% | 15% | 30% | 77% | 86%
Gemini 2.0 Flash-Lite (Feb '25) | 1m | 30 | 72% | 54% | 4% | 19% | 25% | 28% | 87% | 88%
Qwen3 30B | 33k | 30 | 71% | 52% | 5% | 32% | 26% | 32% | 22% | 0% | 26% | 86%
GPT-4.1 nano | 1m | 30 | 66% | 51% | 4% | 33% | 26% | 24% | 85% | 88%
Qwen3 14B | 33k | 30 | 68% | 47% | 4% | 28% | 27% | 28% | 87%
Qwen3 32B | 33k | 30 | 73% | 54% | 4% | 29% | 28% | 32% | 20% | 0% | 30% | 87% | 90%
GPT-4o (May '24) | 128k | 30 | 74% | 53% | 3% | 33% | 31% | 11% | 79% | 94%
Gemini 2.0 Flash-Lite (Preview) | 1m | 30 | 54% | 4% | 18% | 25% | 30% | 87% | 90%
GPT-4o (Nov '24) | 128k | 30 | 75% | 54% | 3% | 31% | 33% | 34% | 6% | 0% | 15% | 76% | 93%
GPT-4o (Aug '24) | 128k | 29 | 52% | 3% | 32% | 12% | 80% | 93%
Llama 3.1 405B | 128k | 29 | 73% | 52% | 4% | 31% | 30% | 21% | 70% | 85%
Qwen2.5 72B | 131k | 29 | 72% | 49% | 4% | 28% | 27% | 16% | 86% | 88%
MiniMax-Text-01 | 4m | 29 | 76% | 58% | 4% | 25% | 25% | 13% | 75% | 86%
Nova Pro | 300k | 29 | 69% | 50% | 3% | 23% | 21% | 38% | 7% | 19% | 11% | 79% | 83%
Claude 3.5 Sonnet (June) | 200k | 29 | 75% | 56% | 4% | 32% | 10% | 70% | 90%
Tulu3 405B | 128k | 29 | 72% | 52% | 4% | 29% | 30% | 13% | 78% | 89%
GPT-4o (ChatGPT) | 128k | 29 | 77% | 51% | 4% | 33% | 53% | 10% | 80% | 94%
Llama 3.3 Nemotron Super 49B v1 | 128k | 28 | 70% | 52% | 4% | 28% | 23% | 19% | 78% | 83%
Grok 2 | 131k | 28 | 71% | 51% | 4% | 27% | 28% | 13% | 78% | 86%
Phi-4 | 16k | 28 | 71% | 57% | 4% | 23% | 26% | 24% | 18% | 0% | 14% | 81% | 87%
Gemini 1.5 Flash (Sep) | 1m | 28 | 68% | 46% | 4% | 27% | 27% | 18% | 83% | 84%
GPT-4 Turbo | 128k | 28 | 69% | 3% | 29% | 32% | 15% | 74% | 92%
Mistral Large 2 (Nov '24) | 128k | 27 | 70% | 49% | 4% | 29% | 29% | 11% | 74% | 90%
Llama Nemotron Super 49B v1.5 | 128k | 27 | 69% | 48% | 4% | 29% | 24% | 33% | 22% | 14% | 77% | 86%
Qwen3 1.7B (Reasoning) | 32k | 27 | 57% | 36% | 5% | 31% | 4% | 51% | 89% | 85%
Mistral Small 3.1 | 128k | 26 | 66% | 45% | 5% | 21% | 27% | 30% | 4% | 14% | 9% | 71% | 86%
Grok Beta | 128k | 26 | 70% | 47% | 5% | 24% | 30% | 10% | 74% | 87%
Pixtral Large | 128k | 26 | 70% | 51% | 4% | 26% | 29% | 7% | 71% | 85%
Qwen2.5 Instruct 32B | 128k | 26 | 70% | 47% | 4% | 25% | 23% | 11% | 81% | 90%
Llama 3.1 Nemotron 70B | 128k | 26 | 69% | 47% | 5% | 17% | 23% | 25% | 73% | 82%
Qwen3 8B | 33k | 25 | 64% | 45% | 3% | 20% | 17% | 24% | 83%
Mistral Large 2 (Jul '24) | 128k | 25 | 68% | 47% | 3% | 27% | 27% | 9% | 71% | 89%
Gemma 3 27B | 128k | 25 | 67% | 43% | 5% | 14% | 21% | 32% | 21% | 0% | 25% | 88% | 89%
Qwen2.5 Coder 32B | 131k | 25 | 64% | 42% | 4% | 30% | 27% | 12% | 77% | 90%
GPT-4 | 8k | 25
Nova Lite | 300k | 25 | 59% | 43% | 5% | 17% | 14% | 34% | 7% | 18% | 11% | 77% | 84%
GPT-4o mini | 128k | 24 | 65% | 43% | 4% | 23% | 23% | 31% | 15% | 12% | 79% | 88%
Llama 3.1 70B | 128k | 24 | 68% | 41% | 5% | 23% | 27% | 17% | 65% | 81%
Gemma 3 12B | 128k | 24 | 60% | 35% | 5% | 14% | 17% | 37% | 18% | 7% | 22% | 85% | 83%
Mistral Small 3 | 32k | 24 | 65% | 46% | 4% | 25% | 24% | 8% | 72% | 85%
DeepSeek-V2.5 (Dec '24) | 128k | 24 | 76% | 88%
Qwen3 4B | 32k | 24 | 59% | 40% | 4% | 23% | 17% | 21% | 84%
Claude 3 Opus | 200k | 24 | 70% | 49% | 3% | 28% | 23% | 3% | 64% | 85%
Claude 3.5 Haiku | 200k | 23 | 63% | 41% | 4% | 31% | 27% | 3% | 72% | 86%
Gemini 2.0 Flash Thinking exp. (Dec '24) | 2m | 23 | 48% | 94%
DeepSeek-V2.5 | 128k | 23 | 87%
Devstral Small (May '25) | 256k | 23 | 63% | 43% | 4% | 26% | 25% | 7% | 68% | 85%
Mistral Saba | 32k | 23 | 61% | 42% | 4% | 24% | 13% | 68% | 85%
DeepSeek R1 Distill Llama 8B | 128k | 23 | 54% | 30% | 4% | 23% | 12% | 33% | 85% | 84%
Reka Core | 128k | 22 | 56% | 73%
Gemini 1.5 Pro (May) | 2m | 22 | 66% | 37% | 4% | 24% | 27% | 8% | 67% | 83%
R1 1776 | 128k | 22 | 95%
Qwen2.5 Turbo | 1m | 22 | 63% | 41% | 4% | 16% | 15% | 12% | 81% | 85%
Reka Flash | 128k | 22 | 53% | 74%
Llama 3.2 90B (Vision) | 128k | 22 | 67% | 43% | 5% | 21% | 24% | 5% | 63% | 82%
Solar Mini | 4k | 22 | 33% | 59%
Reka Flash (Feb '24) | 128k | 22 | 33% | 61%
Reka Edge | 128k | 21 | 22% | 41%
Grok-1 | 8k | 21
Qwen2 72B | 131k | 21 | 62% | 37% | 4% | 16% | 23% | 15% | 70% | 83%
Devstral Small | 256k | 21 | 62% | 41% | 4% | 25% | 24% | 0% | 64% | 85%
Nova Micro | 130k | 20 | 53% | 36% | 5% | 14% | 9% | 29% | 6% | 10% | 8% | 70% | 80%
Gemma 2 27B | 8k | 20 | 57% | 36% | 4% | 28% | 13% | 30% | 54% | 76%
Gemini 1.5 Flash-8B | 1m | 19 | 57% | 36% | 5% | 22% | 23% | 3% | 69% | 12%
Llama 3.1 8B | 128k | 19 | 48% | 26% | 5% | 12% | 13% | 29% | 4% | 16% | 8% | 52% | 67%
Gemma 3n E4B | 32k | 18 | 49% | 30% | 4% | 15% | 8% | 28% | 14% | 0% | 14% | 77%
DeepHermes 3 - Mistral 24B | 32k | 18 | 58% | 38% | 4% | 20% | 23% | 5% | 60% | 75%
Jamba 1.7 Large | 256k | 18 | 58% | 39% | 4% | 18% | 19% | 6% | 60% | 71%
Jamba 1.5 Large | 256k | 18 | 57% | 43% | 4% | 14% | 16% | 5% | 61% | 24%
Granite 3.3 8B | 128k | 18 | 47% | 34% | 4% | 13% | 10% | 22% | 7% | 4% | 5% | 67% | 71%
Hermes 3 - Llama-3.1 70B | 128k | 17 | 57% | 40% | 4% | 19% | 23% | 2% | 54% | 75%
DeepSeek-Coder-V2 | 128k | 17 | 74% | 87%
Jamba 1.6 Large | 256k | 17 | 56% | 39% | 4% | 17% | 18% | 5% | 58% | 70%
Gemini 1.5 Flash (May) | 1m | 17 | 57% | 32% | 4% | 20% | 18% | 9% | 55% | 72%
Yi-Large | 32k | 16 | 59% | 36% | 3% | 11% | 19% | 7% | 56% | 74%
Claude 3 Sonnet | 200k | 16 | 58% | 40% | 4% | 18% | 23% | 5% | 41% | 71%
Codestral (Jan '25) | 256k | 16 | 45% | 31% | 5% | 24% | 25% | 4% | 61% | 85%
Llama 3 70B | 8k | 16 | 57% | 38% | 4% | 20% | 19% | 0% | 48% | 79%
Mistral Small (Sep '24) | 33k | 16 | 53% | 38% | 4% | 14% | 16% | 6% | 56% | 81%
Gemini 1.0 Ultra | 33k | 16
Gemma 3n E4B (May '25) | 32k | 15 | 48% | 28% | 5% | 14% | 9% | 11% | 75% | 76%
Phi-4 Multimodal | 128k | 15 | 49% | 32% | 4% | 13% | 11% | 9% | 69% | 73%
Qwen2.5 Coder 7B | 131k | 15 | 47% | 34% | 5% | 13% | 15% | 5% | 66% | 90%
Mistral Large (Feb '24) | 33k | 15 | 52% | 35% | 3% | 18% | 21% | 0% | 53% | 71%
Jamba Instruct | 256k | 15 | 34% | 27% | 5% | 5% | 8% | 24% | 0%
Mixtral 8x22B | 65k | 14 | 54% | 33% | 4% | 15% | 19% | 0% | 55% | 72%
Phi-4 Mini | 128k | 14 | 47% | 33% | 4% | 13% | 11% | 3% | 70% | 74%
Llama 2 Chat 7B | 4k | 14 | 16% | 23% | 6% | 0% | 0% | 0% | 6% | 13%
Gemma 3 4B | 128k | 14 | 42% | 29% | 5% | 11% | 7% | 28% | 13% | 6% | 77% | 72%
Llama 3.2 11B (Vision) | 128k | 13 | 46% | 22% | 5% | 11% | 9% | 52% | 69%
Qwen3 1.7B | 32k | 13 | 41% | 28% | 5% | 13% | 7% | 10% | 72%
Qwen1.5 Chat 110B | 32k | 13 | 29%
Phi-3 Medium 14B | 128k | 13 | 54% | 33% | 5% | 15% | 12% | 1% | 46% | 0%
Claude 2.1 | 200k | 12 | 50% | 32% | 4% | 20% | 18% | 3% | 37% | 16%
Claude 3 Haiku | 200k | 12 | 15% | 19% | 1% | 39% | 76%
Pixtral 12B | 128k | 11 | 47% | 34% | 5% | 12% | 14% | 0% | 46% | 78%
Qwen3 0.6B (Reasoning) | 32k | 11 | 35% | 24% | 6% | 12% | 3% | 10% | 75% | 49%
Claude 2.0 | 100k | 11 | 49% | 34% | 17% | 19% | 0%
DeepSeek-V2 | 128k | 11 | 87%
Mistral Small (Feb '24) | 33k | 11 | 42% | 30% | 4% | 11% | 13% | 1% | 56% | 79%
Mistral Medium | 33k | 11 | 49% | 35% | 3% | 10% | 12% | 4% | 41%
GPT-3.5 Turbo | 4k | 11 | 46% | 30% | 44% | 70%
Gemma 3n E2B | 32k | 10 | 38% | 23% | 4% | 10% | 5% | 9% | 69%
Ministral 8B | 128k | 10 | 39% | 28% | 5% | 11% | 12% | 4% | 57% | 77%
Gemma 2 9B | 8k | 10 | 50% | 31% | 4% | 13% | 1% | 0% | 52% | 65%
Phi-3 Mini | 4k | 10 | 44% | 32% | 4% | 12% | 9% | 4% | 46% | 25%
Arctic | 4k | 10 | 75%
Qwen Chat 72B | 34k | 10
LFM 40B | 32k | 10 | 43% | 33% | 5% | 10% | 7% | 2% | 48% | 51%
Command-R+ | 128k | 9 | 43% | 34% | 5% | 11% | 12% | 0% | 40% | 63%
Llama 3 8B | 8k | 9 | 41% | 30% | 5% | 10% | 12% | 0% | 50% | 71%
PALM-2 | 8k | 9
Gemini 1.0 Pro | 33k | 9 | 43% | 28% | 5% | 12% | 12% | 1% | 40% | 2%
DeepSeek Coder V2 Lite | 128k | 8 | 43% | 32% | 5% | 16% | 14%
Codestral (May '24) | 33k | 8 | 33% | 26% | 5% | 21% | 22% | 0% | 35% | 80%
Aya Expanse 32B | 128k | 8 | 38% | 23% | 5% | 14% | 15% | 0% | 45% | 68%
Llama 2 Chat 70B | 4k | 8 | 41% | 33% | 5% | 10% | 0% | 32% | 34%
DeepSeek LLM 67B (V1) | 4k | 8 | 75%
Llama 2 Chat 13B | 4k | 8 | 41% | 32% | 5% | 10% | 12% | 2% | 33% | 0%
Command-R+ (Apr '24) | 128k | 8 | 43% | 32% | 5% | 12% | 12% | 1% | 28% | 64%
OpenChat 3.5 | 8k | 8 | 31% | 23% | 5% | 12% | 0% | 31% | 68%
DBRX | 33k | 8 | 40% | 33% | 7% | 9% | 12% | 3% | 28% | 67%
Ministral 3B | 128k | 8 | 34% | 26% | 6% | 7% | 9% | 0% | 54% | 74%
Mistral NeMo | 128k | 8 | 40% | 31% | 4% | 6% | 10% | 0% | 40% | 65%
Llama 3.2 3B | 128k | 7 | 35% | 26% | 5% | 8% | 5% | 7% | 49% | 56%
DeepSeek R1 Distill Qwen 1.5B | 128k | 7 | 27% | 10% | 3% | 7% | 7% | 18% | 69% | 45%
Jamba 1.5 Mini | 256k | 6 | 37% | 30% | 5% | 6% | 8% | 1% | 36% | 63%
Jamba 1.7 Mini | 258k | 6 | 39% | 32% | 5% | 6% | 9% | 1% | 26% | 48%
Jamba 1.6 Mini | 256k | 5 | 37% | 30% | 5% | 7% | 10% | 3% | 26% | 43%
Mixtral 8x7B | 33k | 5 | 39% | 29% | 5% | 7% | 3% | 0% | 30% | 1%
Qwen3 0.6B | 32k | 4 | 23% | 23% | 5% | 7% | 4% | 2% | 52% | 34%
DeepHermes 3 - Llama-3.1 8B | 128k | 4 | 37% | 27% | 4% | 9% | 9% | 0% | 22% | 54%
Aya Expanse 8B | 8k | 4 | 31% | 25% | 5% | 7% | 8% | 0% | 32% | 44%
Command-R | 128k | 3 | 34% | 29% | 5% | 4% | 9% | 0% | 15% | 42%
Command-R (Mar '24) | 128k | 2 | 34% | 28% | 5% | 5% | 6% | 1% | 16% | 40%
Claude Instant | 100k | 2 | 43% | 33% | 4% | 11% | 0% | 26% | 2%
Qwen Chat 14B | 8k | 2
Codestral-Mamba | 256k | 2 | 21% | 21% | 5% | 13% | 11% | 0% | 24% | 80%
Gemma 3 1B | 32k | 1 | 14% | 24% | 5% | 2% | 1% | 0% | 48% | 32%
Llama 3.2 1B | 128k | 1 | 20% | 20% | 5% | 2% | 2% | 0% | 14% | 40%
Llama 65B | 2k | 1
Mistral 7B | 8k | 1 | 25% | 18% | 4% | 5% | 2% | 0% | 12% | 40%
Grok 3 mini Reasoning (low) | 1m
GPT-4o mini Realtime (Dec '24) | 128k
GPT-4o Realtime (Dec '24) | 128k
GPT-3.5 Turbo (0613) | 4k
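
Because every row above follows the same Model | Context | Index | scores layout, the table is straightforward to post-process. Below is a minimal Python sketch that parses the rows and ranks models by AI Analysis Index; the file name "leaderboard.txt" and the parse_row helper are illustrative assumptions, not part of the original page.

```python
# Minimal sketch, not part of the original page: parse the pipe-delimited
# rows above and rank models by AI Analysis Index. The file name
# "leaderboard.txt" and the "Model | Context | Index | scores..." layout
# are assumptions based on the table as reformatted here.

def parse_row(line: str) -> dict:
    cells = [c.strip() for c in line.split("|")]
    name = cells[0]
    context = cells[1] if len(cells) > 1 else ""
    # The AI Analysis Index is the third cell when present.
    index = int(cells[2]) if len(cells) > 2 and cells[2].isdigit() else None
    # Remaining cells are benchmark scores such as "87%".
    scores = [int(c.rstrip("%")) for c in cells[3:] if c.rstrip("%").isdigit()]
    return {"model": name, "context": context, "ai_index": index, "scores": scores}

with open("leaderboard.txt", encoding="utf-8") as f:
    rows = [parse_row(line) for line in f if "|" in line]

# Highest index first; rows without an index sort to the end.
rows.sort(key=lambda r: (r["ai_index"] is None, -(r["ai_index"] or 0)))
for r in rows[:10]:
    print(f'{r["model"]:<45} {r["context"]:>6}  index={r["ai_index"]}')
```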

📊 Data Visualizations

AI Analysis Index comparison · Price vs. performance scatter plot · Output speed comparison
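
The original page rendered these as interactive charts, which did not survive extraction. As one example, here is a minimal matplotlib sketch of the AI Analysis Index comparison, reusing the `rows` list from the parsing sketch above; the top-15 cutoff and styling are my own illustrative choices.

```python
# Minimal sketch of the "AI Analysis Index comparison" chart, reusing the
# `rows` list from the parsing sketch above. matplotlib is assumed to be
# available; the top-15 cutoff and styling are illustrative choices.
import matplotlib.pyplot as plt

top = [r for r in rows if r["ai_index"] is not None][:15]  # rows is sorted
names = [r["model"] for r in top]
indices = [r["ai_index"] for r in top]

fig, ax = plt.subplots(figsize=(8, 6))
ax.barh(names[::-1], indices[::-1])  # reversed so the best model is on top
ax.set_xlabel("AI Analysis Index")
ax.set_title("AI Analysis Index Comparison (Top 15)")
fig.tight_layout()
plt.show()
```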
