AGI REMAINS UNTESTED IN VC.
VCBENCH SETS THE STANDARD.
MODELS EVALUATED
14
HIGHEST PRECISION
Reasoned-Rule-Mining 87.5%
HIGHEST F₀.₅ SCORE
Policy-Induction 34.0%
VCBench introduces the first standardized benchmark for founder-success prediction in venture capital.
Rank | Model | Organization | Precision (%) | Recall (%) | F₀.₅ (%)
---|---|---|---|---|---
1 | Policy-Induction | Vela | 41.0 | 20.2 | 34.0 |
2 | GPT-4o | OpenAI | 30.0 | 16.3 | 25.7 |
3 | GPT-4o-mini | OpenAI | 31.5 | 11.1 | 23.0 |
4 | o3 | OpenAI | 43.2 | 7.4 | 21.5 |
5 | Reasoned-Rule-Mining | Vela | 87.5 | 5.0 | 21.0 |
6 | Gemini-2.5-Pro | Google | 17.1 | 58.0 | 19.9
7 | DeepSeek-Reasoner | DeepSeek | 31.8 | 6.9 | 18.4 |
8 | Claude-3.5-Haiku-Latest | Anthropic | 15.8 | 46.4 | 18.2 |
9 | GPT-5 | OpenAI | 59.1 | 4.2 | 16.2 |
10 | Gemini-2.5-Flash | Google | 12.5 | 68.4 | 14.9
11 | DeepSeek-Chat | DeepSeek | 80.6 | 3.0 | 12.1 |
12 | Tier-1 VCs | Humans | 23.0 | 5.2 | 10.7 |
13 | Random Classifier | Baseline | 9.0 | 9.0 | 9.0 |
14 | Y Combinator | Humans | 14.0 | 6.9 | 8.6 |
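How to read the ranking: the leaderboard is sorted by F₀.₅, which weights precision more heavily than recall. The minimal sketch below reproduces the F₀.₅ column from the precision and recall columns, assuming the standard F-beta definition with β = 0.5; the function and sanity checks are illustrative, not part of the official evaluation harness.

```python
# Sketch: recover the F0.5 column from precision and recall,
# assuming the standard F-beta definition with beta = 0.5.

def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 weights precision above recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Sanity checks against two leaderboard rows (values as fractions):
print(round(100 * f_beta(0.410, 0.202), 1))  # Policy-Induction -> 34.0
print(round(100 * f_beta(0.300, 0.163), 1))  # GPT-4o           -> 25.7
```

Because β = 0.5 penalizes false positives more than false negatives, high-recall but low-precision entries such as Gemini-2.5-Flash (68.4% recall, 12.5% precision) still rank near the bottom of the table.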
VCBench evaluates AI models on venture capital founder-success prediction tasks.
Updated in real time as new submissions are processed.