Evaluating Claude Opus 4.5

Dec 17, 2025

Claude Opus 4.5 came out two days ago, so we benchmarked Opus 4.5, Sonnet 4.5, and Gemini 3 Pro on research tasks at Elicit: extracting answers from papers and writing systematic review reports.

For question-answering and data extraction from papers, Opus 4.5 is the new state of the art.

  • Opus 4.5 has 96.5% accuracy vs Gemini's 89.4%.

  • Opus is also best on our combined "accurate, supported, and direct" metric (76% vs 71%; sketched below).

  • Gemini is slightly better on claim supportedness (a measure of hallucination).
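For readers curious what a combined metric like this looks like mechanically, here is a minimal sketch: an answer only counts toward the combined score if it passes all three checks at once, which is why the combined number (76%) sits well below raw accuracy (96.5%). The `Judgement` fields and the sample data below are illustrative assumptions, not Elicit's actual evaluation harness.

```python
# Hypothetical sketch of a combined "accurate, supported, and direct"
# metric. Field names and sample data are assumptions for illustration,
# not Elicit's real evaluation code.

from dataclasses import dataclass

@dataclass
class Judgement:
    accurate: bool   # answer matches the gold extraction
    supported: bool  # every claim is backed by the cited passage
    direct: bool     # answers the question without hedging or filler

def combined_score(judgements: list[Judgement]) -> float:
    """Fraction of answers that pass all three checks simultaneously."""
    passing = sum(j.accurate and j.supported and j.direct for j in judgements)
    return passing / len(judgements)

# Example: 3 of 4 answers pass every check -> 75%
sample = [
    Judgement(True, True, True),
    Judgement(True, True, True),
    Judgement(True, False, True),  # correct but not grounded in the paper
    Judgement(True, True, True),
]
print(f"{combined_score(sample):.0%}")  # 75%
```

Because the checks are conjunctive, a model can lead on accuracy while trailing on the combined metric if its answers are less well grounded or less direct.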

For report writing, Opus 4.5 produces significantly better-supported reports than Sonnet 4.5, the previous best model for this task:

  • 62% of Opus' claims were well-supported vs Sonnet's 54%.

  • Only 31% of Opus' claims were poorly-supported vs Sonnet's 40%.

  • Opus is less verbose and writes approximately 20% fewer claims per report.

We didn't include Gemini in the report-writing comparison, since Sonnet 4.5 already wins 75% of head-to-head comparisons against Gemini, and Gemini is 6x slower than Sonnet.

We also did deeper manual comparisons of five reports and found that Opus and Sonnet reach the same conclusions, with no dramatic differences in output. Sonnet writes longer reports with more extensive commentary by default.

Opus 4.5 is now live for Elicit users on Pro, Teams, and Enterprise.
