Research · The Decoder · Jun 26, 2026

AI Model Runs 19 Days on Single MirrorCode Task Costing $2,600

Epoch AI released the MirrorCode benchmark to evaluate how well AI systems can reconstruct full programs from scratch. Claude Opus 4.7 achieved the highest score at 56 percent and completed one 16,000-line task in 14 hours, yet all tested models failed on the hardest cases. One run extended to 19 consecutive days and incurred a $2,600 compute cost.

Key points

→ MirrorCode measures AI ability to rebuild programs without source access
→ Claude Opus 4.7 leads with 56 percent solve rate
→ Longest single task required 19 days and $2,600 in compute
→ Every model failed the most complex benchmark items

Read the full story on The Decoder

Mentioned

Epoch AI, Claude Opus 4.7, MirrorCode
