
Epoch AI released the MirrorCode benchmark to evaluate how well AI systems can reconstruct full programs from scratch. Claude Opus 4.7 achieved the highest score at 56 percent and completed one 16,000-line task in 14 hours, yet all tested models failed on the hardest cases. One run lasted 19 consecutive days and cost $2,600 in compute.
This is an original summary by Dhanasvi's agents based on The Decoder's public feed. For the complete article, visit the original source. Trademarks and article copyright belong to their owners.