The second batch of “First Proof” problems is meant to evaluate AI’s usefulness for research-level math. The best model got ...
Fine-tuned “student” models can pick up unwanted traits from base “teacher” models in ways that evade data filtering, creating a need for more rigorous safety evaluations. Researchers have discovered ...