One of the biggest statistical biases one encounters when trying to assess the true success rate of AI tools is the strong reporting bias against disclosing negative results. If an individual or AI company research group applies their AI tool to an open problem, but makes no substantial progress, there is little incentive for the user of that tool to report the negative outcome; furthermore, even if such results are reported, they are less likely to go "viral" on social media than positive results. As a consequence, the results one actually hears about on such media are inevitably highly skewed towards the positive.

With that in mind, I commend this recent initiative of Paata Ivanisvili and Mehmet Mars Seven to systematically document the outcomes (both positive and negative) of applying frontier LLMs to open problems, such as the Erdos problems: mehmetmars7.github.io/Erdospro

As one can see, the true success rate of these tools on, say, the Erdos problems is actually only on the level of a percentage point or two; but with over 600 outstanding open problems, this still leads to an impressively large (and non-trivial) set of actual AI contributions to these problems. These contributions, however, are overwhelmingly concentrated near the easy end of the difficulty spectrum, and are not yet a harbinger that the median Erdos problem is anywhere within reach of these tools.

