This is an automated archive made by the Lemmit Bot.
The original was posted on /r/machinelearning by /u/marojejian on 2024-10-18 19:36:07+00:00.
Paper:
“while o1’s performance is a quantum improvement on the benchmark, outpacing the competition, it is still far from saturating it…”
The summary is apt. o1 looks to be a very impressive improvement. At the same time, it reveals the remaining gaps: performance degradation as composition length increases, roughly 100x cost, and a large drop when "retrieval" is hampered by obfuscating names.
But I wonder if this is already close enough, e.g. whether this type of model is at least sufficient to provide synthetic data / supervision for training a model that can fill these gaps. If so, it won't take long to find out, IMHO.
Also, the authors have some spicy footnotes, e.g.:
“The rich irony of researchers using tax payer provided research funds to pay private companies like OpenAI to evaluate their private commercial models is certainly not lost on us.”