This is an automated archive made by the Lemmit Bot.
The original was posted on /r/machinelearning by /u/marojejian on 2024-10-18 19:36:07+00:00.
Paper:
“while o1’s performance is a quantum improvement on the benchmark, outpacing the competition, it is still far from saturating it…”
The summary is apt. o1 looks to be a very impressive improvement. At the same time, it reveals the remaining gaps: performance degradation as composition length increases, roughly 100x cost, and a large drop when "retrieval" is hampered by obfuscating names.
But I wonder if this is already close enough, e.g. whether this type of model is at least sufficient to provide synthetic data / supervision for training a model that can fill these gaps. If so, it won't take long to find out, IMHO.
Also, the authors have some spicy footnotes, e.g.:
“The rich irony of researchers using tax payer provided research funds to pay private companies like OpenAI to evaluate their private commercial models is certainly not lost on us.”