NeurIPS 2024 Highlights: Inference Compute, Real-World AI, Beyond LLMs

NeurIPS 2024 Showcased Advances in Inference-Time Compute

NeurIPS 2024, the largest in-person AI conference to date, highlighted the growing importance of inference-time compute as a key factor in improving AI performance beyond just scaling model size. OpenAI’s Noam Brown, drawing on his experience with poker bots such as Libratus and Pluribus, demonstrated that adding extra computation during inference, letting models “think longer,” can offer improvements comparable to increasing model size and training data by massive factors. For example, Pluribus achieved superhuman performance in multiplayer poker while costing under $150 to train and running efficiently on 28 CPU cores, showing that test-time compute can be a cost-effective way to boost AI capabilities.
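The idea of trading inference-time compute for quality can be illustrated with a minimal best-of-n sketch. Everything here is an illustrative toy, not drawn from Brown's systems: a weak "proposer" stands in for a model, a closeness score stands in for a verifier, and spending more samples lets the verifier pick a better candidate.

```python
import random

# Toy illustration of test-time compute via best-of-n sampling:
# draw more candidates at inference time, keep the one a scorer ranks highest.
# TARGET, propose, and score are all hypothetical stand-ins.

TARGET = 42

def propose(rng):
    """A weak 'model': uniformly random guesses in [0, 100]."""
    return rng.randint(0, 100)

def score(x):
    """A 'verifier' that prefers proposals near the hidden target."""
    return -abs(x - TARGET)

def best_of_n(samples):
    """Keep the highest-scoring candidate among the samples drawn."""
    return max(samples, key=score)

rng = random.Random(0)
draws = [propose(rng) for _ in range(64)]
for n in (1, 4, 16, 64):
    best = best_of_n(draws[:n])
    print(f"n={n:2d}: best guess {best}, score {score(best)}")
```

Because larger sample prefixes contain the smaller ones, the best score can only improve as n grows, which is the basic shape of the "think longer" tradeoff.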

Workshops Showcased Real-World AI Applications and Scientific Reasoning

Sunday’s workshops covered a wide range of topics, from interpretable AI to Machine Learning for Systems, with notable presentations from startups and researchers pushing AI toward real-world scientific reasoning. Basis, a startup aiming to build the first AI system capable of everyday science through its MARA project, outlined a three-year roadmap to model, abstract, and reason like human scientists. Meanwhile, Sakana AI, inspired by nature’s evolutionary and collective intelligence, presented novel approaches to foundation model interventions. These workshops highlighted the diversity of AI research and the increasing focus on practical, interpretable, and regulatable AI systems.

Math-AI Workshop Explored AI’s Role in Formal and Informal Mathematics

The Math-AI workshop featured talks from leading researchers, including Google DeepMind’s Adam Wagner, Carnegie Mellon’s Jeremy Avigad, and Stanford’s James Zou, emphasizing how AI is transforming mathematical research. Wagner introduced PatternBoost, a method combining transformer language models with local and global search to discover new mathematical constructions, showing that even simple learning methods can yield valuable results. Avigad discussed how machine learning can assist interactive theorem proving by improving premise selection and search, while stressing the importance of user-friendly AI tools for mathematicians. Zou presented projects such as The Virtual Lab, in which GPT-4-powered agents collaboratively designed SARS-CoV-2 nanobodies; in a related project, 58 of 70 AI-generated antibiotic recipes were successfully synthesized and validated experimentally. These examples underscore AI’s growing impact on both formal theorem proving and practical scientific discovery.
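PatternBoost's alternation between learning and search can be caricatured in a few lines. This is a drastically simplified stand-in: the real method trains a transformer on the best constructions found so far, whereas here random mutation of elite examples plays the generator's role, and the toy objective (binary strings rewarded for 1s but penalized for adjacent 1s) replaces a genuine combinatorial problem.

```python
import random

# Caricature of a PatternBoost-style loop: generate candidates biased toward
# past elites ("global" phase, here mutation instead of a trained model),
# improve each candidate by greedy local search ("local" phase), keep the best.

N = 20  # length of the binary-string constructions

def score(bits):
    """+1 per 1, -2 per adjacent pair of 1s (toy objective)."""
    ones = sum(bits)
    adjacent = sum(1 for a, b in zip(bits, bits[1:]) if a and b)
    return ones - 2 * adjacent

def local_search(bits):
    """Greedy single-bit flips until no flip improves the score."""
    improved = True
    while improved:
        improved = False
        for i in range(len(bits)):
            current = score(bits)
            bits[i] ^= 1
            if score(bits) > current:
                improved = True
            else:
                bits[i] ^= 1  # revert the unhelpful flip
    return bits

rng = random.Random(1)
elites = []
for _ in range(10):
    if elites:
        parent = rng.choice(elites)
        bits = [b ^ (rng.random() < 0.2) for b in parent]  # mutate an elite
    else:
        bits = [rng.randint(0, 1) for _ in range(N)]       # cold start
    elites.append(local_search(bits))
    elites.sort(key=score, reverse=True)
    elites = elites[:5]  # retain only the best constructions

print(score(elites[0]), "".join(map(str, elites[0])))
```

The optimum here is the alternating string (score 10); the point of the sketch is only the loop structure, in which a generator trained on past winners seeds a local search.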


Inference Scaling Laws Reveal Optimal Model Sizes for Compute Budgets

Sean Welleck’s research on inference scaling laws addressed a critical question: how best to allocate a fixed inference compute budget for large language model problem-solving. His empirical analysis showed that using the largest model is not always optimal; smaller models paired with advanced inference strategies often yield better results under compute constraints. This insight challenges the common assumption that bigger always means better and points toward more nuanced strategies that jointly optimize model size, meta-generation methods, and compute allocation. Welleck’s meta-generation tutorial offered practical guidance for researchers aiming to maximize performance within resource limits.
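A back-of-the-envelope sketch conveys the tradeoff being studied. The model sizes and per-sample solve rates below are invented for illustration, not Welleck's measurements: under a fixed per-problem budget, a smaller model affords more samples, and coverage (the chance that at least one sample solves the problem) can beat a single shot from the largest model.

```python
# Hypothetical inference-budget tradeoff: cost per sample scales with model
# size, so a fixed budget buys more attempts from a smaller model.

BUDGET = 70e9  # per-problem compute budget (arbitrary "FLOPs-like" units)

# (cost per sample, assumed per-sample solve rate) -- made-up numbers
models = {"7B": (7e9, 0.30), "34B": (34e9, 0.45), "70B": (70e9, 0.55)}

def coverage(p, k):
    """P(at least one of k independent samples solves the problem)."""
    return 1 - (1 - p) ** k

for name, (cost, p) in models.items():
    k = int(BUDGET // cost)  # samples affordable at this model size
    print(f"{name}: {k:2d} samples -> coverage {coverage(p, k):.3f}")
```

With these invented rates, ten samples from the "7B" model cover far more problems than one sample from the "70B" model on the same budget, which is the shape of the result that inference scaling laws make precise.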


Formal Mathematical Reasoning Approaching a New Frontier

UC Berkeley’s Dawn Song highlighted formal mathematical reasoning as an AI frontier poised for rapid progress. Her upcoming position paper proposes a taxonomy of autonomy levels for theorem proving, ranging from level 0, which involves checking formal proofs, to level 5, where AI would solve problems and discover new mathematics beyond human capabilities. Existing benchmarks currently reach only level 3. Song emphasized four key future directions: improving data, algorithms, human-centric tools, and code generation with verification. This structured approach aims to accelerate AI’s ability to handle complex formal reasoning tasks, potentially transforming mathematical research workflows.
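Level 0 of such a taxonomy, mechanically checking a human-written proof, is already routine in proof assistants. An illustrative Lean 4 snippet makes the distinction concrete: the system verifies the proof terms below but contributes none of them, whereas higher autonomy levels would have the AI discover the proofs itself.

```lean
-- Level 0: the proof assistant only *checks* proofs a human supplies.
-- Lean 4 verifies both proofs below; the human (or, at higher autonomy
-- levels, an AI) must produce the proof terms.
theorem two_plus_three : 2 + 3 = 5 := rfl

theorem add_comm' (a b : Nat) : a + b = b + a := Nat.add_comm a b
```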


New Measures and Techniques Address Transformer Limitations

Apple’s Samy Bengio addressed a challenge in AI reasoning: whether transformers can learn deductive reasoning effectively. His lab introduced a metric called globality degree to evaluate task difficulty for transformers trained through stochastic gradient descent (SGD).

Tasks with high globality degree remain hard for standard transformers, and naive methods like agnostic scratchpads do not fully solve this. To tackle these challenges, Bengio’s team developed inductive scratchpads, a new approach that helps transformers manage tasks requiring more global reasoning. This research clarifies current transformer limitations and offers pathways for future improvements in AI reasoning capabilities.
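The flavor of a scratchpad-based decomposition can be conveyed by the shape of its training targets. This is a hedged toy, not the paper's exact encoding: a high-globality task such as parity, where the answer depends on every input bit at once, is decomposed into repeated local state updates, so each scratchpad step depends only on the previous state and one new input token.

```python
# Toy data format for scratchpad-style supervision on parity: instead of
# mapping bits directly to a final answer, the target spells out the running
# parity after each bit, turning one global step into many local ones.

def parity_scratchpad(bits):
    """Render 'b1 b2 ... bn -> s1 s2 ... sn', s_i = parity of first i bits."""
    state, steps = 0, []
    for b in bits:
        state ^= b  # local update: previous state combined with one new bit
        steps.append(str(state))
    return " ".join(map(str, bits)) + " -> " + " ".join(steps)

print(parity_scratchpad([1, 0, 1, 1]))
```

The final scratchpad symbol is the answer, and because every step applies the same local rule, a model that learns one step can in principle unroll it to longer inputs, which is the intuition behind the inductive variant.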

Panel Discussion Highlighted Rapid Progress and Future Challenges

A highly anticipated panel at the Math-AI workshop underscored the fast pace of AI progress in mathematics. Panelists noted that the MATH benchmark, once thought difficult to saturate, is now nearly solved by AI systems with some human intervention, rating 7 or 8 on a 10-point performance scale. However, fully autonomous AI still scores below 1, indicating significant room for improvement. The panel also advised benchmark designers to include a wide difficulty gradient to better capture model capabilities. Experts expect AI to change mathematical practice profoundly but agree humans will remain essential collaborators, echoing patterns seen in chess and Go, where human-AI partnerships thrive.




Summary of NeurIPS 2024 Themes and Impact

NeurIPS 2024 showcased a shift toward leveraging inference-time compute, nuanced model scaling, and AI’s expanding role in scientific and mathematical reasoning. From poker-playing bots that think longer to AI systems designing new antibiotics, the conference illustrated how strategic use of compute and novel algorithms can unlock breakthroughs beyond the current limits of large language models. The event’s workshops and panels emphasized the importance of practical tools, interpretability, and collaboration between humans and AI, and the community remains focused on advancing these technologies responsibly and innovatively in the coming years.
