Frontier AI models have crossed a threshold. They no longer merely assist scientists; they now co-design both which questions to pursue and how to pursue them. This keynote examines how we can accelerate scientific discovery using these advanced models. Drawing an analogy to Amdahl’s Law, we’ll see how extraordinary speed-ups in hypothesis generation, simulation, and data interpretation collide with bottlenecks in chemistry, fabrication, and field observation, forcing a strategic rebalancing of the entire research pipeline. We’ll explore embedding human values in autonomous goal setting, preserving trust and reproducibility amid synthetic data, and redesigning the workforce to align automated cognition with irreplaceable human judgment. Finally, we’ll introduce high-level considerations and concrete actions for collectively navigating this rapidly changing landscape.
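To make the Amdahl’s Law analogy concrete, consider a back-of-the-envelope sketch (with purely illustrative numbers, not figures from the talk): if AI accelerates the cognitive 70% of a research pipeline a hundredfold but the experimental 30% is untouched, the end-to-end gain stays near 3x, which is why the talk argues for rebalancing the whole pipeline.

    # Amdahl's Law applied to a research pipeline (illustrative numbers only).
    def pipeline_speedup(accelerated_fraction, factor):
        """Overall speedup when only a fraction of the work is accelerated."""
        return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

    print(pipeline_speedup(0.70, 100.0))  # ~3.26x: the lab bench dominates
    print(pipeline_speedup(0.70, 1e9))    # ~3.33x: even unbounded AI speedup barely helps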
There is a strategic necessity for hybrid approaches in modern HPC. This talk explores three critical dimensions transforming scientific computing today: the integration of AI with traditional HPC workflows, seamless extension of on-premises infrastructure to the cloud, and the convergence of quantum and classical computing. Through concrete examples from weather forecasting, computational fluid dynamics, and molecular modeling, we’ll demonstrate how AWS’s hybrid solutions deliver measurable advantages, dramatically reducing computation time while providing access to next-generation hardware without capital expenditure constraints. As computational demands grow exponentially across scientific disciplines, hybrid architectures emerge as the practical foundation for accelerating scientific discovery, while enabling organizations to fully leverage the scalability, flexibility, and innovation velocity that cloud computing provides.
Thierry Pellegrino, AWS
Understanding and anticipating complex dynamic behavior is fundamental to both computational social science and the scientific modeling of socio-technical systems. Behaviors of humans and systems in the wild unfold dynamically — often shaped by diverse contexts and evolving intentions. Yet data capturing real-world behaviors — such as mobility routines, energy use patterns, and decision-making in urban and digital environments — are inherently noisy, context-dependent, and often only partially observed. This talk synthesizes recent progress in understanding behavior at scale through data-driven modeling and simulation, highlighting the convergence of data-efficient learning, generative models, and agentic AI for complex systems analysis.
Drawing from years of work in spatiotemporal data mining and multimodal machine learning, we examine how AI systems are now capable of learning from sparse signals while generalizing across heterogeneous settings. These advances reveal how latent routines, dynamics, and behavioral patterns can be learned without explicit ground-truth supervision. We will also demonstrate the use of LLMs for synthetic data generation. These approaches reflect a shift toward data-efficient, transferable, and context-sensitive models that are aimed at generalization beyond narrow domains.
We also discuss the rise of agentic AI for enabling automated tooling and simulation. We will present our new cyber-physical-social simulation generation framework, enabling automated scenario generation, behavior testing, and what-if analysis. This framework opens new possibilities for integrating empirical data with simulated environments.
Large-scale language and learning models are poised to transform computational science, yet integrating them with exascale simulations, streaming experiments, and deep domain priors remains an open research frontier. This talk draws on RIKEN’s AI4Science effort and recent Fugaku/Frontier deployments to outline a technical agenda for science-grade LLMs: scalable tokenization and sparse attention for multi-terabyte, multi-modal inputs; adaptive patching and sequence-tiling that push ViTs into the billion-token regime; cross-channel and physics-constrained operators that fuse neural surrogates with PDE solvers; and distributed orchestration of ensembles and agent workflows across CPU–GPU–QC hybrids. Performance results — up to 1.8 EF/s and 6.9x speedups — are presented for climate down-scaling, high-resolution 3-D imaging, and materials discovery, followed by open challenges in provenance, uncertainty quantification, and human-in-the-loop steering for petascale-and-beyond ML stacks.
Moderator: Satoshi Matsuoka, RIKEN R-CCS
By leveraging the highest fidelity quantum systems in tandem with HPC, we can now harness previously unobtainable information to expand the scope of AI datasets, enabling fine-tuned, bespoke AI techniques with new capabilities to discover solutions for complex problems in areas like drug discovery and materials science — a bold initiative we call Generative Quantum AI. This talk will shed light on new territory for HPC to revolutionize scientific discovery of commercial relevance by unlocking value at the intersection of quantum computing and AI.
Steve Clark, Quantinuum
Hyperion Research has recently conducted a number of studies on the use of AI at technical computing (or HPC) centers around the world. This presentation will cover the highlights and key findings from these studies, including: AI use cases and the level of AI usage at HPC sites; growth rate of applying AI to technical workloads; budgets, cloud usage, frameworks, use of LLMs, hardware preferences, etc.; and barriers to applying AI more quickly and attributes that sites would like to see improved.
Earl Joseph, Hyperion Research
This talk presents RIKEN’s approach to building a secure, open-source AI infrastructure tailored for scientific and industrial use. We showcase how frontier open models, agentic AI, and RAG pipelines can be deployed locally and integrated tightly with HPC systems and simulation apps while preserving privacy and performance. We will also share our experiences testing RAGFlow and our plans to set up a model-serving stack for applications like the SPring-8 light source. Lastly, we highlight our work to eliminate inefficient cross-process communication by enabling large models to run directly in the memory space of our HPC simulations written in C, C++, or Fortran.
Jens Domke, RIKEN R-CCS
Trillion-parameter reasoning engines promise great cognitive power, yet scientific breakthroughs still hinge on integrating models with experiments, data, and people. Let’s examine an AI-native Scientific Discovery Platform that couples frontier language models to simulation codes, knowledge graphs, and autonomous laboratories, orchestrated by policy-aware scheduling and ultra-low-latency “thought-action” fabrics. The platform will autonomously generate hypotheses, execute in-silico or in-vitro experiments, assess uncertainty, and iteratively refine understanding, all while preserving provenance. I’ll outline needed advances in data logistics, model–simulation co-execution, and trustable evaluation, and argue that careful co-design of infrastructure, benchmarks, and culture is vital to realize this potential.
With agents set to proliferate across every facet of our lives, we discuss the state of the art and the state of the future. We also explore complex questions of ownership, agency, control, decision-making, and outcomes in an agentic world.
The intellectual bottlenecks of science are growing, as evidenced by the increasing complexity of fields, the volume of scientific papers published, and the number of humans involved. To manage this complexity, it is likely that major breakthroughs will increasingly rely on automation of the stages of the scientific method. One approach has been scientific agents — AI models equipped with tools and data to manipulate and observe the world. Such systems are increasingly automating tasks such as literature research, hypothesis generation, and data analysis. They can scale in dimensions beyond what has been previously possible, like checking every claim of a paper against all previous literature for disagreement. In this talk, we’ll examine work at FutureHouse to apply scientific agents across stages of biological research and discovery, including challenges faced in defining well-posed problems, scaling compute, and evaluating AI scientist performance.
Siddharth Narayanan, FutureHouse
Current AI systems, while already useful in accelerating some aspects of scientific research, remain fundamentally limited by their operational architectures, brittle reasoning mechanisms, and separation from reality. Progress in AI-driven science now depends on closing three fundamental gaps — the abstraction gap, the reasoning gap, and the reality gap — rather than on model size, data, or test-time compute. Scientific reasoning demands internal representations that support simulation of actions and responses, causal structures that distinguish correlation from mechanism, and continuous calibration. A vision is proposed for “active inference” AI systems: a multi-layered stack where discovery arises from the interplay between internal models that enable counterfactual reasoning and external validation that grounds hypotheses in reality. It is also argued that human judgment is indispensable, not as a temporary scaffold but as a permanent architectural component.
Karthik Duraisamy, University of Michigan
To set the stage, we will cover irresponsible AI: (1) discrimination (e.g., facial recognition, justice); (2) pseudoscience (e.g., biometric-based predictions); (3) limitations (e.g., human incompetence, minimal adversarial AI); (4) indiscriminate use of computing resources (e.g., large language models); and (5) the impact of generative AI (disinformation, mental health, and copyright issues). These examples do have a personal bias, but they set the context for the second part, where we address three challenges: (1) principles and governance; (2) regulation; and (3) our cognitive biases. We will finish by discussing responsible AI initiatives and the near future.
Moderator: Ricardo Baeza-Yates, Barcelona Supercomputing Center
Oak Ridge National Laboratory’s Artificial Intelligence Initiative is driving the advancement of AI methods to accelerate discovery and innovation in science, energy, and national security. With access to world-class computational resources, including the Frontier exascale system, the initiative prioritizes the development of AI foundation models and adaptive AI systems tailored to non-language modalities such as time series, spatial-temporal, multimodal sensor data, and physics-based simulations. These efforts integrate physics-informed learning, uncertainty quantification, and causal reasoning to enable robust, explainable AI applications in complex scientific environments. The initiative supports a diverse portfolio and industry collaborations, spanning strategic domains such as nuclear energy, materials discovery, and national security. It also plays a key role in workforce development through the AI Academy, which engages over 200 researchers across directorates. Through targeted investments and cross-cutting coordination with ORNL’s experimental facilities, the AI Initiative is enabling next-generation AI systems that go beyond language and LLMs to transform and accelerate scientific discovery.
Prasanna Balaprakash, Oak Ridge National Laboratory
As LLMs gain adoption in higher-stakes scenarios, it is critical to understand why they generate certain responses. To address this challenge, we developed OLMoTrace — a system that traces outputs of LLMs back into their multi-trillion-token training data in real time. Given an LLM response to a user prompt, OLMoTrace finds long and unique spans in this response that appear verbatim in the training data, and shows users these spans and their enclosing training documents. The purpose is to reveal where LLMs may have learned to generate certain word sequences. By combining algorithmic innovations and low-level system optimizations, our production system can return tracing results (i.e., spans and documents) in 4.5 seconds for typical LLM responses (~450 tokens), making it a real-time experience for users. We found OLMoTrace to be useful for fact-checking LLM outputs, understanding LLM hallucinations, tracing the “creative expressions” generated by LLMs, tracing some of their math capabilities, debugging erratic model outputs, etc. OLMoTrace is available in the public Ai2 model playground so that anyone can use this tool to trace the outputs of OLMo 2 and Tulu 3 models. To our knowledge, OLMoTrace is the first system to scale up model behavior tracing beyond the trillion-token ballpark. Complementary to the body of work in mechanistic interpretability that traces LLM outputs into model weights and circuits, OLMoTrace traces directly into the training data, serving as an important piece in our model understanding toolbox.
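The span-finding idea at the heart of the system can be sketched in a few lines of Python. This toy version (linear scans over small token lists, with a hypothetical min_len threshold) only illustrates the concept; the production system relies on heavily optimized indexing to search multi-trillion-token corpora in seconds.

    # Toy illustration of verbatim span tracing: find maximal spans of a model
    # response that appear contiguously in a (tiny) training corpus. The real
    # system uses indexed search over trillions of tokens, not scans.
    def find_verbatim_spans(response, corpus, min_len=6):
        def occurs(span):
            n = len(span)
            return any(corpus[i:i + n] == span for i in range(len(corpus) - n + 1))

        spans, i = [], 0
        while i < len(response):
            best = 0
            # Greedily grow the candidate span while it still occurs in the corpus.
            while i + best < len(response) and occurs(response[i:i + best + 1]):
                best += 1
            if best >= min_len:
                spans.append((i, response[i:i + best]))
                i += best
            else:
                i += 1
        return spans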
Jiacheng Liu, Allen Institute for AI
GeoAI foundation models hold great promise for applications in diverse fields such as disaster prevention, environmental monitoring, and urban planning. However, their widespread adoption is hindered by significant challenges, including issues related to data quality, model transferability across regions, and the presence of geographical bias. A critical issue that warrants attention is the measurement of geo-bias in these models. Addressing this is essential for ensuring fairness, robustness, and generalizability. In this context, it is important to explore the current limitations and future directions for developing effective geo-bias metrics, while also discussing the broader challenges and opportunities for achieving sustainable and equitable GeoAI deployment.
Kyoung-Sook Kim, National Institute of Advanced Industrial Science and Technology (AIST)
This cross‑sector panel convenes leaders from industry, academia, national laboratories, and government to chart a collaborative roadmap for trustworthy AI for science. We will dissect the technical and governance hurdles that impede reliable scientific AI and debate the institutional mechanisms — algorithmic improvements, shared testbeds, open data standards, federated model hubs, and national infrastructure — needed to overcome them. Panelists will assess how near‑term breakthroughs in computing, advanced accelerators, and HPC-AI convergence could redefine algorithmic efficiency and scientific discovery over the next 3–5 years, while confronting inequities in compute access across sectors.
Moderator: Karthik Duraisamy, University of Michigan
Hal Finkel, U.S. Department of Energy
Raj Hazra, Quantinuum
Pradeep Dubey, Intel
Molly Presley, Hammerspace
Agentic AI systems — collections of autonomous, goal-seeking entities that plan, act, and learn across open-ended environments — have moved from research prototypes to production pipelines, yet the field still lacks a shared formal definition. This panel convenes builders of agentic platforms from national labs, academia, and industry to dissect what differentiates an ‘agent’ from a sophisticated subroutine, compare architectural families (LLM-centric tool use, multi-agent swarms, hybrid symbolic-neural controllers), and discuss design and implementation trade-offs among scalability, maintainability, security, and standards-driven interoperability. Through concrete case studies the speakers will expose design heuristics, failure modes, and metrics that matter, offering the TPC25 community a roadmap for considering frameworks and hardening agentic workflows for scientific discovery.
Moderator: Addison Snell, Intersect360 Research
Elahe Vedadi, Google DeepMind
Preeth Chengappa, Microsoft Discovery
Kexin Huang, Stanford University (Biomni project)
Arvind Ramanathan, Argonne National Laboratory (Scientia project)
Siddharth Narayanan, FutureHouse (Agentic life sciences project)
LLMs are not mechanical tools that provide the same benefit for everyone. Rather, they are intellectual tools that interact with human culture and creativity. The influence is mutual: not only are the models shaped by the data we train them on, but our culture and the data we generate will in turn be influenced by LLMs. Therefore, it is paramount to develop sovereign AI models that adhere to our cultural norms and to acquire the technology to develop such models independently. This talk will showcase some of the major efforts to train Japanese LLMs, focusing on training data, training methodologies, and challenges.
Rio Yokota, Institute of Science Tokyo
Recent advancements have positioned Large Language Models (LLMs) as transformative tools for scientific research, capable of addressing complex tasks that require reasoning, problem-solving, and decision-making. Their exceptional capabilities suggest their potential as scientific research assistants, but also highlight the need for holistic, rigorous, and domain-specific evaluation to assess effectiveness in real-world scientific applications.
First, this talk motivates and describes the current effort at Argonne National Laboratory to develop a multifaceted methodology for evaluating AI models as scientific Research Assistants (EAIRA). This methodology incorporates four primary classes of evaluations: 1) Multiple Choice Questions to assess factual recall; 2) Open Response to evaluate advanced reasoning and problem-solving skills; 3) Lab-Style Experiments involving detailed analysis of capabilities as research assistants in controlled environments; and 4) Field-Style Experiments to capture researcher-LLM interactions at scale in a wide range of scientific domains and applications. For each of these four classes of evaluation, we develop testing methods (e.g., benchmarks) and tools for manual and automatic QA generation and validation, as well as for collecting and analyzing researcher-LLM interactions.
We will present a selection of tools and generated benchmarks, as well as the early analysis of the largest Field-Style Experiments to date (the 1,000 Scientists AI JAM). These complementary methods enable a comprehensive analysis of LLM strengths and weaknesses with respect to their scientific knowledge, reasoning abilities, and adaptability. Although developed within a subset of scientific domains, the methodology is designed to be generalizable to a wide range of scientific domains.
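As a flavor of the first evaluation class, a multiple-choice harness reduces to a loop like the sketch below (illustrative only; ask_model is a hypothetical stand-in for any LLM endpoint, and the actual EAIRA tooling is considerably richer).

    # Minimal multiple-choice evaluation loop (illustrative sketch only).
    def score_mcq(benchmark, ask_model):
        """benchmark: iterable of {'question', 'choices', 'answer'} dicts,
        where 'answer' is a letter such as 'B'; returns accuracy in [0, 1]."""
        correct = 0
        for item in benchmark:
            labels = "ABCDE"[:len(item["choices"])]
            prompt = (item["question"] + "\n"
                      + "\n".join(f"{l}. {c}" for l, c in zip(labels, item["choices"]))
                      + "\nAnswer with a single letter.")
            reply = ask_model(prompt).strip().upper()
            correct += reply[:1] == item["answer"]
        return correct / len(benchmark)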
Franck Cappello, Argonne National Laboratory
Today’s frontier models, with ASI-class systems on the horizon, could deliver enormous scientific returns for humanity. This panel of community leaders will summarize a series of discussions intended to stimulate the community to identify and examine areas of science that can be meaningfully accelerated, ultimately identifying a few grand challenges whose solutions multiply benefits across society. Examples might include decarbonizing the global energy system, solutions to inequality and scarcity, overcoming threats to health that cause enormous human suffering, and removing the threat of nuclear weapons. Such a dialogue is needed to distill a prioritized research portfolio of challenges that commercial and public-sector (academia, national laboratories) AI and science communities can pursue immediately to turn AI’s accelerating capabilities into tangible, equitable gains.
Moderator: Charlie Catlett, Trillion Parameter Consortium
Ian Foster, Argonne National Laboratory
Karthik Duraisamy, University of Michigan
Satoshi Matsuoka, RIKEN R-CCS
Thierry Pellegrino, AWS
TPC25 breakout groups are designed to identify, form, and pursue collaborations that will accelerate the development of new AI capabilities and services for scientific discovery. Some sessions are organized by TPC working groups, others are prospective working groups or birds-of-a-feather gatherings. Each session comprises a small set of lightning talks followed by group discussion, and all TPC25 participants are encouraged to submit lightning talk proposals.
The six-way parallel breakout schedule is loosely organized around six themes: Workflows, Initiatives, Life Sciences, Evaluation, Scale and Services, and Applications.
Each track takes place in a different Working Group room: the main Plenary room, Cedar, Sierra, Cascade, Siskiyou, and Donner (look for digital signage outside room doors).
Ian Foster, Argonne National Laboratory and University of Chicago
Neeraj Kumar, Pacific Northwest National Laboratory
Robert Underwood, Argonne National Laboratory
Ravi Madduri, Argonne National Laboratory
This multi-session track explores emerging systems and strategies for building intelligent, scalable platforms to accelerate scientific discovery. Talks and discussions will cover the design of agent-based architectures, integration of scientific workflows with large language models, scalable data pipelines, and novel reasoning frameworks. Emphasis will be placed on domain-specific applications spanning biology, climate, and materials, highlighting new approaches to real-time discovery, autonomous labs, and LLM-driven scientific tools. Participants will engage in dialogue on the future of scientific AI infrastructure and the coordination required to realize a distributed, agent-enabled discovery ecosystem.
A Case Study of the System Software/Middleware Needs for Agents
Ian Foster (University of Chicago and Argonne National Laboratory)
A Case Study of the System Software/Middleware Needs for Agents
Robert Underwood (Argonne National Laboratory)
Enabling Autonomous Labs: The NSDF-ORNL Partnership for Real-time Scientific Discovery
Michela Taufer (University of Tennessee at Knoxville)
Academy: Empowering Scientific Workflows with Federated Agents
Kyle Chard (University of Chicago)
LangChain-Parsl: Connect Large Language Model Agents to High Performance Computing Resources
Heng Ma (Argonne National Laboratory)
Building AI Scientific Assistants for Accelerating Understanding of Complex Biological Systems
Arvind Ramanathan (Argonne National Laboratory)
A Grassroots Network and Community Roadmap for Interconnected Autonomous Science Laboratories for Accelerated Discovery
Rafael Ferreira da Silva (Oak Ridge National Laboratory)
Integrating Data and AI to Advance Earth System Predictability
Po-Lun Ma (Pacific Northwest National Laboratory)
LLM Agent-based Code Translation for Low Resource Programming Languages
Le Chen (Argonne National Laboratory)
ChatVis: Automating Scientific Visualization with a Large Language Model
Tanwi Mallick (Argonne National Laboratory)
Towards Enhancing Reliability in Agentic Scientific Workflows
Amal Gueroudji (Argonne National Laboratory)
From Models to Missions: The Path of Agentic AI
Nelli Babayan (Microsoft)
Agents and the Model Context Protocol
Ravi Madduri (Argonne National Laboratory)
Towards Scalable Memory Runtimes for LLM Agents with DataStates
Avinash Maurya (Argonne National Laboratory)
Imperfect Recognition: A Study of OCR Limitations in the Context of Scientific Documents
Chinmay Sahasrabudhe (Sandia National Laboratories)
Why Use Model Context Protocol (MCP) in Scientific Application Domains?
Eliott Jacopin (RIKEN Center for Biosystems Dynamics Research)
Empowering Scientific and Supercomputing Users and Their Workloads Using Context Engineering and Domain Specific Models (DSMs)
Rodolfo Tonoli (Articul8 AI)
Daniel Ratner, SLAC
Large-scale scientific facilities generate extensive “living documents”: logbooks, wikis, hardware/software standards, etc. Large language models are promising tools for both extracting information from and maintaining these documents, with example applications including improving search, educating new employees, and generating shift summaries. However, the wealth of highly specialized terminology as well as multi-modal information complicates the direct application of existing LLMs. This session will discuss opportunities, challenges, and existing solutions, and search for ways to collaborate across a range of DOE facilities.
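A minimal retrieval-augmented loop over logbook entries might look like the sketch below. Keyword overlap stands in for the embedding indexes and multi-modal handling a real facility deployment would need, and ask_model is a hypothetical stand-in for any LLM call.

    # Minimal retrieval-augmented answering over logbook entries (toy sketch).
    def retrieve(query, entries, k=3):
        """Rank free-text logbook entries by naive keyword overlap with the query."""
        q = set(query.lower().split())
        ranked = sorted(entries, key=lambda e: -len(q & set(e.lower().split())))
        return ranked[:k]

    def answer_from_logbook(query, entries, ask_model):
        context = "\n---\n".join(retrieve(query, entries))
        prompt = (f"Using only these logbook entries:\n{context}\n\n"
                  f"Question: {query}\nAnswer:")
        return ask_model(prompt)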
AI Systems for Technical Logbooks
Aaron Reed (SLAC National Accelerator Laboratory)
Enhancing APS Logbook Search with Retrieval-Augmented Generation Models
Rajat Sainju (Argonne National Laboratory)
LCLS-Elog-Copilot: An AI Agent to Navigate LCLS Experiment Metadata
Cong Wang (SLAC National Accelerator Laboratory)
LLMs for Dark Matter: Knowledge Management Across Multi-Modal Siloed Resources
Maria Elena Monzani (SLAC and Stanford University)
LLMs for Living Docs
Kuktae Kim (SLAC National Accelerator Laboratory)
Kibaek Kim, Argonne National Laboratory
Hendrik Hamann, Stony Brook University and Brookhaven National Laboratory
Hongwei Jin, Argonne National Laboratory
GridFM (Grid Foundation Model) is a fast-growing, community-driven initiative uniting researchers from national labs, academia, and industry to develop foundation models tailored to the electric grid. Unlike typical foundation models, GridFM focuses on graph-based models pretrained on multi-modal grid data to capture the system’s complexity and dynamics. This session will introduce GridFM to the TPC community, present the motivation for domain-specific foundation models, and feature invited talks highlighting early technical progress and real-world applications. An open forum will follow to discuss shared challenges in data, modeling, software, and infrastructure, and explore connections with HPC, scientific ML, and energy systems modeling. Researchers in AI/ML, scalable algorithms, and complex systems are invited to join and help shape the future of GridFM.
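For readers new to graph-based pretraining, the core operation is neighborhood aggregation over the grid topology. A single message-passing step, in a deliberately simplified numpy sketch (GridFM’s heterogeneous GNNs over multi-modal grid data are far richer), looks like this:

    # One message-passing step over a grid graph in plain numpy (toy sketch).
    import numpy as np

    def message_passing_step(node_feats, edges, W_self, W_nbr):
        """node_feats: (N, d) bus features; edges: (src, dst) line connections."""
        agg = np.zeros_like(node_feats)
        deg = np.ones(len(node_feats))        # start at 1 to avoid divide-by-zero
        for s, d in edges:
            agg[d] += node_feats[s]           # each bus gathers its neighbors
            deg[d] += 1
        agg /= deg[:, None]
        return np.tanh(node_feats @ W_self + agg @ W_nbr)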
Designing a Unified Data Structure for Multi-task Training of GridFM
Zhirui Liang (Argonne National Laboratory)
Federated Learning Framework for Collaborative Training of Electric Grid Foundation Models
Yijiang Li (Argonne National Laboratory)
Foundation Models for the Electric Grid
Hendrik Hamann (Stony Brook University and Brookhaven National Laboratory)
GridFM Models for Distribution System State Estimation
Stefano Fenu (Argonne National Laboratory)
GridFM: A Foundation Model for Power Grid Intelligence via Heterogeneous GNNs
Hongwei Jin (Argonne National Laboratory)
The First Set of Domain-specific GenAI Models for Electric and Power Systems to Accelerate the Open Power AI Consortium (OPAI)
Rodolfo Tonoli (Articul8 AI)
Raghu Machiraju, The Ohio State University
DK Panda, The Ohio State University
Zhao Zhang, Rutgers University
This session will explore the use of AI models across the computing continuum, highlighting diverse application domains and key technical challenges. Participants will discuss current obstacles in data collection, model training, and deployment workflows, particularly in distributed environments. The session will also focus on how the NSF-funded ICICLE AI Institute components can be rapidly adapted to meet TPC community requirements, including domain-specific data curation and labeling, scalable and distributed model training, and robust evaluation for bias and alignment with scientific goals. In addition, we will examine the role of agentic workflows in scientific computing, identifying reusable patterns that can be instantiated within ICICLE and deployed effectively on HPC systems.
Francis Alexander, Argonne National Laboratory
Peter Nugent, Lawrence Berkeley National Laboratory
Suzanne Pierce, Texas Advanced Computing Center, The University of Texas at Austin
This Birds of a Feather session explores how AI models can transform high-consequence decision-making under uncertainty. Building on the consortium’s collaborative development of foundation models for science and engineering, we’ll examine applications in disaster response, supply chain resilience, pandemic management, and other areas. The discussion will connect TPC’s work on scalable architectures, scientific data curation, and exascale optimization to breakthrough capabilities in time-critical decision support. Participants from industry, academia, and national laboratories will explore how TPC’s multi-institutional approach can accelerate next-generation decision support systems while maintaining the rigor essential for high-stakes applications.
AI in Epidemiology
Peter Nugent (Lawrence Berkeley National Lab)
Satisficing at Scale: How AI Connects Community Knowledge with Scientific Models for Actionable Solutions
Suzanne A Pierce (Texas Advanced Computing Center, The University of Texas at Austin)
The Wall Confronting Large Language Models
Peter Coveney (Argonne National Laboratory and University College London)
Urgent AI for Decision Science
Manish Parashar (Scientific Computing and Imaging Institute, University of Utah)
Joshua Tan, Metagov
Nick Vincent, Simon Fraser University
Avani Wildani, Cloudflare
Governments around the world are beginning to invest in public AI — AI built and maintained as public infrastructure. New initiatives like AuroraGPT (U.S.), SEA-LION (Singapore), and the EU’s AI Gigafactories signal growing recognition that public institutions have a critical role to play in the next generation of AI systems. Meanwhile, national labs are being asked to take on more: building sovereign capabilities, supporting regulated sectors, and producing high-trust, mission-aligned models. This breakout introduces the emerging public AI movement and key proposals such as “Airbus for AI” and “CERN for AI.” It aims to provide national lab leaders with a strategic view of the policy landscape — beyond narrow regulation — and to catalyze coordination across jurisdictions. Blending technical and institutional design challenges, this session may seed a working group focused on aligning national labs with the new public AI infrastructure agenda.
Natalie Bates, Lawrence Berkeley National Laboratory
Siddhartha Jana, Intel
As AI workloads and HPC systems scale to new extremes, energy efficiency has become a defining challenge for sustainable computing. This BOF, led by the Energy Efficiency HPC Working Group (EE HPC WG), brings together experts from computing centers, industry, and research institutions to explore recent advances in sustainable infrastructure, including liquid cooling innovations, facility energy reuse, alternative power sources, and software-driven energy optimization. Participants will discuss trends in AI/HPC operations, insights from systems like Fugaku, and strategies for integrating sustainability into procurement and operational policies. The session invites open dialogue on actionable approaches for reducing energy and environmental impacts across the global HPC and AI ecosystem.
AI & HPC Facility Trends
Jason Hick (Los Alamos National Laboratory)
Balancing Power and Performance for AI and ML on Heterogeneous HPC Systems
Brice Videau (Argonne National Laboratory)
Arvind Ramanathan, Argonne National Laboratory
Tom Brettin, Argonne National Laboratory
Miguel Vazquez, Barcelona Supercomputing Center
Silvia Crivelli, Lawrence Berkeley National Laboratory
Heidi Hanson, Oak Ridge National Laboratory
This track will focus on the development of foundation models / large language models and agentic systems for biology. Given the shared interests and the broader implications for how AI models and agentic systems can potentially alter the scope of biological research, the goal of this session is to catalyze discussions and build collaborations in areas such as: (1) building shared datasets that create a rich repertoire of downstream evaluation tasks for foundation models; (2) developing shared strategies for model sharing and scoping across diverse biological applications; (3) evaluating approaches for incorporating robust strategies that address bias, trust, and safety in the context of biological data; and (4) developing agentic systems for a variety of tasks ranging from discovery to laboratory operations. The track will include multiple topic areas in each 90-minute breakout, including Agentic Systems, AI for Cancer, and emerging opportunities at the intersection of HPC, AI, biomedicine, and precision population health. Agentic AI systems in particular offer a novel paradigm for simulating complex disease trajectories and personalizing interventions at scale. Balancing predictive accuracy with explainability poses a core challenge. Advancing this field requires coordinated innovation in AI algorithms, data infrastructure, and HPC workflows for biological and biomedical research and healthcare.
Scalable AI for Pediatric Cancer: Unlocking Precision Medicine from Discovery to Treatment
Ninad Oak (St. Jude Children’s Research Hospital)
Global Federated Learning Enabled by the Planet9 Ecosystem
Xun Zhu (St. Jude Children’s Research Hospital)
Accelerating Peptide Binder Design for Cancer Using Generative AI and Multiscale Simulations
Matt Sinclair (Argonne National Laboratory)
Antibody Design Using Preference Optimization and Structural Inference
Archit Vasan (Argonne National Laboratory)
Visualizing Collaborative Intelligence
Bharat Kale (Argonne National Laboratory)
Biomni: A General-purpose Biomedical AI Agent
Kexin Huang (Stanford University)
Cracking Shells: Streamlining MCP Server Management for Scientific Software
Eliott Jacopin (RIKEN Center for Biosystems Dynamics Research)
Biological Reasoning System (BioR5): A Three-Layer AI Architecture
Peng Ding (Argonne National Laboratory)
Leveraging AI-driven Protein Structure Prediction to Decode Phosphorylation Effects on CSF1R Kinase Domain Dimerization
Moeen Meigooni (Argonne National Laboratory)
Workflow for Fine-tuning Genome-scale Language Models for Generative Enzyme Design
Xinran Lian (Argonne National Laboratory)
Doug Norton, The HPC-AI Society
Hear from TPC Executive Director Charlie Catlett about TPC’s work in advancing AI for science and engineering, and discuss how the two non-profit, vendor-neutral organizations, TPC and The HPC-AI Society, are collaborating to advance common goals through their respective communities.
Ravi Madduri, Argonne National Laboratory
Jason Haga, National Institute of Advanced Industrial Science and Technology (AIST)
Feiyi Wang / Sahil Tyagi, Oak Ridge National Laboratory
Miguel Vazquez, Barcelona Supercomputing Center
In this BOF, we will explore the application of federated learning in building foundational AI models for science, as well as the technical and policy challenges of training foundational models across geographical boundaries. We will focus on identifying key bottlenecks in compute, networking, and coordination, and what it would take to overcome them. We’ll also look at how these challenges play out across different scientific domains, and what’s needed to make federated model training a practical reality for the broader research community.
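For orientation, the baseline algorithm underlying most of these frameworks is federated averaging: each site trains locally, and only model weights travel. A short sketch makes this concrete (toy numpy version; real deployments such as those discussed below add secure aggregation, differential privacy, and fault tolerance):

    # Federated averaging (FedAvg) in one function: the server combines client
    # model updates weighted by local dataset size, without moving raw data.
    import numpy as np

    def fedavg(client_weights, client_sizes):
        """client_weights: list of per-client lists of numpy arrays (one per layer)."""
        total = sum(client_sizes)
        return [
            sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
            for layer in range(len(client_weights[0]))
        ]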
Differentially Private Federated Averaging with James-Stein Estimator
Xinran Zhao (Arizona State University)
Federated Learning at Scale: Privacy-preserving Collaboration on Frontier Supercomputer
Olivera Kotevska (Oak Ridge National Laboratory)
Scalable Federated Learning Across DOE HPC Clusters
Yijiang Li (Argonne National Laboratory)
Scaling Secure Collaboration: Real-World Federated Learning with FLARE
Chester Chen (NVIDIA)
SR-APPFL: Scalable and Resilient Advanced Privacy-Preserving Federated Learning
Xiaoyi Lu (University of California, Merced)
SyftBox: A Networked Protocol for Federated Training with Minimal Coordination
Irina Bejan (OpenMined)
Training Scalability of the APPFL Framework
Zilinghan Li (Argonne National Laboratory)
Franck Cappello, Argonne National Laboratory
Sandeep Madireddy, Argonne National Laboratory
Javier Aula-Blasco, Barcelona Supercomputing Center
One of the main thrusts behind the rapid evolution of LLMs is the availability of benchmarks that assess the skills and trustworthiness of LLMs. Not only do they enable a rigorous evaluation of LLM skills and trustworthiness against accepted metrics, but they also generate competition between LLM developers. While several frameworks/benchmarks have emerged as de facto standards for the evaluation of general-purpose LLMs (Eleuther AI Harness and HELM for skills, DecodingTrust for trustworthiness), very few of them relate specifically to science. In this segment, we will discuss the challenges of developing methods to evaluate the skills, trustworthiness, and safety of large foundation models for science. This track will include multiple sessions focused on different facets of model evaluation.
Guarding the Future: Advancing Risk Assessment, Safety Alignment, and Guardrail Systems for AI Agents (Session Keynote)
Bo Li (University of Illinois Urbana-Champaign and Virtue AI)
EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants
Franck Cappello (Argonne National Laboratory)
SciCode: A Research Coding Benchmark Curated by Scientists
Eliu Huerta (Argonne National Laboratory)
LLM Evaluation on Biological Science
Shinjae Yoo (Brookhaven National Laboratory)
Astrophysics Benchmarking of LLMs
Nesar Ramachandra (Argonne National Laboratory)
Exploring the Capabilities of the Frontier Large Language Models for Nuclear Energy Research
Prasanna Balaprakash (Oak Ridge National Laboratory)
Adarsha Balaji (Argonne National Laboratory)
UProp: Investigating the Uncertainty Propagation of LLMs in Multi-step Agentic Decision-Making
Kaidi Xu / Jinhao Duan (Drexel University)
Robustness and Safety-constrained Generation
Ferdinando Fioretto (University of Virginia)
Double-blind Evaluation via GPU Enclaves: A Path to Trustworthy Model Assessment
Irina Bejan (OpenMined)
Evaluating Probability Consistency and Trust in Large Language Models
Bradley Love (Los Alamos National Laboratory)
Automated Multiple Choice Question Answering Benchmark Generation and Model Evaluation
Ozan Gokdemir (Argonne National Laboratory)
LLM Judges
Neil Getty (Argonne National Laboratory)
DoReMi: Difficulty-oriented Reasoning Effort Modeling of Science Problems for Language Models
Cong Xu (HPE)
Evaluation of Multimodal Understanding in Foundation Models for Geo-Spatial Data
Tanwi Mallick (Argonne National Laboratory)
End-to-end Evaluation: Lab Style and Field Style Experiments (And 1000 Scientists AI JAM)
Franck Cappello (Argonne National Laboratory)
Prioritizing Skills and Capabilities for Science Assistant Evaluation
Patrick Emami (National Renewable Energy Lab)
Semibench: A Microtask Benchmark for Semiconductor Manufacturing AI Agents
Angel Yanguas-Gil (Argonne National Laboratory)
Evaluating Agentic AI for Science: Insights from ChemGraph
Thang Duc Pham (Argonne National Laboratory)
Diane Oyen, Los Alamos National Laboratory
Bradley Love, Los Alamos National Laboratory
Ayan Biswas, Los Alamos National Laboratory
For SciML models to be trustworthy and broadly deployable, we must balance accuracy, complexity, and computational cost. Like LLMs, SciML faces “data walls” where scale alone yields diminishing returns — prompting growing interest in models capable of genuine reasoning. But what counts as reasoning in SciML? Whereas LLMs learn rewarded response patterns, scientific reasoning demands adherence to physical laws, logical transparency, and uncertainty quantification. This includes symbolic derivation, solver integration, and interpretable chains of thought. Yet embedding constraints can reduce flexibility, while data-driven models risk losing interpretability. Navigating this space requires community-developed evaluation frameworks — and clarity on what “reasoning” means across disciplines — to distinguish brute-force prediction from scientific understanding and advance robust, insightful AI for science.
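One concrete form of the tension described above is the soft physics penalty. The sketch below (a toy 1-D heat equation with finite-difference residuals; the weighting lam is a hypothetical knob, not a prescribed value) shows how a constraint enters the objective and why tuning its weight trades physical fidelity against flexibility.

    # Sketch: data-fit loss plus a soft residual penalty for the 1-D heat
    # equation u_t = alpha * u_xx, with residuals estimated by finite
    # differences on a space-time grid.
    import numpy as np

    def physics_constrained_loss(u, u_obs, dx, dt, alpha, lam=1.0):
        """u: model field on a (nt, nx) grid; u_obs: observations, NaN where missing."""
        u_t = (u[1:, 1:-1] - u[:-1, 1:-1]) / dt
        u_xx = (u[:-1, 2:] - 2.0 * u[:-1, 1:-1] + u[:-1, :-2]) / dx ** 2
        residual = np.mean((u_t - alpha * u_xx) ** 2)     # PDE violation
        mask = ~np.isnan(u_obs)
        data_fit = np.mean((u[mask] - u_obs[mask]) ** 2)  # fit to sparse data
        return data_fit + lam * residual                  # lam trades physics vs. fit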
Challenges in Reasoning in Scientific Machine Learning with PDEs
Siddharth Mansingh (Los Alamos National Laboratory)
Evaluating Probability Consistency and Trust in Large Language Models
Bradley Love (Los Alamos National Laboratory)
Weicheng Huang, National Center for High-performance Computing
Venkat Vishwanath, Argonne Leadership Computing Facility
Aleksi Kallio, IT Center for Science (CSC)
Dan Stanzione, Texas Advanced Computing Center
As foundation models and domain-specific AI systems gain traction in science, HPC centers are actively developing infrastructure to support scalable, high-performance inference services. This session convenes leaders from labs, vendors, and the open-source community to share experiences, challenges, and emerging best practices in deploying inference for science. Topics include integration of inference with simulations and workflows, support for diverse architectures, sustainability of open-weight models, supporting proprietary models, and software stacks tuned for reliability and reproducibility. Training and workforce development will also be addressed, reflecting the growing institutional investment in upskilling staff and users. The session will gather use cases, propose next steps — including a community webpage and Slack/Discord channel — and catalyze collaboration across the TPC community.
Persistent Inference Services for Science: Progress at TACC to Date
Dan Stanzione (Texas Advanced Computing Center / The University of Texas at Austin)
Exploring GPT-based AI services at Pitt and PSC to Support Scientific Research
Barr von Oehsen (Pittsburgh Supercomputing Center)
Leveraging a First-party Inference Service for HPC User Support
Mitja Sainio (CSC – IT Center for Science)
Serving AI Models on NCSA’s HPC Systems
Volodymyr Kindratenko (National Center for Supercomputing Applications)
The AI RAP and its Integration with Heterogeneous Inference Accelerators
Yun-Te Lin (National Center for High-performance Computing)
Deploying Scalable Inference Endpoints for Science at ALCF
Venkatram Vishwanath (Argonne National Laboratory)
Scaling In-memory Computing to Data Center Levels for Fast and Efficient Science Inference
Satyam Srivastava (d-Matrix)
Challenges and Strategies for Deploying Scalable AI Inference
Panagiotis Kourdis (Intel)
Generative AI Inference at Scale: A World of Trade-offs
Darshan Gandhi (SambaNova)
Rio Yokota, Institute of Science Tokyo
Murali Emani, Argonne National Laboratory
Architectures for AI models are evolving rapidly, with frequent innovations in transformer variants, mixture-of-experts extensions, and state-space models. Frameworks like Megatron-LM, DeepSpeed, and their forks support different architectures, parallelism strategies, and system optimizations. Identifying the optimal architecture and framework for training trillion-parameter models on scientific data is vital to unlocking the next generation of AI for science. Equally crucial is efficient inference, which enables the practical use of pre-trained models in downstream scientific applications. This multi-session track will bring together researchers and practitioners to discuss cutting-edge strategies for large-scale training, inference, and test-time scaling, alongside robust methods.
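As one example of the architectural choices at stake, the heart of a mixture-of-experts layer is a learned router that activates only k experts per token. A toy numpy sketch follows (production frameworks such as Megatron-LM and DeepSpeed add load-balancing losses, expert parallelism, and fused kernels on top of this basic pattern):

    # Toy mixture-of-experts forward pass: a router selects the top-k experts
    # per token and mixes their outputs with softmax gates.
    import numpy as np

    def moe_forward(x, router_W, experts, k=2):
        """x: (tokens, d); router_W: (d, n_experts); experts: callables (d,)->(d,)."""
        logits = x @ router_W
        topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k largest logits
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            sel = logits[t, topk[t]]
            gates = np.exp(sel - sel.max())
            gates /= gates.sum()                     # softmax over the selected experts
            for g, e in zip(gates, topk[t]):
                out[t] += g * experts[e](x[t])
        return out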
Performance Modeling and System Design Insights for AI Foundation Models
Shashank Subramanian (Lawrence Berkeley National Laboratory)
Characterizing GPU Memory Errors: Insights from a Cross-supercomputer Study
Lishan Yang (George Mason University)
Design Choices for Compute-Efficient Mixture-of-Expert Models
Daria Soboleva (Cerebras Systems)
X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms
Minjia Zhang (University of Illinois at Urbana-Champaign)
Architecture of AERIS, an Argonne Earth Systems Model
Väinö Hatanpää (Argonne National Laboratory)
MegaFold: System-level Optimizations for Accelerating Protein Structure Prediction Models
Minjia Zhang (University of Illinois at Urbana-Champaign)
Diamond: Democratizing Large Foundation Model Training for Science
Zhao Zhang (Rutgers University)
Communication-efficient Large Language Model Optimization
Zhao Zhang (Rutgers University)
AI-powered Performance Insights – From Data to Predictions
Ashwin M. Aji (AMD)
Agentic Systems on SN40L Dataflow Architecture
Darshan Gandhi (SambaNova)
Machine Learning-guided Memory Optimization for DLRM Inference on Tiered Memory
Dong Li (University of California Merced)
Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping
Dong Li (University of California Merced)
Phoenix: Enabling Sparse Fine-tuning for Foundation Model Downstream Tasks on Cerebras
Wenqian Dong (Oregon State University)
Quantum/HPC Hybrid Solutions in the Cloud
Sebastian Hassinger (AWS)
Anshu Dubey, Argonne National Laboratory
Valerie Taylor, Argonne National Laboratory
Pete Beckman, Northwestern University
Generative AI has rapidly evolved, now demonstrating strong performance in scientific code generation, refactoring, and formal verification. This session highlights recent breakthroughs, including agentic systems that plan and refine scientific workflows, execution-guided code generation, domain-specific LLMs for HPC and HEP applications, and tools for energy-aware refactoring and formal specification synthesis. Talks will explore datasets, multi-agent pipelines, and inference-time feedback loops that dramatically improve correctness and usability. We will discuss how these methods are accelerating software development across science domains, marking a shift from early experimentation to practical, performant systems that redefine how we build and verify scientific code.
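The execution-guided feedback loop these talks describe can be distilled to a few lines (an illustrative sketch only: ask_model is a hypothetical stand-in for any code-generating LLM, and real systems sandbox execution rather than running generated code directly on the host):

    # Generate-execute-refine loop in miniature.
    import os, subprocess, sys, tempfile

    def generate_and_refine(task, tests, ask_model, max_rounds=3):
        prompt = f"Write a Python module that solves:\n{task}"
        code = ""
        for _ in range(max_rounds):
            code = ask_model(prompt)
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(code + "\n\n" + tests)       # append the test harness
                path = f.name
            result = subprocess.run([sys.executable, path],
                                    capture_output=True, text=True)
            os.unlink(path)
            if result.returncode == 0:
                break                                # all tests passed
            prompt = (f"This code failed its tests:\n{code}\n"
                      f"Error output:\n{result.stderr}\nReturn a fixed version.")
        return code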
Agentic Systems for Scientific Code Generation
Tanwi Mallick (Argonne National Lab)
CelloAI: Leveraging Large Language Models for HPC Software Development in High Energy Physics
ChatHPC, AI-assistance for HPC Programming and Software Ecosystem
CLEVER: A Benchmark for Two-staged, End-to-end Verified Code Generation
LASSI-EE: Automated Energy-aware Refactoring of Parallel Scientific Codes Using LLMs
Matthew Dearing (University of Illinois Chicago and Argonne National Laboratory)
LLM Agent-based Code Translation for Low Resource Languages
William Tang, Princeton Plasma Physics Laboratory
Shantenu Jha, Rutgers University and Princeton Plasma Physics Laboratory
The aim of Fusion Foundation Models (FFMs) is to transform fusion‑energy research and accelerate commercialization this decade. Embedding fundamental physics directly within the model architectures helps overcome scarce experimental data and supports extrapolation while adhering to validation, verification, and uncertainty quantification principles. These models will deliver predictive accuracy, operational intelligence, and design optimization across the fusion lifecycle. Priorities include delivering physics‑informed, multimodal architectures, enabling human‑AI co‑discovery, and orchestrating specialized AI agents that ensure scalability, robustness, real‑time response, and strict ethical and safety compliance. This BoF surveys the current landscape of FFM development and deployment, probing applications for real‑time control and prediction in complex scientific settings. Participants will map actionable, collaborative opportunities, positioning fusion as a compelling proving ground and inviting active engagement from the science community.
AI for Fusion Diagnostics, Control and Scientific Discovery
Egemen Kolemen (Princeton University and Princeton Plasma Physics Laboratory)
AI for Scientific Control in Magnetic Fusion Energy
William Tang (Princeton University)
Foundation Models in Fusion Energy Simulation and Experiment
Michael Churchill (Princeton Plasma Physics Laboratory)
Surrogate Model of First Principle Simulations of Fusion Plasma
Xishuo Wei (University of California, Irvine)
Surrogate Models as Essential Building Blocks for Fusion Foundation Models
Alvaro Sanchez-Villar (Princeton Plasma Physics Laboratory)
Tokamaks and Tokenization: Enabling the Foundation Model Scale with Electron Cyclotron Emission Imaging Data
Jesse Rodriguez (Oregon State University)
Eliu Huerta, Argonne National Laboratory and University of Chicago
Samuel Blau, Lawrence Berkeley National Laboratory
This session explores the transformative role of generative AI and autonomous agents in accelerating discovery in materials science. As this field increasingly adopts data-centric approaches, the session will focus on four key themes: (1) the generation and curation of high-quality, interoperable datasets spanning simulations, synthesis protocols, and experimental measurements; (2) the development of foundation models and domain-specific AI systems, including multimodal LLMs tailored to scientific data; (3) the integration of AI agents with robotic platforms and autonomous labs, enabling self-driving experimentation, real-time decision-making, and natural language control of complex workflows; and (4) the application of these technologies in real-world discovery.
Accelerating Discovery of Novel Materials Using AI
Geetika Gupta (NVIDIA)
ChemGraph: Automating Computational Chemistry with Agentic AI
Thang Pham (Argonne National Laboratory)
Domain-Aware Data Compression for Scientific Instruments
Amarjit Singh (RIKEN R-CCS)
How Can AI for Materials Science Reach Internet Scale?
Anuroop Sriram (Meta)
Machine Learning Force Fields and Generative Models for Atomistic Simulations: Navigating Speed, Accuracy, and Scalability Trade-offs in the Age of (Some) Large-scale Scientific Data
Aditi Krishnapriyan (UC Berkeley)
Scaling Deep Learning for Materials Discovery
Valerie Taylor, Argonne National Laboratory
Jason Haga, National Institute of Advanced Industrial Science and Technology (AIST)
Javier Aula-Blasco, Barcelona Supercomputing Center
Claudio Domenico Arlandini, CINECA
Over the past year, the accelerating integration of AI into scientific research has intensified demand for a skilled, AI-fluent scientific workforce. Institutions worldwide have launched new training programs, curricula, and upskilling initiatives to prepare both early-career researchers and established staff to effectively harness AI for discovery. These efforts are producing valuable insights into what pedagogical strategies succeed — and where gaps remain. As scientific domains rapidly adopt LLMs and foundation models, building a diverse, globally connected, and continuously learning workforce is both a strategic imperative and a social responsibility. This session will explore how the Trillion Parameter Consortium can support inclusive, cross-disciplinary workforce development through governance, shared curricula, and collaborative training infrastructures.
ADAPT PA: Giving Students Computing Skills for Every Career Path
Barr von Oehsen (Pittsburgh Supercomputing Center)
Code Green Jam: Peer Student Learning to Write Energy-Efficient Parallel Code
Lessons Learned from Vibe Coding Using Warp
Po-Lun Ma, Pacific Northwest National Laboratory
Foundation models are reshaping how we simulate, understand, and interact with the Earth system. This session explores how large-scale AI models accelerate Earth system prediction, transform data assimilation and coupling, and enable new forms of discovery and decision support. We invite contributions on digital Earth development, analysis, and prediction, and LLM-based tools for querying, debugging, or orchestrating digital twins. Case studies, open-source tools, and critical perspectives on trustworthiness, reproducibility, and interpretability are welcome. We will discuss the challenges and opportunities at the intersection of AI and Earth science, and their role in shaping next-generation digital Earth infrastructure and Earth system sciences.
Characterizing Extreme Weather in a Huge Ensemble of Machine Learning Weather Forecasts
Ankur Mahesh (UC Berkeley and Lawrence Berkeley National Laboratory)
Climate in a Bottle: Steerable, Foundational Climate State Sampling at 13 Megapixel Ambition
Mike Pritchard (NVIDIA)
Fairness of Geospatial Foundation Models
Kyoung-Sook Kim (National Institute of Advanced Industrial Science and Technology (AIST))
Foundation Models for Earth Systems
Hendrik Hamann (Stony Brook University)