Publications

(* indicates equal contribution)

Preprints

GARDO: Reinforcing Diffusion Models without Reward Hacking
Haoran He, Yuxiao Ye, Jie Liu, Jiajun Liang, Zhiyong Wang, Ziyang Yuan, Xintao Wang, Hangyu Mao, Pengfei Wan, Ling Pan
Preprint
[PDF] [Website]
Scaling Image and Video Generation via Test-Time Evolutionary Search
Haoran He, Jiajun Liang, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Ling Pan
Preprint
[PDF] [Website]
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach
Siyuan Yang, Yang Zhang, Haoran He, Ling Pan, Xiu Li, Chenjia Bai, Xuelong Li
Preprint
[PDF]

Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
Haoran He*, Yuxiao Ye*, Qingpeng Cai, Chen Hu, Binxing Jiao, Daxin Jiang, Ling Pan
In Fourteenth International Conference on Learning Representations (ICLR), Rio de Janeiro, Brazil, 2026
[PDF] [Code & Model]
Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning
Jiashun Liu*, Johan Obando-Ceron*, Han Lu, Yancheng He, Weixun Wang, Wenbo Su, Bo Zheng, Pablo Samuel Castro, Aaron Courville, Ling Pan
In Fourteenth International Conference on Learning Representations (ICLR), Rio de Janeiro, Brazil, 2026
[PDF]
Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Zihe Liu*, Jiashun Liu*, Yancheng He*, Weixun Wang*, Jiaheng Liu, Ling Pan, Xinyu Hu, Shaopan Xiong, Ju Huang, Jian Hu, Shengyi Huang, Siran Yang, Jiamang Wang, Wenbo Su, Bo Zheng
In Fourteenth International Conference on Learning Representations (ICLR), Rio de Janeiro, Brazil, 2026
[PDF]
Pre-Trained Video Generative Models as World Simulators
Haoran He, Yang Zhang, Liang Lin, Zhongwen Xu, Ling Pan
In Fortieth Annual AAAI Conference on Artificial Intelligence (AAAI), Singapore, Singapore, 2026
[PDF]
ROLL Flash--Accelerating RLVR and Agentic Training with Asynchrony
Technical Report, 2026
[PDF]
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
Technical Report, 2026
[PDF]
Measure Gradients, Not Activations! Enhancing Neuronal Activity in Deep Reinforcement Learning
Jiashun Liu*, Zihao Wu*, Johan Obando-Ceron*, Pablo Samuel Castro, Aaron Courville, Ling Pan
In Thirty-Ninth Conference on Neural Information Processing Systems (NeurIPS), San Diego, USA, 2025
[PDF]
Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization
Ziqi Wang, Jiashun Liu, Ling Pan
In Thirty-Ninth Conference on Neural Information Processing Systems (NeurIPS), San Diego, USA, 2025
[PDF]
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
Kaixuan Jiang, Yang Liu, Weixing Chen, Jingzhou Luo, Ziliang Chen, Ling Pan, Guanbin Li, Liang Lin
In International Conference on Computer Vision (ICCV), Hawaii, USA, 2025
[PDF]
The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning
Jiashun Liu, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Ling Pan
In Forty-Second International Conference on Machine Learning (ICML), Vancouver, Canada, 2025
[PDF]
Random Policy Evaluation Uncovers Policies of Generative Flow Networks
Haoran He, Emmanuel Bengio, Qingpeng Cai, Ling Pan
In Forty-Second International Conference on Machine Learning (ICML), Vancouver, Canada, 2025
[PDF]
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation
Taeyoung Yun, Dinghuai Zhang, Jinkyoo Park, Ling Pan
In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2025
[PDF] [Code]
Neuroplastic Expansion in Deep Reinforcement Learning
Jiashun Liu, Johan Obando-Ceron, Aaron Courville, Ling Pan
In Thirteenth International Conference on Learning Representations (ICLR), Singapore, Singapore, 2025
[PDF] [Code]
Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets
Haoran He, Can Chang, Huazhe Xu, Ling Pan
In Thirteenth International Conference on Learning Representations (ICLR), Singapore, Singapore, 2025
[PDF] [Code]
Tackling Sparsity in Designated Driver Dispatch with Multi-Agent Reinforcement Learning
Jiaxuan Jiang, Ling Pan, Lin Zhou, Longbo Huang, Zhixuan Fang
In Twenty-Fourth International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Detroit, USA, 2025
[PDF]
Flow Factorization for Efficient Generative Flow Networks
Jiashun Liu*, Chunhui Li*, Cheng-Hao Liu, Dianbo Liu, Qingpeng Cai, Ling Pan
In Thirty-Ninth Annual AAAI Conference on Artificial Intelligence (AAAI), Philadelphia, USA, 2025
Oral (Top 5%)
[PDF]
Generative Flow Networks for Personalized Multimedia Systems: A Case Study on Short Video Feeds
Yili Jin, Ling Pan, Rui-Xiao Zhang, Jiangchuan Liu, Xue Liu
In Thirty-Third ACM International Conference on Multimedia (ACM MM), Brave New Ideas Track, Dublin, Ireland, 2025
[PDF]
Evolution Guided Generative Flow Networks
Zarif Ikram, Ling Pan, Dianbo Liu
In Transactions on Machine Learning Research (TMLR), 2025
[PDF]
Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training
Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li
In Thirty-Eighth Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2024
[PDF] [Code]
QGFN: Controllable Greediness with Action Values
Elaine Lau, Stephen Lu, Ling Pan, Doina Precup, Emmanuel Bengio
In Thirty-Eighth Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2024
[PDF] [Code]
Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning
Xinran Li, Ling Pan, Jun Zhang
In Thirty-Eighth Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2024
[PDF] [Code]
Value-Based Deep Multi-Agent Reinforcement Learning with Dynamic Sparse Training
Pihe Hu, Shaolong Li, Zhuoran Li, Ling Pan, Longbo Huang
In Thirty-Eighth Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2024
[PDF]
Bridging the Sim-to-Real Gap from the Information Bottleneck Perspective
Haoran He, Peilin Wu, Chenjia Bai, Hang Lai, Lingxiao Wang, Ling Pan, Xiaolin Hu, Weinan Zhang
In Eighth Annual Conference on Robot Learning (CoRL), Munich, Germany, 2024
Oral (Top 5%)
[PDF]
Learning to Scale Logits for Temperature-Conditional GFlowNets
Minsu Kim*, Joohwan Ko*, Taeyoung Yun*, Dinghuai Zhang, Ling Pan, Woochang Kim, Jinkyoo Park, Emmanuel Bengio, Yoshua Bengio
In Forty-First International Conference on Machine Learning (ICML), Vienna, Austria, 2024
[PDF] [Code]
Pre-Training and Fine-Tuning Generative Flow Networks
Ling Pan, Moksh Jain, Kanika Madan, Yoshua Bengio
In Twelfth International Conference on Learning Representations (ICLR), Vienna, Austria, 2024
Spotlight (Top 5%)
[PDF]
Distributional GFlowNets with Quantile Flows
Dinghuai Zhang*, Ling Pan*, Ricky T.Q. Chen, Aaron Courville, Yoshua Bengio
In Transactions on Machine Learning Research (TMLR), 2024
[PDF] [Code]
Multi-User Delay-Constrained Scheduling with Deep Recurrent Reinforcement Learning
Pihe Hu, Yu Chen, Ling Pan, Zhixuan Fang, Fu Xiao, Longbo Huang
In IEEE/ACM Transactions on Networking (TON), 2024
[PDF]
Let the Flows Tell: Solving Graph Combinatorial Problems with GFlowNets
Dinghuai Zhang, Hanjun Dai, Nikolay Malkin, Aaron Courville, Yoshua Bengio, Ling Pan
In Thirty-Seventh Conference on Neural Information Processing Systems (NeurIPS), New Orleans, USA, 2023
Spotlight (Top 5%)
[PDF] [Code]
Better Training of GFlowNets with Local Credit and Incomplete Trajectories
Ling Pan, Nikolay Malkin, Dinghuai Zhang, Yoshua Bengio
In Fortieth International Conference on Machine Learning (ICML), Hawaii, USA, 2023
[PDF] [Code]
Stochastic Generative Flow Networks
Ling Pan*, Dinghuai Zhang*, Moksh Jain, Longbo Huang, Yoshua Bengio
In Thirty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI), Pittsburgh, USA, 2023
Spotlight (Top 7%)
[PDF] [Code]
Generative Augmented Flow Networks
Ling Pan, Dinghuai Zhang, Aaron Courville, Longbo Huang, Yoshua Bengio
In Eleventh International Conference on Learning Representations (ICLR), Kigali, Rwanda, 2023
Spotlight (Top 5%)
[PDF] [Code]
RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch
Yiqin Tan, Pihe Hu, Ling Pan, Jiatai Huang, Longbo Huang
In Eleventh International Conference on Learning Representations (ICLR), Kigali, Rwanda, 2023
Spotlight (Top 5%)
[PDF] [Code]
E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance
Can Chang, Ni Mu, Jiajun Wu, Ling Pan, Huazhe Xu
In Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS), New Orleans, USA, 2022
Spotlight (Top 5%)
[PDF] [Website]
Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification
Ling Pan, Longbo Huang, Tengyu Ma, Huazhe Xu
In Thirty-Ninth International Conference on Machine Learning (ICML), Baltimore, USA, 2022
[PDF] [Code] [Website]
Recurrent Softmax Policy Gradient for Delay-Constrained Scheduling
Pihe Hu, Ling Pan, Yu Chen, Zhixuan Fang, Longbo Huang
In Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing (MobiHoc), Seoul, South Korea, 2022
[PDF]
Network Topology Optimization via Deep Reinforcement Learning
Zhuoran Li, Xing Wang, Ling Pan, Lin Zhu, Zhendong Wang, Junlan Feng, Chao Deng, Longbo Huang
In IEEE Transactions on Communications (TCOM), 2022
[PDF]
Regularized Softmax Deep Multi-Agent Q-Learning
Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson
In Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS), 2021
[PDF] [Code]
Exploration in Policy Optimization through Multiple Paths
Ling Pan, Qingpeng Cai, Longbo Huang
In Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), 2021
[PDF]
Softmax Deep Double Deterministic Policy Gradients
Ling Pan, Qingpeng Cai, Longbo Huang
In Thirty-Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020
[PDF] [Code]
Reinforcement Learning with Dynamic Boltzmann Softmax Updates
Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang
In Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI), Yokohama, Japan, 2020
(Acceptance rate: 12.6%)
[PDF]
Multi-Path Policy Optimization
Ling Pan, Qingpeng Cai, Longbo Huang
In Nineteenth International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Auckland, New Zealand, 2020
Invited for fast-track publication in JAAMAS (Top 5%)
[PDF]
Deterministic Value-Policy Gradients
Qingpeng Cai*, Ling Pan*, Pingzhong Tang
In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), New York, USA, 2020
[PDF]
A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems
Ling Pan, Qingpeng Cai, Zhixuan Fang, Pingzhong Tang, Longbo Huang
In Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), Hawaii, USA, 2019
(Acceptance rate: 16.2%)
[PDF]