IPDPS 2023 Advance Program

Please visit this IPDPS website regularly for updates, since there may be schedule revisions.

Authors who have corrections should send an email giving full details.

MONDAY - 15 May 2023

DAYS • Monday • Tuesday • Wednesday • Thursday • Friday


See each individual workshop for program & schedule details



Heterogeneity in Computing Workshop



Reconfigurable Architectures Workshop



High Performance Computational Biology



Graphs, Architectures, Programming, and Learning



NSF/TCPP Workshop on Parallel and Distributed Computing Education



Advances in Parallel and Distributed Computational Models



High-level Parallel Programming Models and Supportive Environments



Coarse-Grained Reconfigurable Architectures for HPC



AI for Datacenter Operations



Quantum Computing Algorithms, Systems, and Applications



Accelerators and Hybrid Emerging Systems



Extreme-Scale Storage and Analysis

6:00 PM - 7:30 PM

IPDPS - TCPP Welcome Reception

TUESDAY - 16 May 2023


Opening Session
8:00 AM - 8:30 AM

Keynote Session
8:30 AM - 9:30 AM


Session Chair: DK Panda


Keshav Pingali

The University of Texas at Austin and Katana Graph Inc. and Recipient of the 2023 IEEE Charles Babbage Award


Fifty Years of Parallel Programming: Ieri, Oggi, Domani or Yesterday, Today, Tomorrow



Morning Break 9:30 AM - 10:00 AM

Parallel Technical
Sessions 1A & 1B

10:00 AM - 11:30 AM

SESSION 1A: Graphs Processing

Session Chair: Ananth Kalyanaraman

  • GraphTensor: Comprehensive GNN-Acceleration Framework for Efficient Parallel Processing of Massive Datasets
    Junhyeok Jang, Miryeong Kwon, Donghyun Gouk, Hanyeoreum Bae, and Myoungsoo Jung (KAIST)

  • GraphMetaP: Efficient MetaPath Generation for Dynamic Heterogeneous Graph Models       
    Haiheng He (Huazhong University of Science and Technology, Zhijiang Lab); Dan Chen (Huazhong University of Science and Technology); Long Zheng, Yu Huang, Haifeng Liu, and Chaoqiang Liu (Huazhong University of Science and Technology, Zhijiang Lab); and Xiaofei Liao and Hai Jin (Huazhong University of Science and Technology)

  • Traversing Large Compressed Graphs on GPUs
    Prasun Gera (Cerebras, Georgia Tech) and Hyesoon Kim (Georgia Tech)

  • Distributed Sparse Random Projection Trees for Constructing K-Nearest Neighbor Graphs
    Isuru Ranawaka (Indiana University), Md Khaledur Rahman (Meta), and Ariful Azad (Indiana University)

  • Fast Deterministic Gathering with Detection on Arbitrary Graphs: The Power of Many Robots
    Anisur Rahaman Molla (Indian Statistical Institute), Kaushik Mondal (Indian Institute of Technology Ropar), and William K. Moses Jr. (Durham University)

  • Accurate and Efficient Distributed COVID-19 Spread Prediction based on a Large-Scale Time-Varying People Mobility Graph
    Sudipta Saha Shubha, Shohaib Mahmud, Haiying Shen, Geoffrey C. Fox, and Madhav Marathe (University of Virginia)

SESSION 1B: Architectural Advances

Session Chair: Antonino Tumeo

  • H-Cache: Traffic-Aware Hybrid Rule-Caching in Software-Defined Networks
    Zeyu Luan (Tsinghua-Berkeley Shenzhen Institute), Qing Li (Peng Cheng Laboratory), and Yi Wang and Yong Jiang (Tsinghua Shenzhen International Graduate School)
  • Accelerating Packet Processing in Container Overlay Networks via Packet-level Parallelism
    Jiaxin Lei (SUNY Binghamton), Manish Munikar (The University of Texas at Arlington), Hui Lu (SUNY Binghamton), and Jia Rao (The University of Texas at Arlington)
  • Software-Defined, Fast and Strongly-Consistent Data Replication for RDMA-based PM Datastores
    Haodi Lu, Haikun Liu, Chencheng Ye, Xiaofei Liao, Fubing Mao, Yu Zhang, and Hai Jin (Huazhong University of Science and Technology)   
  • Signal Detection for Large MIMO Systems Using Sphere Decoding on FPGAs
    Mohamed Hassan, Adel Dabah, Hatem Ltaief, and Suhaib A. Fahmy (KAUST)
  • Efficient Hardware Primitives for Immediate Memory Reclamation in Optimistic Data Structures
    Ajay Singh and Trevor Brown (University of Waterloo) and Michael Spear (Lehigh University)
  • A Novel Framework for Efficient Offloading of Communication Operations to Bluefield SmartNICs
    Kaushik Kandadi Suresh, Benjamin T. Michalowicz, Bharath Ramesh, Nick Contini, Jinghan Yao, Shulei Xu, Aamir Shafi, Hari Subramoni, and Dhabaleswar K. Panda (The Ohio State University)

11:30 AM – 1:00 PM

Lunch (on your own) & PhD Forum Program

Parallel Technical
Sessions 2A & 2B

1:00 PM – 2:00 PM

SESSION 2A: HPC Optimizations for ML

Session Chair: Bogdan Nicolae

  • Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication
    Qinghua Zhou, Quentin Anthony, Lang Xu, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, and Dhabaleswar K. (DK) Panda (The Ohio State University)
  • Accelerating CNN inference on long vector architectures via co-design
    Sonia Rani Gupta, Nikela Papadopoulou, and Miquel Pericas (Chalmers University of Technology)
  • Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU
    Jianjin Liao, Mingzhen Li, Hailong Yang, Qingxiao Sun, Biao Sun, Jiwei Hao, and Tianyu Feng (Beihang University); Fengwei Yu, Shengdong Chen, Ye Tao, and Zicheng Zhang (SenseTime Research); and Zhongzhi Luan and Depei Qian (Beihang University)
  • MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism
    Zheng Zhang (Wuhan University); Donglin Yang (Nvidia Corp.); Yaqi Xia (Wuhan University); Liang Ding and Dacheng Tao (JD Explore Academy, Inc.); Xiaobo Zhou (University of Macau); and Dazhao Cheng (Wuhan University)

SESSION 2B: I/O Optimizations

Session Chair: Mai Zheng

  • Mimir: Extending I/O interfaces for expressing User Intent for Complex Workloads in HPC
    Hariharan Devarajan (Lawrence Livermore National Laboratory, Illinois Institute of Technology) and Kathryn Mohror (Lawrence Livermore National Laboratory)
  • Drill: Log-based Anomaly Detection for Large-scale Storage Systems Using Source Code Analysis
    Di Zhang and Chris Egersdoerfer (University of North Carolina at Charlotte); Tabassum Mahmud and Mai Zheng (Iowa State University); and Dong Dai (University of North Carolina at Charlotte)
  • FaultyRank: A Graph-based Parallel File System Checker
    Saisha Kamat and Abdullah Al Raqibul Islam (University of North Carolina at Charlotte); Mai Zheng (Iowa State University); and Dong Dai (University of North Carolina at Charlotte)
  • Evaluating Asynchronous Parallel IO on HPC Systems
    John Ravi (North Carolina State University); Suren Byna, Quincey Koziol, and Houjun Tang (Lawrence Berkeley National Laboratory); and Michela Becchi (North Carolina State University)

Early Afternoon Break 2:00 PM – 2:15 PM

Parallel Technical
Sessions 3A & 3B

2:15 PM – 3:15 PM

SESSION 3A: Large Scale ML

Session Chair: Sanmukh Kuppannagari

  • An Efficient 2D Method for Training Super-Large Deep Learning Models
    Qifan Xu (UCLA) and Yang You (NUS)
  • Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation
    Bingyi Zhang and Viktor Prasanna (University of Southern California)
  • Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training
    Siddharth Singh and Abhinav Bhatele (University of Maryland)
  • Asynch-SGBDT: Train Stochastic Gradient Boosting Decision Trees in an Asynchronous Parallel Manner
    Daning Cheng (Tsinghua University), Shigang Li (Beijing University of Posts and Telecommunications), and Yunquan Zhang (ICT)

SESSION 3B: New Systems for Storage

Session Chair: Mustafa Abduljabbar

  • SRC: Mitigate I/O Throughput Degradation in Network Congestion Control of Disaggregated Storage Systems
    Danlin Jia, Yiming Xie, and Li Wang (Northeastern University); Xiaoqian Zhang and Allen Yang (University of Massachusetts Boston); Xuebin Yao, Mahsa Bayati, and Pradeep Subedi (Samsung Semiconductor Inc.); Bo Sheng (University of Massachusetts Boston); and Ningfang Mi (Northeastern University)
  • Boosting Multi-Block Repair in Cloud Storage Systems with Wide-Stripe Erasure Coding
    Qi Yu, Lin Wang, Yuchong Hu, Yumeng Xu, and Dan Feng (Huazhong University of Science and Technology) and Jie Fu, Xia Zhu, Zhen Yao, and Wenjia Wei (Huawei)
  • UnifyFS: A User-level Shared File System for Unified Access to Distributed Local Storage
    Michael Brim (Oak Ridge National Laboratory (ORNL)); Adam Moody (Lawrence Livermore National Laboratory); Seung-Hwan Lim, Ross Miller, and Swen Boehm (Oak Ridge National Laboratory); Cameron Stanavige and Kathryn Mohror (Lawrence Livermore National Laboratory); and Sarp Oral (Oak Ridge National Laboratory)
  • ArkFS: A Distributed File System on Object Storage for Archiving Data in HPC Environment
    Kyu-Jin Cho, Injae Kang, and Jin-Soo Kim (Seoul National University)

Late Afternoon Break 3:15 PM – 4:00 PM (PhD Forum Posters on Display)

Best Papers

4:00 PM - 6:00 PM

Best Paper Nominees - Plenary

Session Chair: Gagan Agrawal

  • On Doorway Egress by Autonomous Robots
    Rory Hector and Ramachandran Vaidyanathan (Louisiana State University), Gokarna Sharma (Kent State University), and Jerry Trahan (Louisiana State University)

  • PAQR: Pivoting Avoiding QR factorization
    Wissam Sid-Lakhdar, Sebastien Cayrols, Daniel Bielich, Ahmad Abdelfattah, Piotr Luszczek, Mark Gates, and Stanimire Tomov (University of Tennessee at Knoxville); Hans Johansen and David Williams-Young (Lawrence Berkeley National Laboratory); Timothy Davis (Texas A&M University); and Jack Dongarra and Hartwig Anzt (University of Tennessee at Knoxville)

  • DeepThermo: Deep Learning Accelerated Parallel Monte Carlo Sampling for Thermodynamics Evaluation of High Entropy Alloys
    Junqi Yin, Feiyi Wang, and Mallikarjun Shankar (Oak Ridge National Laboratory)

  • ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
    Yujia Zhai (University of California, Riverside); Chengquan Jiang, Leyuan Wang, and Xiaoying Jia (ByteDance Ltd.); Shang Zhang (NVIDIA Corporation); Zizhong Chen (University of California, Riverside); and Xin Liu and Yibo Zhu (ByteDance Ltd.)

WEDNESDAY - 17 May 2023


Keynote Session
8:30 AM - 9:30 AM


Session Chair: Devesh Tiwari


Dilma Da Silva

National Science Foundation & Texas A&M University


The adventurous life of a system software researcher



Morning Break 9:30 AM - 10:00 AM

Parallel Technical
Sessions 4A & 4B

10:00 AM - 11:30 AM

SESSION 4A: Linear Algebra Algorithms

Session Chair: Costas Busch

  • On the Arithmetic Intensity of Distributed-Memory Dense Matrix Multiplication Involving an Input Symmetric Matrix (SYMM)
    Emmanuel Agullo (Inria, LaBRI); Alfredo Buttari (CNRS, IRIT); Olivier Coulaud and Lionel Eyraud-Dubois (Inria, LaBRI); Mathieu Faverge (Bordeaux INP, LaBRI); Alain Franc (Inrae, Inria); Abdou Guermouche (Université Bordeaux, LaBRI); Antoine Jego (IRIT, INPT); Romain Peressoni (Inria, LaBRI); and Florent Pruvost (Inria)

  • A Novel Triangular Space-Filling Curve for Cache-Oblivious In-Place Transposition of Square Matrices
    João N. F. Alves (Research Group Scientific Computing, University of Vienna; Instituto Superior Técnico, Universidade de Lisboa); Luís Silveira Russo (Instituto Superior Técnico, Universidade de Lisboa; INESC-ID Lisboa); Alexandre P. Francisco (Instituto Superior Técnico, Universidade de Lisboa; INESC-ID, Lisboa); and Siegfried Benkner (Research Group Scientific Computing, University of Vienna)

  • Memory-aware Optimization for Sparse Matrix-Vector Multiplication Invocations
    Yichen Zhang and Shengguo Li (National University of Defense Technology); Fan Yuan (Xiangtan University, National University of Defense Technology); Dezun Dong, Xiaojian Yang, and Tiejun Li (National University of Defense Technology); and Zheng Wang (United Kingdom)

  • Data Distribution Schemes for Dense Linear Algebra Kernels on Any Number of Processors
    Olivier Beaumont (Inria Center at the University of Bordeaux; LaBRI, France); Jean-Alexandre Collin (Inria Center at the University of Bordeaux); and Lionel Eyraud-Dubois and Mathieu Vérité (Inria Center at the University of Bordeaux; LaBRI, France)

  • Dynamic Tensor Linearization and Time Slicing for Efficient Factorization of Infinite Data Streams
    Yongseok Soh (University of Oregon), Ahmed E. Helal and Fabio Checconi (Intel Labs), Jan Laukemann (University of Erlangen-Nürnberg), Jesmin Jahan Tithi (Intel Labs), Teresa Ranadive (Laboratory for Physical Sciences), Fabrizio Petrini (Intel Labs), and Jee W. Choi (University of Oregon)          

SESSION 4B: Resource Management

Session Chair: Cynthia Phillips

  • Scheduling with Many Shared Resources  
    Max A. Deppert and Klaus Jansen (Kiel University), Marten Maack and Simon Pukrop (Paderborn University), and Malin Rau (University of Hamburg)

  • Chic-Sched: a HPC Placement-Group Scheduler on Hierarchical Topologies with Constraints 
    Laurent Schares, Asser Tantawi, Pavlos Maniotis, Ming-Hung Chen, Claudia Misale, Seetharami Seelam, and Hao Yu (IBM)

  • Generalizable Reinforcement Learning-Based Coarsening Model for Resource Allocation over Large and Diverse Stream Processing Graphs         
    Lanshun Nie, Yuqi Qiu, and Meng Fei (Harbin Institute of Technology); Mo Yu (IBM Research); and Jing Li (New Jersey Institute of Technology)

  • RLP: Power Management Based on a Latency-Aware Roofline Model          
    Bo Wang, Anara Kozhokanova, Christian Terboven, and Matthias Mueller (RWTH Aachen University)

  • SLAP: An Adaptive, Learned Admission Policy for Content Delivery Network Caching
    Ke Liu (Wuhan National Laboratory for Optoelectronics (WNLO) of Huazhong University of Science and Technology (HUST)), Kan Wu (University of Wisconsin-Madison), Hua Wang and Ke Zhou (Huazhong University of Science and Technology), Ji Zhang (Wuhan National Laboratory for Optoelectronics (WNLO) of Huazhong University of Science and Technology (HUST)), and Cong Li (Tencent)

  • Proactive SLA-aware Application Placement in the Computing Continuum
    Zahra Najafabadi Samani, Narges Mehran, Dragi Kimovski, and Radu Prodan (Alpen-Adria-University of Klagenfurt)

11:30 AM – 1:00 PM

Lunch (on your own) & PhD Forum Program

Parallel Technical
Sessions 5A & 5B

1:00 PM – 2:00 PM

SESSION 5A: Federated and Graph Learning

Session Chair: Abhinav Bhatele

  • PFedSA: Personalized Federated Multi-Task Learning via Similarity Awareness
    Chuyao Ye, Hao Zheng, Zhigang Hu, and Meiguang Zheng (Central South University)
  • FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout
    Jingjing Xue and Min Liu (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences); Sheng Sun (Institute of Computing Technology, Chinese Academy of Sciences); and Yuwei Wang, Hui Jiang, and Xuefeng Jiang (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
  • Fast Sparse GPU Kernels for Accelerated Training of Graph Neural Networks
    Ruibo Fan (The Hong Kong University of Science and Technology (Guangzhou)), Wei Wang (The Hong Kong University of Science and Technology), and Xiaowen Chu (The Hong Kong University of Science and Technology (Guangzhou))
  • Communication Optimization for Distributed Execution of Graph Neural Networks
    Süreyya Emre Kurt and Jinghua Yan (University of Utah), Aravind Sukumaran-Rajam (Meta), and Prashant Pandey and P. Sadayappan (University of Utah)

SESSION 5B: Systems and ML

Session Chair: Hao Zheng

  • A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication
    Yufan Xia (Australian National University), Marco De La Pierre (Pawsey Supercomputing Centre), and Amanda S. Barnard and Giuseppe Maria Junior Barca (Australian National University)
  • Power Constrained Autotuning using Graph Neural Networks 
    Akash Dutta (Iowa State University), Jee Choi (University of Oregon), and Ali Jannesari (Iowa State University)
  • SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs 
    Sairam Sri Vatsavai, Venkata Sai Praneeth Karempudi, Ishan Thakkar, Sayed Ahmad Salehi, and Todd Hastings (University of Kentucky)
  • HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture
    Yi-Chien Lin and Viktor Prasanna (University of Southern California)   

Early Afternoon Break 2:00 PM - 2:30 PM

Parallel Technical
Sessions 6A & 6B

2:30 PM – 4:00 PM

SESSION 6A: Scientific Applications

Session Chair: Richard Geber

  • Optimizing Cloud Computing Resource Usage for Hemodynamic Simulation           
    William Ladd, Christopher Jensen, Madhurima Vardhan, and Jeff Ames (Duke University); Jeff Hammond (NVIDIA); Erik Draeger (LLNL); and Amanda Randles (Duke University)

  • Predictive Analysis of Code Optimisations on Large-Scale Coupled CFD-Combustion Simulations using the CPX Mini-App           
    Archie Powell and Gihan Mudalige (University of Warwick)

  • Scalable adaptive algorithms for next-generation multiphase flow simulations         
    Kumar Saurabh (Iowa State University), Masado Ishii (University of Utah), Makrand Khanwale (Stanford University), Hari Sundar (University of Utah), and Baskar Ganapathysubramanian (Iowa State University)

  • Porting a Computational Fluid Dynamics Code with AMR to Extreme-Scale GPU Platforms   
    Joshua Hoke Davis, Justin Shafner, Daniel Nichols, Nathan Grube, Pino Martin, and Abhinav Bhatele (University of Maryland)

  • Neural Network Compiler for Parallel High-Throughput Simulation of Digital Circuits
    Ignacio Gavier, Joshua Russell, Devdhar Patel, Hava Siegelmann, and Edward Rietman (University of Massachusetts Amherst)

SESSION 6B: Performance Engineering

Session Chair: Oliver Sinnen

  • Opportunities and Limitations of Hardware Timestamps in Concurrent Data Structures
    Olivia Grimes, Jacob Nelson-Slivon, Ahmed Hassan, and Roberto Palmieri (Lehigh University)
  • Harnessing the Crowd for Autotuning High-Performance Computing Applications
    Younghyun Cho and James W. Demmel (University of California, Berkeley); Jacob King (Tech-X); and Xiaoye S. Li, Yang Liu, and Hengrui Luo (Lawrence Berkeley National Laboratory)
  • Designing and Optimizing GPU-aware Nonblocking MPI Neighborhood Collective Communication for PETSc 
    Kawthar Shafie Khorassani, Chen-Chun Chen, Hari Subramoni, and Dhabaleswar Panda (The Ohio State University)
  • SW-LCM: A Scalable and Weakly-supervised Land Cover Mapping Method on a New Sunway Supercomputer
    Yi Zhao, Juepeng Zheng, and Haohuan Fu (Tsinghua University); Wenzhao Wu (National Supercomputing Center in Wuxi); Jie Gao (National Research Center of Parallel Computer Engineering and Technology); Mengxuan Chen, Jinxiao Zhang, Lixian Zhang, Runmin Dong, and Zhenrong Du (Tsinghua University); Sha Liu and Xin Liu (National Research Center of Parallel Computer Engineering and Technology); Shaoqing Zhang (Qingdao Pilot National Laboratory for Marine Science and Technology); and Le Yu (Tsinghua University)   
  • Feature-based SpMV Performance Analysis on Contemporary Devices 
    Panagiotis Mpakos, Dimitrios Galanopoulos, Petros Anastasiadis, and Georgios Goumas (National Technical University of Athens); Nikela Papadopoulou (Chalmers University of Technology); and Nectarios Koziris (National Technical University of Athens)
  • An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs 
    Ichitaro Yamazaki (SNL), Alexander Heinlein (Delft University of Technology), and Sivasankaran Rajamanickam (SNL)   

Late Afternoon Break 4:00 PM - 4:30 PM

IPDPS 2023 Panel Discussion

4:30 PM

Next Big Application(s) for HPC after Deep Learning   

In the last 5-10 years, Deep Neural Networks (DNNs) have not only emerged as a new target class of applications for HPC researchers, but have quickly come to dominate HPC conferences. The rapidly increasing size of state-of-the-art DNN models has sustained this strong interest, with efforts being made in architectures, programming systems, algorithms, and application tuning. The year 2023 may be a good time for the community to ask: “What will be the next big application class or classes that will excite and drive HPC researchers in the near future?” Trends in life sciences, materials, climate, secure computing, and other areas may provide clues in answering this question. This panel will examine this open-ended question with a set of leading researchers and with active audience participation.


Gagan Agrawal, Augusta University

Panel Members:

  • Srinivas Aluru, Georgia Institute of Technology
  • David Bader, New Jersey Institute of Technology
  • Wu-chun Feng, Virginia Tech
  • Ananth Kalyanaraman, Washington State University
  • Cynthia Phillips, Sandia National Laboratories

6:00 PM - 7:30 PM

PHD FORUM - Students at posters


Pre-banquet RECEPTION

7:30 PM

BANQUET (paper and poster awards)

THURSDAY - 18 May 2023


Keynote Session
8:30 AM - 9:30 AM


Session Chair: Viktor Prasanna


Ivo Bolsens
Senior Vice President and Chief Technology Officer, Adaptive and Embedded Computing Group, AMD


Future workloads drive the need for high performant and adaptive computing hardware



Morning Break 9:30 AM - 10:00 AM

Parallel Technical
Sessions 7A & 7B

10:00 AM - 11:30 AM

SESSION 7A: Combinatorial Algorithms

Session Chair: R. Vaidyanathan

  • Engineering Massively Parallel MST Algorithms
    Peter Sanders and Matthias Schimek (Karlsruhe Institute of Technology)
  • Engineering a Distributed-Memory Triangle Counting Algorithm
    Peter Sanders and Tim Niklas Uhl (Karlsruhe Institute of Technology)
  • PRF: A Fast Parallel Relaxed Flooding Algorithm for Voronoi Diagram Generation on GPU 
    Jue Wang and Fumihiko Ino (Osaka University) and Jing Ke (Shanghai Jiao Tong University)
  • Satellite Collision Detection using Spatial Data Structures
    Christian Hellwig, Fabian Czappa, and Martin Michel (Technical University of Darmstadt); Reinhold Bertrand (Technical University of Darmstadt, European Space Agency ESA/ESOC); and Felix Wolf (Technical University of Darmstadt)
  • AnyQ: An Evaluation Framework for Massively-Parallel Queue Algorithms
    Michael Kenzel and Stefan Lemme (DFKI); Richard Membarth (Technische Hochschule Ingolstadt, DFKI); Matthias Kurtenacker (DFKI); Hugo Devillers (Saarland University); Markus Steinberger (Graz University of Technology); and Philipp Slusallek (DFKI, Saarland University)

SESSION 7B: Emerging Technology

Session Chair: Jens Domke

  • qTask: Task-parallel Quantum Circuit Simulation with Incrementality  
    Tsung-Wei Huang (University of Utah)   
  • GPU-Accelerated Error-Bounded Compression Framework for Quantum Circuit Simulations
    Milan Shah (North Carolina State University, Argonne National Laboratory); Xiaodong Yu and Sheng Di (Argonne National Laboratory); Danylo Lykov (Argonne National Laboratory, The University of Chicago); Yuri Alexeev (Argonne National Laboratory); Michela Becchi (North Carolina State University); and Franck Cappello (Argonne National Laboratory)
  • An Adaptive Hybrid Quantum Algorithm for the Metric Traveling Salesman Problem
    Fei Li (George Mason University) and Arul Mazumder (Massachusetts Academy of Math and Science at WPI)
  • Stochastic Neuromorphic Circuits for Finding Graph Max Cuts
    Bradley H. Theilman, Yipu Wang, Ojas Parekh, William Severa, J. Darby Smith, and James B. Aimone (Sandia National Laboratories)
  • TurboHE: Accelerating Fully Homomorphic Encryption Using FPGA Clusters
    Haohao Liao, Mahmoud A. Elmohr, Xuan Dong, Yanjun Qian, Wenzhe Yang, Zhiwei Shang, and Yin Tan (Huawei Technologies)
  • Towards Faster Fully Homomorphic Encryption Implementation with Integer and Floating-point Computing Power of GPUs  
    Guang Fan (State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences); Fangyu Zheng (State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences); Lipeng Wan (State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences); Lily Gao (Department of Computer and Software, Nanjing University of Information Science and Technology); Yuan Zhao (Ant Group); Jiankuo Dong (School of Computer Science, Nanjing University of Posts and Telecommunications); Yixuan Song (Ant Group); Yuewu Wang (State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences); and Jingqiang Lin (School of Cyber Security, University of Science and Technology of China)   

11:30 AM – 1:00 PM

Lunch (on your own) & PhD Forum Program

Parallel Technical
Sessions 8A & 8B

1:00 PM – 2:30 PM

SESSION 8A: Data-Intensive Algorithms

Session Chair: Ariful Azad

  • FedTrip: A Resource Efficient Federated Learning Method with Triplet Regularization
    Xujing Li and Min Liu (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences); Sheng Sun and Yuwei Wang (Institute of Computing Technology, Chinese Academy of Sciences); and Hui Jiang and Xuefeng Jiang (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences)

  • A Guaranteed Approximation Algorithm for Scheduling Fork-Join Graphs with Communication Delay
    Pierre-François Dutot (Grenoble University) and Yeu-Shin Fu, Nikhil Prasad, and Oliver Sinnen (University of Auckland)

  • SelB-k-NN: A Mini-Batch K-Nearest Neighbors Algorithm on AI Processors 
    Yifeng Tang and Cho-Li Wang (The University of Hong Kong)

  • Exact Fault-Tolerant Consensus with Voting Validity
    Zhangchen Xu, Yuetai Li, Chenglin Feng, and Lei Zhang (University of Glasgow)

  • k-Center Clustering with Outliers in the MPC and Streaming Model
    Mark de Berg, Leyla Biabani, and Morteza Monemizadeh (Eindhoven University of Technology)

SESSION 8B: Serverless/Cloud Computing Systems

Session Chair: Radu Prodan

  • FIRST: Exploiting the Multi-Dimensional Attributes of Functions for Power-Aware Serverless Computing 
    Lu Zhang, Chao Li, Xinkai Wang, Weiqi Feng, Zheng Yu, Quan Chen, Jingwen Leng, and Minyi Guo (Shanghai Jiao Tong University) and Pu Yang and Shang Yue (Tencent)
  • Duo: Improving Data Sharing of Stateful Serverless Applications by Efficiently Caching Multi-read Data
    Zhuo Huang, Song Wu, Chaoyi Cheng, Hao Fan, and Hai Jin (CGCL/SCTS/BDTS, HUST)
  • QoS-Aware and Cost-Efficient Dynamic Resource Allocation for Serverless ML Workflows
    Hao Wu, Junxiao Deng, and Hao Fan (CGCL/SCTS/BDTS, HUST); Shadi Ibrahim (Inria, Univ. Rennes, CNRS, IRISA); and Song Wu and Hai Jin (CGCL/SCTS/BDTS, HUST)
  • rFaaS: Enabling High Performance Serverless with RDMA and Leases
    Marcin Copik, Konstantin Taranov, Alexandru Calotoiu, and Torsten Hoefler (ETH Zurich)
  • Alioth: A Machine Learning Based Interference-Aware Performance Monitor for Multi-Tenancy Applications in Public Cloud
    Tianyao Shi, Yingxuan Yang, Yunlong Cheng, and Xiaofeng Gao (Shanghai Jiao Tong University) and Zhen Fang and Yongqiang Yang (Huawei Technologies)
  • GPU-enabled Function-as-a-Service for Machine Learning Inference
    Ming Zhao, Kritshekhar Jha, and Sungho Hong (Arizona State University)

Afternoon Break 2:30 PM - 3:00 PM

Parallel Technical
Sessions 9A & 9B

3:00 PM – 4:30 PM

SESSION 9A: Optimizations for New Applications

Session Chair: Probir Roy

  • Lyra: Fast and Scalable Resilience to Reordering Attacks in Blockchains
    Pouriya Zarbafian and Vincent Gramoli (University of Sydney)

  • Smart Red Belly Blockchain: Reducing Congestion for Web3
    Deepal Tennakoon, Yiding Hua, and Vincent Gramoli (University of Sydney)

  • SBGT: Scaling Bayesian Group Testing for Disease Surveillance
    Weicong Chen (Case Western Reserve University), Hao Qi and Xiaoyi Lu (University of California Merced), and Curtis Tatsuoka (University of Pittsburgh)

  • RT-DBSCAN: Accelerating DBSCAN using Ray Tracing Hardware
    Vani Nagarajan and Milind Kulkarni (Purdue University)

  • Distributing Simplex-Shaped Nested for-Loops to Identify Carcinogenic Gene Combinations
    Sajal Dash, Mohammad Alaul Haque Monil, and Junqi Yin (Oak Ridge National Laboratory); Ramu Anandakrishnan (The Edward Via College of Osteopathic Medicine); and Feiyi Wang (Oak Ridge National Laboratory)

SESSION 9B: Big Data Management

Session Chair: Bing Xie

  • LowFive: In Situ Data Transport for High-Performance Workflows
    Tom Peterka (Argonne National Laboratory), Dmitriy Morozov and Arnur Nigmetov (Lawrence Berkeley National Laboratory), Orcun Yildiz and Bogdan Nicolae (Argonne National Laboratory), and Philip Davis (University of Utah)

  • MCR-DL: Mix-and-Match Communication Runtime for Deep Learning
    Quentin Anthony (Ohio State University); Ammar Awan, Jeff Rasley, and Yuxiong He (Microsoft); and Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, and Dhabaleswar Panda (Ohio State University)

  • Lossy Scientific Data Compression With SPERR
    Shaomeng Li (National Center for Atmospheric Research), Peter Lindstrom (Lawrence Livermore National Lab), and John Clyne (National Center for Atmospheric Research)

  • Fast and Automatic Floating Point Error Analysis With CHEF-FP
    Garima Singh, Baidyanath Kundu, and Vassil Vassilev (CERN and Princeton University); Alexander Penev (Plovdiv University); David Lange (Princeton University); and Harshitha Menon (Lawrence Livermore National Laboratory)

  • Evaluating DAOS for HPC storage           
    DAOS as HPC Storage: a View From Numerical Weather Prediction
    Nicolau Manubens Gil, Tiago Quintino, Simon D. Smart, and Emanuele Danovaro (ECMWF) and Adrian Jackson (EPCC)

  • ZFP-X: Efficient Embedded Coding for Accelerating Lossy Floating Point Compression
    Bing Lu and Yida Li (Hunan University), Junqi Wang (Beijing Institute for General Artificial Intelligence), and Huizhang Luo and Kenli Li (Hunan University)

4:30 PM

IPDPS 2024 Preview and other wrap-up attractions!

FRIDAY - 19 May 2023



See each individual workshop for program & schedule details



Job Scheduling Strategies for Parallel Processing



Parallel and Distributed Scientific and Engineering Computing



Automatic Performance Tuning



Parallel AI and Systems for the Edge



Scalable Deep Learning over Parallel And Distributed Infrastructures



Parallel and Distributed Processing for Computational Social Systems



Parallel / Distributed Combinatorics and Optimization



Composable Systems



Extreme Scaling of AI for Science



Keshav Pingali

The University of Texas at Austin and Katana Graph Inc.
Recipient of the 2023 IEEE Charles Babbage Award

Title: Fifty Years of Parallel Programming: Ieri, Oggi, Domani *

Abstract: Parallel programming started around 1970, so as a discipline it is now more than 50 years old. What have we learned in the past 50 years about parallel programming? What problems have we solved and what problems remain to be solved? What can young researchers learn from the successes and failures of our discipline? This talk presents a personal point of view about these and other questions regarding the state of parallel programming.

* The subtitle of this talk is Italian for "Yesterday, Today, Tomorrow," and it is borrowed from the title of a play by Alberto Moravia.

Bio: Keshav Pingali is a Professor in the Department of Computer Science at the University of Texas at Austin, where he holds the W.A. "Tex" Moncrief Chair of Computing in the Institute for Computational Engineering and Sciences (ICES). He was on the faculty of the Department of Computer Science at Cornell University from 1986 to 2006, where he held the India Chair of Computer Science. He has a PhD from MIT, and a B.Tech. from the Indian Institute of Technology, Kanpur, where he was awarded the President's Gold Medal.

Pingali has made deep and wide-ranging contributions to many areas of parallel computing including programming languages, compilers, and runtime systems for multicore, manycore and distributed computers. His current research is focused on programming models and tools for high-performance graph computing.

Pingali is a Fellow of the IEEE, ACM, and AAAS. He received the IIT Kanpur Distinguished Alumnus Award in 2013 and the IEEE CS Charles Babbage Award in 2023. Between 2008 and 2011, he was the co-Editor-in-Chief of the ACM Transactions on Programming Languages and Systems. He has also served on the NSF CISE Advisory Committee. He is currently the CEO of Katana Graph, a start-up in the graph computing area backed by leading investors from Silicon Valley.



Dilma Da Silva

National Science Foundation & Texas A&M University

Title: The adventurous life of a system software researcher

Abstract: This talk addresses the pursuit of scalable and adaptive parallel/distributed systems by presenting the perspectives and lessons learned during my 35 years of project successes and failures. Besides discussing the technical challenges in these projects, the talk also addresses personal aspects of the work relevant to career happiness and maintaining high levels of motivation, commitment, and fun. Following this story of a career centered on designing scalable, adaptable systems while broadening our computing community, the talk will present a short overview of NSF’s Directorate for Computer and Information Science and Engineering.

Bio: Dilma Da Silva is a Professor and Holder of the Ford Design Professorship II at the Department of Computer Science and Engineering at Texas A&M University. She is currently serving as the Division Director for Computing and Communication Foundations at the National Science Foundation. Her previous roles at Texas A&M include Department Head (2014-2019), Associate Dean (2019-2020), interim director of the Texas A&M Institute of Data Science, and interim director of the Texas A&M Cybersecurity Center. Her primary research interests are distributed systems, operating systems, and computer science education. Before joining Texas A&M, she worked at Qualcomm Research (2012-2014), IBM Research (2000-2012), and the University of São Paulo (1996-2000).

Dilma is an ACM Distinguished Scientist, a member of the board of CRA-WP (Computing Research Association's Widening Participation Committee), and a Latinas in Computing group co-founder. She served as an ACM SIGOPS from 2011 to 2015 and chaired the ACM Senior Award Committee in 2015. She has chaired more than 35 conferences/workshops and participated in more than 100 program committees. Recent leadership roles include Program Co-chair for IEEE ICDCS'21, ACM Middleware'20, Supercomputing'19, and IPDPS'19. She has published more than 80 technical papers and has 15 patents. Dilma received her doctoral degree in computer science from Georgia Tech in 1997 and her bachelor's and master's degrees from the University of São Paulo, Brazil. She is passionate about enabling the next generation of talent.



Ivo Bolsens

Senior Vice President and Chief Technology Officer, Adaptive and Embedded Computing Group, AMD

Title: Future workloads drive the need for high performant and adaptive computing hardware

Abstract: As big data pushes the need for high-performance and adaptive computing beyond the exascale threshold, the pressure is on to find computing architectures that meet the right mix of price, performance, and power efficiency to support cost-effective data center scalability, acceleration of applications that drive higher end-user productivity and faster time to insights, and lower power consumption for sustainability. This will require heterogeneous architectures that combine traditional CPUs and GPUs with innovative accelerators to meet the ever-growing demand of big-data-driven workloads. Endpoints connected to the cloud are being infused with intelligence through sensors, cameras, and other devices and are creating massive amounts of mostly unstructured data. Processing this data is driving demand for new workloads such as machine learning. Adaptive computing allows for compute and connectivity hardware that can adapt to the workload and efficiently process data in use, in motion, and at rest.

Bio: Ivo Bolsens is senior vice president and chief technology officer (CTO) for the Adaptive and Embedded Computing Group (AECG) at AMD. He oversees AECG’s advanced hardware and software technology development, including future architecture directions and software stacks to enable emerging opportunities in the fields of machine learning and high-performance computing for edge and cloud. Bolsens leads the corporate initiative to establish AMD’s pervasive AI ecosystem, and he manages the Open Source Program Office to accelerate solutions for programming AMD silicon. His team is also driving the university program to create a thriving, global ecosystem for AMD technology in academia.

Bolsens joined AMD from Xilinx in February 2022 as part of the largest acquisition in the semiconductor industry. At Xilinx, he served as CTO in charge of the long-term technology strategy and advanced development activities for all software and hardware products. Bolsens joined Xilinx in June 2001 as its CTO, from the Interuniversity Microelectronics Centre (IMEC), the largest semiconductor research center based in Belgium, where he was vice president, Design of Information and Communication Systems, leading the R&D of digital signal processing (DSP) systems for video applications and wireless communication terminals, as well as the development of compilers for DSP processors and system-on-chip (SOC) design software. During his tenure at IMEC, he and his team spun out three successful startups in the field of SOC design tools and wireless systems.

Bolsens serves on the advisory boards of IMEC, the Engineering Departments of San Jose State University and Santa Clara University, and the Department of Electrical Engineering and Computer Sciences at UC Berkeley. He is also a board member of EvoNexus, a startup technology incubator.

Bolsens holds a master’s in Electrical Engineering and a Ph.D. in Applied Science from the Catholic University of Leuven in Belgium.

Register Today

Early/Advance Registration
By 27 March 2023
By 31 March 2023
Registration Details

IPDPS 2022 Report

36th IEEE International Parallel & Distributed Processing Symposium
May 30 – June 3, 2022
(Lyon, France)