Agenda for the workshop and tutorials: Workshop/Tutorials Agenda
Agenda for the workshop on RISC-V and OpenPOWER in HPC: RISCV Agenda
Agenda for the SHAD tutorial: SHAD tutorial
Agenda for the GraphBLAS tutorial: GraphBLAS tutorial
All times are in US EDT
June 15th, 2021
10:00 | Welcome | |
10:20 | Session 1: Loop Optimizations. Session Chair: Milind Chabbi | |
RAJALC: Inter-loop Optimization in RAJA | Brandon Neth, Thomas R.W. Scogland, Bronis R. de Supinski, Michelle Mills Strout | |
Tile Size Selection of Affine Programs for GPGPUs using Polyhedral Cross-Compilation | Khaled Abdelaal, Martin Kong | |
A Practical Tile Size Selection Model for Affine Loop Nests | Kumudha Narasimhan, Aravind Acharya, Abhinav Baid, Uday Bondhugula | |
10:50 | Session 2: Program Analysis and Benchmarking. Session Chair: Xuechen Zhang | |
Does It Matter? - OMPSanitizer: An Impact Analyzer of Reported Data Races in OpenMP Programs | Wenwen Wang, Pei-Hung Lin | |
NumaPerf: Predictive NUMA Profiling | Xin Zhao, Jin Zhou, Hui Guan, Wei Wang, Xu Liu, Tongping Liu | |
NPBench: A Benchmarking Suite for High-Performance NumPy | Alexandros Nikolaos Ziogas, Tal Ben-Nun, Timo Schneider, Torsten Hoefler | |
DSGEN: Concolic Testing GPU Implementations of Concurrent Dynamic Data Structures | Xiaofan Sun, Rajiv Gupta | |
11:30 | Session 3: Managing Parallelism. Session Chair: Dimitrios Nikolopoulos | |
Task-Graph Scheduling Extensions for Efficient Synchronization and Communication | Seonmyeong Bak, Oscar Hernandez, Mark Gates, Piotr Luszczek, Vivek Sarkar | |
uSteal: A Theory-backed Framework for Preemptive Work and Resource Stealing in Mixed-Criticality Microservices | Amirhossein Mirhosseini, Thomas Wenisch | |
ThundeRiNG: Generating Multiple Independent Random Number Sequences on FPGAs | Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, Weng-Fai Wong | |
12:00 | Session 4: Resilience and Security. Session Chair: Dong Li | |
FT-BLAS: A High Performance BLAS Implementation With Online Fault Tolerance | Yujia Zhai, Elisabeth Giem, Quan Fan, Kai Zhao, Jinyang Liu, Zizhong Chen | |
PSSM: Achieving Secure Memory for GPUs with Partitioned and Sectored Security Metadata | Shougang Yuan, Yan Solihin, Huiyang Zhou | |
12:30 | Keynote Speaker: Rommie Amaro, UCSD, USA. Title: Computational Microscopy of SARS-CoV-2. Session Chair: Jose Moreira |
June 16th, 2021
10:00 | Keynote Speaker: Kevin Skadron, University of Virginia, USA. Title: Processing in Memory: Past, Present, and Future. Session Chair: Yoav Etsion | |
11:10 | Session 5: New Architectures for HPC. Session Chair: Trevor Mudge | |
Omegaflow: A High-Performance Dependency-based Architecture | Yaoyang Zhou, Zihao Yu, Chuanqi Zhang, Yinan Xu, Huizhe Wang, Sa Wang, Ninghui Sun, Yungang Bao | |
PLANAR: A Programmable Accelerator for Near-Memory Data Rearrangement | Adrián Barredo, Adrià Armejach, Jonathan Beard, Miquel Moreto | |
Power and Energy Efficient Routing for Mach-Zehnder Interferometer based Photonic Switches | Markos Kynigos, Jose A. Pascual, Javier Navaridas, John Goodacre, Mikel Lujan | |
11:40 | Session 6: Exploiting Non-Volatile Memory. Session Chair: Dhabaleswar Panda | |
Athena: High-Performance Sparse Tensor Contraction Sequence on Heterogeneous Memory | Jiawen Liu, Dong Li, Roberto Gioiosa, Jiajia Li | |
Optimizing Large-Scale Plasma Simulations on Persistent Memory-based Heterogeneous Memory with Effective Data Placement Across Memory Hierarchy | Jie Ren, Jiaolin Luo, Ivy Peng, Kai Wu, Dong Li | |
MD-HM: Memoization-based Molecular Dynamics Simulations on Big Memory System | Zhen Xie, Wenqian Dong, Jie Liu, Ivy Peng, Yanbao Ma, Dong Li | |
12:10 | Session 7: Machine Learning (1). Session Chair: Dingwen Tao | |
Enabling Energy-Efficient DNN Training on Hybrid GPU-FPGA Accelerators | Xin He, Jiawen Liu, Zhen Xie, Hao Chen, Guoyang Chen, Weifeng Zhang, Dong Li | |
Proxima: Accelerating the Integration of Machine Learning in Atomistic Simulations | Yuliana Zamora, Logan Ward, Ganesh Sivaraman, Ian Foster, Henry Hoffmann | |
Partitioning Sparse Deep Neural Networks for Scalable Training and Inference | Gunduz Vehbi Demirci, Hakan Ferhatosmanoglu | |
12:40 | Session 8: Machine Learning (2). Session Chair: Hui Guan | |
ClickTrain: Efficient and Accurate End-to-End Deep Learning Training via Fine-Grained Architecture-Preserving Pruning | Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao | |
SumMerge: An Efficient Algorithm and Implementation for Weight Repetition-Aware DNN Inference | Rohan Baskar Prabhakar, Sachit Kuhar, Rohit Agrawal, Christopher J. Hughes, Christopher W. Fletcher | |
Accelerating DNNs Inference with Predictive Layer Fusion | MohammadHossein Olyaiy, Christopher Ng, Mieszko Lis | |
AUTO-PRUNE: Automated DNN Pruning and Mapping for ReRAM-Based Accelerator | Siling Yang, Weijian Chen, Xuechen Zhang, Shuibing He, Yanlong Yin, Xian-He Sun |
June 17th, 2021
10:00 | Keynote Speaker: Chris Monroe, Duke University and IonQ, Inc., USA. Title: Quantum Computing with Atoms. Session Chair: Frank Mueller | |
11:10 | Session 9: Data Locality and Vectorization. Session Chair: Peter Hofstee | |
Performance Portable Back-projection Algorithms on CPUs: Agnostic Data Locality and Vectorization Optimizations | Peng Chen, Mohamed Wahib, Xiao Wang, Shinichiro Takizawa, Takahiro Hirofuchi, Hirotaka Ogawa, Satoshi Matsuoka | |
A Systematic Approach to Improving Data Locality Across Fourier Transforms and Linear Algebra Operations | Doru Thom Popovici, Andrew Canning, Zhengji Zhao, Lin-Wang Wang, John Shalf | |
11:30 | Session 10: Algorithms Adapting to High-Performance Networks. Session Chair: Xin Yuan | |
Delay Sensitivity-driven Congestion Mitigation for HPC Systems | Archit Patke, Saurabh Jha, Haoran Qiu, Jim Brandt, Ann Gentile, Joe Greenseid, Zbigniew Kalbarczyk, Ravishankar K. Iyer | |
Topology-aware Optimizations for Multi-GPU Ptychographic Image Reconstruction | Xiaodong Yu, Tekin Bicer, Rajkumar Kettimuthu, Ian Foster | |
11:50 | Session 11: Graph Data Structures and Algorithms. Session Chair: Kento Sato | |
Distributed merge forest: a new fast and scalable approach for topological analysis at scale | Xuan Huang, Pavol Klacansky, Steve Petruzza, Attila Gyulassy, Peer-Timo Bremer, Valerio Pascucci | |
Sandslash: A Two-Level Framework for Efficient Graph Pattern Mining | Xuhao Chen, Roshan Dathathri, Gurbinder Gill, Loc Hoang, Keshav Pingali | |
12:10 | Session 12: Parallelization Constrained by Data Dependencies. Session Chair: Jiajia Li | |
On the Automatic Parallelization of Subscripted Subscript Patterns using Array Property Analysis | Akshay Bhosale, Rudolf Eigenmann | |
ALTO: Adaptive Linearized Storage of Sparse Tensors | Ahmed E. Helal, Jan Laukemann, Fabio Checconi, Jesmin Jahan Tithi, Teresa Ranadive, Fabrizio Petrini, Jeewhan Choi | |
An Optimized Tensor Completion Library for multiple GPUs | Ming Dun, Yunchun Li, Hailong Yang, Qingxiao Sun, Zhongzhi Luan, Depei Qian | |
Distributed-Memory Parallel Algorithms for Sparse Times Tall-Skinny-Dense Matrix Multiplication | Oguz Selvitopi, Benjamin Brock, Israt Nisa, Alok Tripathy, Katherine Yelick, Aydın Buluç | |
12:50 | Session 13: Best Paper Candidates. Session Chair: Yoav Etsion | |
HyQuas: Hybrid Partitioner Based Quantum Circuit Simulation System on GPU | Chen Zhang, Zeyu Song, Haojie Wang, Kaiyuan Rong, Jidong Zhai | |
FULL-W2V: Fully Exploiting Data Reuse for W2V on GPU-Accelerated Systems | Thomas Randall, Tyler Allen, Rong Ge | |
A Performance Portability Framework for Python | Nader Al Awar, Steven Zhu, George Biros, Milos Gligoric | |
ProMT: Optimizing Integrity Tree Updates for Write-Intensive Pages in Secure NVMs | Mazen Alwadi, Aziz Mohaisen, Amro Awad | |
13:30 | Concluding Remarks |