I am a Senior Deep Learning Architect at NVIDIA. Previously, I was a Postdoc Research Fellow at the University of British Columbia (UBC), where I also earned my doctorate in 2020, working with Prof. Mieszko Lis. My research was in computer architecture and computer systems, with a particular focus on GPUs and machine learning accelerators. Before joining UBC, I received my master's and bachelor's degrees from Xi’an Jiaotong University.
[Curriculum Vitae] [Google Scholar] [LinkedIn]
Email: xren@nvidia.com
PhD, University of British Columbia, Canada, Oct. 2020
Dissertation: Efficient Synchronization Mechanisms for Scalable GPU Architectures
M.Sc, Xi’an Jiaotong University, China, Jun. 2015
Thesis: Parallel Acceleration Algorithms and FPGA Implementation for KLMS and KAP
B.Sc, Xi’an Jiaotong University, China, Jun. 2012
Jun. 2021 – Present | Senior Architect | NVIDIA, Canada and USA |
Nov. 2020 – Jun. 2021 | Postdoc Research Fellow | University of British Columbia, Canada |
Sept. 2015 – Oct. 2020 | Research Assistant | University of British Columbia, Canada |
Sept. 2019 – Nov. 2019 | Research Intern | Max Planck Institute for Software Systems, Germany |
Aug. 2018 – Nov. 2018 | Research Intern | NVIDIA, USA |
May 2017 – Aug. 2017 | Research Intern | NVIDIA, USA |
Sept. 2012 – Jun. 2015 | Research Assistant | Xi’an Jiaotong University, China |
Jul. 2011 – Sept. 2011 | Undergraduate Intern | ICT, Chinese Academy of Sciences, China |
Michalis Kokologiannakis, Xiaowei Ren, and Viktor Vafeiadis. “Dynamic Partial Order Reductions for Spinloops”, Annual International Conference of Formal Methods in Computer-Aided Design (FMCAD), Yale University, Connecticut, USA, October 2021. [paper][full-talk-slides][full-talk-video]
Xiaowei Ren and Mieszko Lis. “CHOPIN: Scalable Graphics Rendering in Multi-GPU Systems via Parallel Image Composition”, 27th International Symposium on High Performance Computer Architecture (HPCA), Seoul, South Korea, February 2021. (acceptance rate: 63/258 = 24.4%) [paper][full-talk-slides][short-talk-slides][short-talk-video]
Dingqing Yang, Amin Ghasemazar*, Xiaowei Ren*, Maximilian Golub, Guy Lemieux, and Mieszko Lis. “Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training”, 53rd International Symposium on Microarchitecture (MICRO), Athens, Greece, October 2020. (acceptance rate: 82/424 = 19.3%, *equal contribution) [paper][arXiv][lightning-video][full-talk-video]
Xiaowei Ren, Daniel Lustig, Evgeny Bolotin, Aamer Jaleel, Oreste Villa, and David Nellans. “HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems”, 26th International Symposium on High Performance Computer Architecture (HPCA), San Diego, USA, February 2020. (acceptance rate: 48/248 = 19.4%) [paper][slides][lightning-slides][lightning-video]
Xiaowei Ren and Mieszko Lis. “High-Performance GPU Transactional Memory via Eager Conflict Detection”, 24th International Symposium on High Performance Computer Architecture (HPCA), Vienna, Austria, February 2018. (acceptance rate: 54/260 = 20.8%) [paper][slides][lightning-slides][lightning-video]
Xiaowei Ren and Mieszko Lis. “Efficient Sequential Consistency in GPUs via Relativistic Cache Coherence”, 23rd International Symposium on High Performance Computer Architecture (HPCA), Austin, USA, February 2017. (acceptance rate: 50/224 = 22.3%) [paper][slides][lightning-slides]
Pengju Ren, Xiaowei Ren, Sudhanshu Sane, Michel A. Kinsy, and Nanning Zheng. “A Deadlock-Free and Connectivity-Guaranteed Methodology for Achieving Fault-tolerance in On-Chip Networks”, IEEE Transactions on Computers (TC), 2016.
Xiaowei Ren, Qihang Yu, Badong Chen, Nanning Zheng, and Pengju Ren. “A Reconfigurable Parallel Accelerator for the Kernel Affine Projection Algorithm”, IEEE International Conference on Digital Signal Processing (DSP), Singapore, July 2015.
Xiaowei Ren, Qihang Yu, Badong Chen, Nanning Zheng, and Pengju Ren. “A 128-way FPGA Platform for the Acceleration of KLMS Algorithm”, Asia and South Pacific Design Automation Conference (ASP-DAC), Tokyo, Japan, January 2015. (University LSI Design Contest)
Xiaowei Ren, Qihang Yu, Badong Chen, Nanning Zheng, and Pengju Ren. “A Real-time Permutation Entropy Computation for EEG Signals”, Asia and South Pacific Design Automation Conference (ASP-DAC), Tokyo, Japan, January 2015. (University LSI Design Contest)
Xiaowei Ren, Pengju Ren, Badong Chen, Jose C. Principe, and Nanning Zheng. “A Reconfigurable Parallel Acceleration Platform for Evaluation of Permutation Entropy”, 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Chicago, USA, August 2014.
Xiaowei Ren, Pengju Ren, Badong Chen, Tai Min, and Nanning Zheng. “Hardware implementation of KLMS Algorithm using FPGA”, International Joint Conference on Neural Networks (IJCNN), Beijing, China, July 2014.
Pengju Ren, Qingxin Meng, Xiaowei Ren, and Nanning Zheng. “Fault-tolerant Routing for On-chip Network without Using Virtual Channel”, ACM/EDAC/IEEE Design Automation Conference (DAC), San Francisco, USA, June 2014. (acceptance rate: 3150/10963 = 29%)
2016 – 2020 | UBC Graduate Support Initiative (GSI) Awards |
2012 – 2015 | National Master Scholarship (awarded to the top 5% of students) |
2013 – 2014 | Suzhou Industrial Park Scholarship |
2010 – 2011 | CASC Secondary Class Scholarship |
2009 – 2010 | National Encouragement Scholarship (awarded to the top 3% of students) |
2008 – 2009 | Siyuan Scholarship |