Yao Chen       


Research Assistant Professor
National University of Singapore, Singapore
E-mail:  yaochen@nus.edu.sg, yaochen@comp.nus.edu.sg
Address: 15 Computing Drive, Com2-02-02, Singapore, 117418
Phone:   +65 8376 7292
Homepage | Google Scholar | ResearchGate
	

I am a Research Assistant Professor at the Department of Computer Science, School of Computing, National University of Singapore (NUS). Before joining NUS, I was a Senior Research Scientist with the Advanced Digital Sciences Center (ADSC), a research center of the University of Illinois at Urbana-Champaign (UIUC) based in Singapore.

Currently, I am working with Prof. Bingsheng He and Prof. Weng-Fai Wong at NUS on several interesting projects.


Research interests

  • Domain-specific FPGA-based accelerators: FPGA-enabled domain-specific architectures
  • Software/hardware co-designed efficient systems: ML/Graph accelerators, on-device AI
  • Electronic design automation (EDA): domain-specific high-level synthesis (HLS), domain-specific architecture generation

As Principal Investigator (PI)

  1. OctoBrain: A privacy-preserving edge-cloud collaborative learning solution for lift systems.
    Singapore Cybersecurity Consortium (SGCSC). SGD 178,560: 2021 - 2022.
  2. Pooling Large-Scale Heterogeneous Processors.
    Alibaba Innovation Research (AIR) Funding. USD 140K: 2017 - 2019.

As Co-Principal Investigator (Co-PI)

  1. Real-time deep learning networks for fraud detection in modern e-marketplace systems.
    AISG. SGD 3.3M: 2022 - 2025.
  2. Dynamic Graph Random Walks on FPGAs.
    ByteDance. SGD 150K: 2023 - 2024.
  3. Towards Energy-Efficient Tera Scale Graph Processing on FPGAs.
    Google Research. SGD 50K: 2023 - 2024.

As a collaborator and Area Leader

  1. Memory Efficient Graph Accelerators on HLS-based FPGAs.
    Ministry of Education, Academic Research Fund (AcRF) Tier 2. SGD 577K: 2022 - 2025.
  2. GraphMind: Energy-Efficient Hardware Accelerators for Graph Neural Networks.
    NUS Advanced Research and Technology Innovation Center. SGD 362K: 2021 - 2024.
  3. Software-Hardware co-design for real-time AI.
    TSCP. SGD 15M: 2017 - 2022.
  • Techlaunch, Management of Technology, National University of Singapore, School of Engineering. (2021.1 - 2021.4)

Awards

  • 2023 AMD HACC Outstanding Researcher Award
  • 2021 Sixth Place winner, RadioML Challenge
  • 2021 Second Place winner (FPGA track), IEEE Design Automation Conference (DAC) System Design Contest
  • 2020 Third Place winner (FPGA track), IEEE Design Automation Conference (DAC) System Design Contest
  • 2019 First Place winner (FPGA track), IEEE Design Automation Conference (DAC) System Design Contest
  • 2019 Best poster award, ICML Workshop 2019
  • 2010 Best Undergraduate Research Award, Nankai University

Software License

  1. Low Loss DNN Quantization Package, SGD 2.5K/year, 2021.

Patents

  1. A GPU acceleration method for spike sorting. R. Cai, K. Zhao, J. He, Y. Chen, Z. Hao, W. Wen, B. Chen. Patent No. CN109460785A.
  2. A CUDA based spike sorting acceleration method on GPU. R. Cai, K. Zhao, J. He, Y. Chen, Z. Hao, W. Wen, B. Chen. Patent No. CN109376651A.
  3. Bi-CMOS based Fifth Order High Accuracy Temperature Compensated Circuit. C. Lin, Z. Guo, H. Xu, Y. Chen, K. Liang, G. Li. Patent No. 201610443932.2.
  4. Fifth Order High Accuracy Temperature Compensated Crystal Oscillator ASIC Design. C. Lin, Z. Guo, H. Xu, K. Liang, Y. Chen, G. Li. Patent No. 201610443932.2.
  5. Nandflash Based Data Acquisition System. X. Gao, L. Wang, Y. Chen. Patent No. CN20111005119.8.
  6. IC Card Identity Authentication System Used in Public Transportation. Y. Chen, P. Chang, L. Wang, R. Chen, H. Wu. Patent No. ZL200810152397.0.

Invited Talk

  1. From Applications to Efficient Architectures on FPGAs. CCF ESTC, Shenzhen, China, 2024.
  2. Parallel Graph Processing on FPGAs. National Defence University, China, 2024.
  3. From Applications to Efficient Architectures. Wuhan University, China, 2024.
  4. Domain-specific architectures for modern data intensive applications. University of Electronic Science and Technology of China, China, 2024.
  5. Efficient Graph Processing Architectures. National Defence University, China, 2023. (Virtual)
  6. Model, Accelerator and System for an Object Detection. Invited talk at Nankai University, 2021. (Virtual)
  7. DNN Acceleration on the Edge. Invited talk at Zhejiang University, 2021. (Virtual)
  8. Machine Learning Acceleration for Cyber Security. Invited talk at Create Webinar, Singapore, 2021. (Virtual)
  9. Deep Neural Networks on FPGAs. Invited talk at Leibniz AI Lab, Leibniz University Hannover, Hannover, Germany, 2021. (Virtual)
  10. DNN Acceleration: A Cloud to Edge Approach. Invited talk at College of Computer Science, Nankai University, Tianjin, China, 2020.

Workshop Presentation

  1. Efficient Architecture for Green Graph Processing on FPGAs. NRF-FRC Green Computing Workshop, Singapore, 2023.
  2. Low Loss DNN Model Quantization. ADSC workshop, Singapore, 2021. (Virtual)

Paper Presentation

  1. YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs. WACV Conference, 2022. (Virtual)
  2. HiKonv: High Throughput Quantized Convolution With Novel Bit-wise Management and Computation. ASP-DAC Conference, 2022. (Virtual)
  3. HaoCL: Harnessing Large-scale Heterogeneous Processors Made Easy. ICDCS Conference, Singapore, 2020. (Virtual)
  4. T-DLA: An Open-source Deep Learning Accelerator for Ternarized DNN Models on Embedded FPGA. ISVLSI Conference, Miami, Florida, USA, 2019.
  5. Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs. FPGA Conference, Seaside, CA, USA, 2019.

Associate Editor

  • 2023 - present: ACM Transactions on Reconfigurable Technology and Systems (TRETS)
  • 2023 - present: Frontiers in Electronics

Assistant Editor

  • NRF FRC Green Computing Report (NRF Singapore), 2023

TPC Members

  • 2025: DATE, IEEE BigData
  • 2024: ICDCS, ICCAD, ICDE (poster session)
  • 2023: ICCAD, BigData, DATE
  • 2022: ICCAD, ICCD, AAAI
  • 2021: AIOTS, CCGRID

Journal Review

  • IEEE Transactions on Computers (TC)
  • IEEE Transactions on Parallel and Distributed Systems (TPDS)
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)
  • IEEE Design and Test (IEEE D&T)
  • ACM Transactions on Architecture and Code Optimization (TACO)
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI)
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • IEEE Transactions on Network Science and Engineering
  • ACM Transactions on Design Automation of Electronic Systems (TODAES)
  • CCF Transactions on High Performance Computing
  • Micromachines

Panel Review

  • DTC Research Grant Call, Singapore, 2023

Journal Papers (* indicates co-first authors)

  1. [TACO'24] Kunpeng Xie, Ye Lu, Xinyu He, Dezhi Yi, Huijuan Dong, Yao Chen. Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs.
  2. [TPDS'23] Zining Zhang, Yao Chen, Bingsheng He, Zhenjie Zhang. NIOT: A Novel Inference Optimization of Transformers on Modern FPGAs.
  3. [TRETS'22] Xinyu Chen, Feng Cheng, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, Deming Chen. ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS.
  4. [TACL'21] Prakhar Ganesh*, Yao Chen*, Xin Lou, Mohammad Ali Khan, Yin Yang, Deming Chen, Marianne Winslett, Hassan Sajjad, Preslav Nakov. Compressing large-scale transformer-based models: A case study on BERT.
  5. [TSG'20] Prakhar Ganesh, Xin Lou, Yao Chen, Rui Tan, David K.Y. Yau, Deming Chen, Marianne Winslett. Learning-based simultaneous detection and characterization of time delay attack in cyber-physical systems.
  6. [TC'20] Cheng Gong, Yao Chen, Ye Lu, Tao Li, Cong Hao, Deming Chen. VecQ: Minimal loss DNN model compression with vectorized weight quantization.
  7. [TNanoBio'19] Libo Huang, Bingo Wing-Kuen Ling, Ruichu Cai, Yan Zeng, Jiong He, Yao Chen. WMsorting: Wavelet packets decomposition and mutual information based spike sorting method.
  8. [FGCS'18] Jihe Wang, Danghui Wang, Meikang Qiu, Yao Chen, Bing Guo. A locality-aware shuffle optimization on fat-tree data centers.
  9. [TCAD'16] Ying Chen, Tan Nguyen, Yao Chen, Swathi T. Gurumani, Yun Liang, Kyle Rupnow, Jason Cong, Wen-mei Hwu, Deming Chen. FCUDA-HB: Hierarchical and scalable bus architecture generation on FPGAs with the FCUDA flow.
  10. [TVLSI'16] Yao Chen, Swathi Gurumani, Yun Liang, Guofeng Li, Donghui Guo, Kyle Rupnow, Deming Chen. FCUDA-NoC: A scalable and efficient network-on-chip implementation for the CUDA-to-FPGA flow.

Conference Papers (* indicates co-first authors or corresponding authors)

  1. [SIGMOD'25] Jixian Su, Chiyu Hao, Shixuan Sun, Hao Zhang, Sen Gao, Jiaxin Jiang, Yao Chen, Chenyi Zhang, Bingsheng He, and Minyi Guo. Revisiting the Design of In-Memory Dynamic Graph Storage. (Accepted)
  2. [ICDE'25] Yujian Fu, Cheng Chen, Yao Chen, Weng-Fai Wong, Bingsheng He. Vista: Vector Indexing and Search for Large-scale Imbalanced Datasets. (Accepted)
  3. [KDD'25] Borui Xu, Zeyi Wen, Yao Chen*, Weiguo Liu*, and Bingsheng He. ScalaGBM: Memory Efficient GBDT training for high-dimensional data on GPU. (Accepted)
  4. [ECCV'24] Cheng Gong, Yao Chen*, Qiuyang Luo, Ye Lu, Tao Li, Yuzhi Zhang, Yufei Sun*, Le Zhang. Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks.
  5. [SIGMOD'24] Qiange Wang, Yao Chen, Weng-Fai Wong, Bingsheng He. HongTu: Scalable Full-Graph GNN Training on Multiple GPUs.
  6. [SIGMOD'23] Hongshi Tan, Xinyu Chen, Yao Chen*, Bingsheng He*, Weng-Fai Wong. LightRW: FPGA Accelerated Graph Dynamic Random Walks. (Passed Artifact Evaluation)
  7. [MICRO'22] Xinyu Chen, Yao Chen, Feng Cheng, Hongshi Tan, Weng-Fai Wong, Bingsheng He. ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines.
  8. [ASP-DAC'22] Xinheng Liu*, Yao Chen*, Junhao Pan, Jinjun Xiong, Deming Chen. HiKonv: High throughput quantized convolution with novel bit-wise management and computation.
  9. [WACV'22] Prakhar Ganesh*, Yao Chen*, David Yin Yang, Marianne Winslett, Deming Chen. YOLO-ReT: Towards high accuracy real-time object detection on edge GPUs.
  10. [ASAP'22] Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, Deming Chen. WinoCNN: Kernel sharing winograd systolic array for efficient convolutional neural network acceleration on FPGAs.
  11. [ICS'21] Hongshi Tan, Xinyu Chen, Yao Chen, Weng-Fai Wong, Bingsheng He. ThundeRiNG: Generating multiple independent random number sequences on FPGAs.
  12. [GLSVLSI'21] Yao Chen, Cole Hawkins, Kaiqi Zhang, Zheng Zhang, Cong Hao. 3U-EdgeAI: Ultra-low memory training, ultra-low bitwidth quantization, and ultra-low latency acceleration.
  13. [DAC'21] Lixiang Li, Yao Chen, Zacharie Zirnheld, Pan Li, Cong Hao. MELOPPR: Software/hardware co-design for memory-efficient low-latency personalized pagerank.
  14. [DAC'21] Xinyu Chen, Hongshi Tan, Yao Chen, Weng-Fai Wong, Bingsheng He, Deming Chen. Skew-oblivious data routing for data intensive applications on FPGAs with HLS.
  15. [FPGA'21] Xinyu Chen, Hongshi Tan, Yao Chen, Weng-Fai Wong, Bingsheng He, Deming Chen. ThunderGP: HLS-based graph processing framework on FPGAs. (Passed Artifact Evaluation)
  16. [GLSVLSI'20] Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, Jinjun Xiong, Wen-mei Hwu, Deming Chen. Effective algorithm-accelerator co-design for AI solutions on edge devices.
  17. [ACL'20] Ruichu Cai, Zhihao Liang, Boyan Xu, Zijian Li, Yuexing Hao, Yao Chen. TAG: Type-auxiliary guiding for code comment generation.
  18. [ICDCS'20] Yao Chen, Xin Long, Jiong He, Yuhang Chen, Hongshi Tan, Zhenxiang Zhang, Marianne Winslett, Deming Chen. HaoCL: Harnessing large-scale heterogeneous processors made easy.
  19. [DAC'20] Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-mei Hwu, Deming Chen. EDD: Efficient differentiable DNN architecture and implementation co-search for embedded AI solutions.
  20. [CIDR'19] Xinyu Chen, Yao Chen, Ronak Bajaj, Jiong He, Bingsheng He, Weng-Fai Wong, Deming Chen. Is FPGA useful for hash joins? Exploring hash joins on coupled CPU-FPGA architecture.
  21. [ICCAD'19] Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, Jinjun Xiong, Wen-mei Hwu, Junli Gu, Deming Chen. NAIS: Neural architecture and implementation search and its applications in autonomous driving. (Invited)
  22. [FPL'19] Xinyu Chen, Ronak Bajaj, Yao Chen, Jiong He, Weng-Fai Wong, Bingsheng He, Deming Chen. On-the-fly parallel data shuffling for graph processing on OpenCL-based FPGAs.
  23. [ODML-CDNNR'19] Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, Jinjun Xiong, Wen-mei Hwu, Deming Chen. A bi-directional co-design approach to enable deep learning on IoT devices. In Proceedings of the ICML 2019 Workshop, Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations.
  24. [ISVLSI'19] Yao Chen, Kai Zhang, Cheng Gong, Cong Hao, Xiaofan Zhang, Tao Li, Deming Chen. T-DLA: An open-source deep learning accelerator for ternarized DNN models on embedded FPGA.
  25. [IJCNN'19] Cheng Gong, Tao Li, Ye Lu, Cong Hao, Xiaofan Zhang, Deming Chen, Yao Chen. μL2Q: An ultra-low loss quantization method for DNN compression.
  26. [FPGA'19] Yao Chen, Jiong He, Xiaofan Zhang, Cong Hao, Deming Chen. Cloud-DNN: An open framework for mapping DNN models to cloud FPGAs.
  27. [ISCAS'19] Huachao Xu, Jinlong Hu, Yao Chen, Guofeng Li, Chao Lu. Pico-ampere voltage references for IoT systems.
  28. [BIBM'18] Yao Chen, Libo Huang, Jiong He, Kunyao Zhao, Ruichu Cai, Zhifeng Hao. HASS: High accuracy spike sorting with wavelet package decomposition and mutual information.
  29. [ICDCS'18] Jiong He, Yao Chen, Tom Zhengjia Fu, Xin Long, Marianne Winslett, Liang You, Zhenjie Zhang. HaaS: Cloud-based real-time data analytics with heterogeneity-aware scheduling.
  30. [ICSICT'18] Huachao Xu, Yao Chen, Jinlong Hu, Hongkun Cai, Tao Du, Ke Liang, Guofeng Li. 110 pA, 170 ppm/V, -77 dB@100 Hz voltage reference for IoT systems.
  31. [ICSICT'18] Huachao Xu, Yao Chen, Jinlong Hu, Tao Du, Hongkun Cai, Ke Liang, Guofeng Li. 73 pA, 250 ppm/V, -74 dB@100 Hz voltage reference using one type of MOSFETs.
  32. [ICCSS'18] Jinlong Hu, Huachao Xu, Tao Du, Guofeng Li, Yao Chen. A novel 1.03 ppm/°C wide temperature range curvature compensated bandgap voltage reference.
  33. [ISVLSI'16] Tan Nguyen, Yao Chen, Kyle Rupnow, Swathi T. Gurumani, Deming Chen. SoC, NoC and hierarchical bus implementations of applications on FPGAs using the FCUDA flow.
  34. [FPGA'16] Xinheng Liu, Yao Chen, Tan Nguyen, Swathi Gurumani, Kyle Rupnow, Deming Chen. High level synthesis of complex applications: An H.264 video decoder.
  35. [ASICON'15] Liwei Yang, Yao Chen, Wei Zuo, Tan Nguyen, Swathi T. Gurumani, Kyle Rupnow, Deming Chen. System-level design solutions: Enabling the IoT explosion.
  36. [FCCM'14] Swathi T. Gurumani, Jacob Tolar, Yao Chen, Yun Liang, Kyle Rupnow, Deming Chen. Integrated CUDA-to-FPGA synthesis with Network-on-Chip.

Preprints

  • [ArXiv] Zining Zhang, Yao Chen, Bingsheng He, Zhenjie Zhang. Aggressive Post-Training Compression on Extremely Large Language Models. 2024.
  • [ArXiv] Yao Chen, Junhao Pan, Xinheng Liu, Jinjun Xiong, Deming Chen. HiKonv: Maximizing the Throughput of Quantized Convolution With Novel Bit-wise Management and Computation. 2023.

Book Chapters

  1. Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li and Deming Chen. "Compilation and Optimizations for Efficient Machine Learning on Embedded Systems," In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, Springer Nature, 2023.
  2. Daisuke Mashima, Yao Chen, Muhammad M. Roomi, Subhash Lakshminarayana, Deming Chen. "Cybersecurity for Modern Smart Grid Against Emerging Threats," In Foundations and Trends in Privacy and Security, 2023.