Yao CHEN (陈瑶)
Research Assistant Professor
National University of Singapore, Singapore
I am a Research Assistant Professor at the National University of Singapore (NUS). Before joining NUS, I was a Senior Research Scientist with the Advanced Digital Sciences Center (ADSC), a research center of the University of Illinois at Urbana-Champaign (UIUC) based in Singapore.
Currently, I am working with Prof. Bingsheng He and Prof. Weng-Fai Wong at NUS on several interesting projects.
Research Interests
Domain-specific FPGA-based accelerators: FPGA-enabled domain-specific architectures
Software/hardware co-designed efficient systems: ML/Graph accelerators, on-device AI
Electronic design automation (EDA): domain-specific high-level synthesis (HLS), domain-specific architecture generation
Professional Experience
Department of Computer Science, School of Computing, National University of Singapore, Singapore
Research Assistant Professor (Aug 2022 - Present)
Advanced Digital Sciences Center, University of Illinois at Urbana-Champaign (UIUC)
Senior Research Scientist, Coordinator for Hardware and Data Analytics Research Groups (Jan 2022 - Aug 2022)
Research Scientist, Coordinator for Hardware and Data Analytics Research Groups (Jul 2020 - Dec 2021)
Research Scientist (May 2019 - Apr 2020)
PostDoc Researcher (May 2018 - Apr 2019)
Senior Research Engineer (Jul 2016 - Apr 2018)
Teaching Experience
Techlaunch, Management of Technology, National University of Singapore, School of Engineering (Jan 2021 - Apr 2021)
Education
Ph.D., Nankai University, Tianjin, China, 2010 - 2016
Visiting Ph.D., University of Illinois at Urbana-Champaign, Illinois, USA, 2013 - 2015
B.S., Nankai University, Tianjin, China, 2006 - 2010
Selected Publications (* indicates co-first authors or corresponding authors)
Hongshi Tan, Xinyu Chen, Yao Chen*, Bingsheng He*, Weng-Fai Wong. LightRW: FPGA Accelerated Graph Dynamic Random Walks. (SIGMOD) 2023 (Passed Artifact Evaluation)
Zining Zhang, Yao Chen, Bingsheng He and Zhenjie Zhang. NIOT: A Novel Inference Optimization of Transformers on Modern FPGAs. Transactions on Parallel and Distributed Systems (TPDS). 2023
Xinheng Liu, Yao Chen*, Junhao Pan, Jinjun Xiong and Deming Chen. HiKonv: High throughput quantized convolution with novel bit-wise management and computation. In 27th Asia and South Pacific Design Automation Conference (ASP-DAC), 2022
Xinyu Chen, Yao Chen, Feng Cheng, Hongshi Tan, Weng-Fai Wong and Bingsheng He. ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines. (MICRO) 2022
Prakhar Ganesh, Yao Chen*, Xin Lou, Mohammad Ali Khan, Yin Yang, Deming Chen, Marianne Winslett, Hassan Sajjad, and Preslav Nakov. Compressing large-scale transformer-based models: A case study on BERT. Transactions of the Association for Computational Linguistics (TACL), 2021
Lixiang Li, Yao Chen, Zacharie Zirnheld, Pan Li, and Cong Hao. MELOPPR: Software/hardware co-design for memory-efficient low-latency personalized pagerank. In Design Automation Conference (DAC), 2021
Xinyu Chen, Hongshi Tan, Yao Chen, Wengfai Wong, Bingsheng He, and Deming Chen. Skew-oblivious data routing for data intensive applications on FPGAs with HLS. In Design Automation Conference (DAC), 2021
Xinyu Chen, Hongshi Tan, Yao Chen, Wengfai Wong, Bingsheng He, and Deming Chen. ThunderGP: HLS-based graph processing framework on FPGAs. In Proceedings of the 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2021. (Passed Artifact Evaluation)
Cheng Gong, Yao Chen, Ye Lu, Tao Li, Cong Hao, and Deming Chen. VecQ: Minimal loss DNN model compression with vectorized weight quantization. IEEE Transactions on Computers (TC), 2020
Ruichu Cai, Zhihao Liang, Boyan Xu, Zijian Li, Yuexing Hao, and Yao Chen. TAG: Type auxiliary guiding for code comment generation. In Proceedings of 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-mei Hwu, and Deming Chen. EDD: Efficient differentiable DNN architecture and implementation co-search for embedded AI solutions. In Design Automation Conference (DAC), 2020
Yao Chen, Xin Long, Jiong He, Yuhang Chen, Hongshi Tan, Zhenxiang Zhang, Marianne Winslett, and Deming Chen. HaoCL: Harnessing large-scale heterogeneous processors made easy. In International Conference on Distributed Computing Systems (ICDCS), 2020
Yao Chen, Jiong He, Xiaofan Zhang, Cong Hao, and Deming Chen. Cloud-DNN: An open framework for mapping DNN models to cloud FPGAs. In Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2019
Major Projects & Funding
As Principal Investigator (PI)
"OctoBrain: A privacy-preserving edge-cloud collaborative learning solution for lift systems", Singapore Cybersecurity Consortium (SGCSC) (SGD $178,560: 2021-2022).
"Pooling large scale heterogeneous Processors", Alibaba Innovation Research (AIR) Funding (USD 140K: 2017-2019).
As Co-Principal Investigator (Co-PI)
"Real-time deep learning networks for fraud detection in modern e-marketplace systems", AISG (SGD $3.3M: 2022-2025).
"Dynamic Graph Random Walks on FPGAs", ByteDance (SGD 150K: 2023-2024).
"Towards Energy-Efficient Tera Scale Graph Processing on FPGAs", Google Research (SGD 50K: 2023-2024).
As a collaborator and Area Leader
"Memory Efficient Graph Accelerators on HLS-based FPGAs", Ministry of Education, Academic Research Fund (AcRF) Tier 2 (SGD 577K: 2022-2025)
"GraphMind: Energy-Efficient Hardware Accelerators for Graph Neural Networks", NUS Advanced Research and Technology Innovation Center (SGD 362K: 2021-2024)
"Software-Hardware co-design for real-time AI", TSCP. (SGD $15M: 2017-2022). [Thrust Lead Scientist]
Professional Services
Associate Editor:
2023 - present ACM Transactions on Reconfigurable Technology and Systems (TRETS)
- Special Issue on FPGA-based Embedded Systems for Industrial and IoT Applications
2023 - present Frontiers in Electronics
Conference and Workshop Organization
2022 CSCLOUD Public Chair
2022 CACM Regional Special Section on East Asia and Oceania Region
2021 ISVLSI Registration Chair
TPC Member:
2024 ICDCS
2023 ICCAD, BigData, DATE
2022 ICCAD, ICCD, AAAI
2021 AIOTS, CCGRID
Session/Committee Chair
ASP-DAC 2024 Session Chair
ISVLSI 2019 Session Chair
Journal Review
IEEE Design and Test (IEEE D&T)
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI)
IEEE Journal on Emerging and Selected Topics in Circuits and Systems
IEEE Transactions on Network Science and Engineering
Wireless Networks
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Micromachines
Canadian Journal of Electrical and Computer Engineering
CCF Transactions on High Performance Computing
Integration
Applied Sciences
Patents & Software License
2021 "Low Loss DNN Quantization Software" licensed to a local company at a price of SGD 2.5K/year
R. Cai, K. Zhao, J. He, Y. Chen, Z. Hao, W. Wen, B. Chen, "A GPU acceleration method for spike sorting", patent No. CN109460785A
R. Cai, K. Zhao, J. He, Y. Chen, Z. Hao, W. Wen, B. Chen, "A CUDA based spike sorting acceleration method on GPU", patent No. CN109376651A
C. Lin, Z. Guo, H. Xu, Y. Chen, K. Liang, G. Li, “Bi-CMOS based Fifth Order High Accuracy Temperature Compensated Circuit”, 201610443932.2
C. Lin, Z. Guo, H. Xu, K. Liang, Y. Chen, G. Li, “Fifth Order High Accuracy Temperature Compensated Crystal Oscillator ASIC Design”, 201610443932.2
X. Gao, L. Wang, Y. Chen, “Nandflash Based Data Acquisition System”, patent No. CN20111005119.8
Y. Chen, P. Chang, L. Wang, R. Chen, H. Wu, “IC Card Identity Authentication System Used in Public Transportation”, patent No. ZL200810152397.0
Full Publication List
Book Chapters
Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li and Deming Chen. "Compilation and Optimizations for Efficient Machine Learning on Embedded Systems," In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, Springer Nature, 2023.
Daisuke Mashima, Yao Chen, Muhammad M. Roomi, Subhash Lakshminarayana, Deming Chen. "Cybersecurity for Modern Smart Grid Against Emerging Threats," In Foundations and Trends in Privacy and Security, 2023.
Journal Papers (* indicates co-first authors)
Kunpeng Xie, Ye Lu, Xinyu He, Dezhi Yi, Huijuan Dong, Yao Chen. Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs. ACM Transactions on Architecture and Code Optimization (TACO). 2024
Zining Zhang, Yao Chen, Bingsheng He and Zhenjie Zhang. NIOT: A Novel Inference Optimization of Transformers on Modern FPGAs. Transactions on Parallel and Distributed Systems (TPDS). 2023
Xinyu Chen, Feng Cheng, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong and Deming Chen. ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS. ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2022
Prakhar Ganesh*, Yao Chen*, Xin Lou, Mohammad Ali Khan, Yin Yang, Deming Chen, Marianne Winslett, Hassan Sajjad, and Preslav Nakov. Compressing large-scale transformer-based models: A case study on BERT. Transactions of the Association for Computational Linguistics (TACL), 2021
Prakhar Ganesh, Xin Lou, Yao Chen, Rui Tan, David K.Y. Yau, Deming Chen, and Marianne Winslett. Learning-based simultaneous detection and characterization of time delay attack in cyber-physical systems. IEEE Transactions on Smart Grid (TSG), 2020
Cheng Gong, Yao Chen, Ye Lu, Tao Li, Cong Hao, and Deming Chen. VecQ: Minimal loss DNN model compression with vectorized weight quantization. IEEE Transactions on Computers (TC), 2020
Libo Huang, Bingo Wing-Kuen Ling, Ruichu Cai, Yan Zeng, Jiong He, and Yao Chen. WMsorting: Wavelet packets decomposition and mutual information based spike sorting method. IEEE Transactions on NanoBioscience, 2019
Jihe Wang, Danghui Wang, Meikang Qiu, Yao Chen, and Bing Guo. A locality-aware shuffle optimization on fat-tree data centers. Future Generation Computer Systems (FGCS), 89:31 – 43, 2018
Ying Chen, Tan Nguyen, Yao Chen, Swathi T. Gurumani, Yun Liang, Kyle Rupnow, Jason Cong, Wen-mei Hwu, and Deming Chen. FCUDA-HB: Hierarchical and scalable bus architecture generation on FPGAs with the FCUDA flow. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 35(12):2032–2045, 2016
Yao Chen, Swathi Gurumani, Yun Liang, Guofeng Li, Donghui Guo, Kyle Rupnow, and Deming Chen. FCUDA-NoC: A scalable and efficient network-on-chip implementation for the CUDA-to-FPGA flow. IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 24(6):2220–2233, June 2016
Conference Papers (* indicates co-first authors or corresponding authors)
Qiange Wang, Yao Chen, Weng-Fai Wong and Bingsheng He. HongTu: Scalable Full-Graph GNN Training on Multiple GPUs. (SIGMOD) 2024
Hongshi Tan, Xinyu Chen, Yao Chen*, Bingsheng He*, Weng-Fai Wong. LightRW: FPGA Accelerated Graph Dynamic Random Walks. (SIGMOD) 2023. (Passed Artifact Evaluation)
Xinyu Chen, Yao Chen, Feng Cheng, Hongshi Tan, Weng-Fai Wong and Bingsheng He. ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines. (MICRO) 2022
Xinheng Liu*, Yao Chen*, Junhao Pan, Jinjun Xiong and Deming Chen. HiKonv: High throughput quantized convolution with novel bit-wise management and computation. In 27th Asia and South Pacific Design Automation Conference (ASP-DAC), 2022
Prakhar Ganesh*, Yao Chen*, David Yin Yang, Marianne Winslett, and Deming Chen. YOLO-ReT: Towards high accuracy real-time object detection on edge GPUs. In 2022 IEEE Winter Conference on Applications of Computer Vision (WACV), 2022
Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, and Deming Chen. WinoCNN: Kernel sharing Winograd systolic array for efficient convolutional neural network acceleration on FPGAs. In The 32nd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2021
Hongshi Tan, Xinyu Chen, Yao Chen, Wengfai Wong, and Bingsheng He. ThundeRiNG: Generating multiple independent random number sequences on FPGAs. In 2021 International Conference on Supercomputing (ICS), 2021
Yao Chen, Cole Hawkins, Kaiqi Zhang, Zheng Zhang, and Cong Hao. 3U-EdgeAI: Ultra-low memory training, ultra-low bitwidth quantization, and ultra-low latency acceleration. In Proceedings of the 2021 on Great Lakes Symposium on VLSI (GLSVLSI), 2021 (Invited)
Lixiang Li, Yao Chen, Zacharie Zirnheld, Pan Li, and Cong Hao. MELOPPR: Software/hardware co-design for memory-efficient low-latency personalized pagerank. In Design Automation Conference (DAC), 2021
Xinyu Chen, Hongshi Tan, Yao Chen, Wengfai Wong, Bingsheng He, and Deming Chen. Skew-oblivious data routing for data intensive applications on FPGAs with HLS. In Design Automation Conference (DAC), 2021
Xinyu Chen, Hongshi Tan, Yao Chen, Wengfai Wong, Bingsheng He, and Deming Chen. ThunderGP: HLS-based graph processing framework on FPGAs. In Proceedings of the 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2021. (Passed Artifact Evaluation)
Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, Jinjun Xiong, Wen-mei Hwu, and Deming Chen. Effective algorithm-accelerator co-design for AI solutions on edge devices. In Proceedings of the 2020 on Great Lakes Symposium on VLSI (GLSVLSI), 2020
Ruichu Cai, Zhihao Liang, Boyan Xu, Zijian Li, Yuexing Hao, and Yao Chen. TAG: Type auxiliary guiding for code comment generation. In Proceedings of 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Yao Chen, Xin Long, Jiong He, Yuhang Chen, Hongshi Tan, Zhenxiang Zhang, Marianne Winslett, and Deming Chen. HaoCL: Harnessing large-scale heterogeneous processors made easy. In International Conference on Distributed Computing Systems (ICDCS), 2020
Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-mei Hwu, and Deming Chen. EDD: Efficient differentiable DNN architecture and implementation co-search for embedded AI solutions. In Design Automation Conference (DAC), 2020
Xinyu Chen, Yao Chen, Ronak Bajaj, Jiong He, Bingsheng He, Weng-Fai Wong, and Deming Chen. Is FPGA useful for hash joins? Exploring hash joins on coupled CPU-FPGA architecture. In The Conference on Innovative Data Systems Research (CIDR), 2019
Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, Jinjun Xiong, Wen-mei Hwu, Junli Gu, and Deming Chen. NAIS: Neural architecture and implementation search and its applications in autonomous driving. In International Conference On Computer-Aided Design (ICCAD), 2019 (Invited)
Xinyu Chen, Ronak Bajaj, Yao Chen, Jiong He, Weng-Fai Wong, Bingsheng He, and Deming Chen. On-the-fly parallel data shuffling for graph processing on OpenCL-based FPGAs. In Proceedings of the 2019 International Conference on Field-Programmable Logic and Applications (FPL), 2019
Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, Jinjun Xiong, Wen-mei Hwu, and Deming Chen. A bi-directional co-design approach to enable deep learning on IoT devices. In Proceedings of the ICML 2019 Workshop, Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations, (ODML-CDNNR), 2019
Yao Chen, Kai Zhang, Cheng Gong, Cong Hao, Xiaofan Zhang, Tao Li, and Deming Chen. T-DLA: An open-source deep learning accelerator for ternarized DNN models on embedded FPGA. In 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2019
Cheng Gong, Tao Li, Ye Lu, Cong Hao, Xiaofan Zhang, Deming Chen, and Yao Chen. μL2Q: An ultra-low loss quantization method for DNN compression. In Proceedings of the 2019 International Joint Conference on Neural Networks, (IJCNN), 2019
Yao Chen, Jiong He, Xiaofan Zhang, Cong Hao, and Deming Chen. Cloud-DNN: An open framework for mapping DNN models to cloud FPGAs. In Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2019
Huachao Xu, Jinlong Hu, Yao Chen, Guofeng Li, and Chao Lu. Pico-ampere voltage references for IoT systems. In 2019 IEEE International Symposium on Circuits and Systems (ISCAS), 2019
Yao Chen, Libo Huang, Jiong He, Kunyao Zhao, Ruichu Cai, and Zhifeng Hao. HASS: High accuracy spike sorting with wavelet package decomposition and mutual information. In 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Dec 2018
Jiong He, Yao Chen, Tom Zhengjia Fu, Xin Long, Marianne Winslett, Liang You, and Zhenjie Zhang. HaaS: Cloud-based real-time data analytics with heterogeneity-aware scheduling. In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), 2018
Huachao Xu, Yao Chen, Jinlong Hu, Hongkun Cai, Tao Du, Ke Liang, and Guofeng Li. 110pA, 170ppm/V, -77dB@100hz voltage reference for IoT systems. In 2018 14th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT), Oct 2018
Huachao Xu, Yao Chen, Jinlong Hu, Tao Du, Hongkun Cai, Ke Liang, and Guofeng Li. 73 pA, 250 ppm/v, -74db@100hz voltage reference using one type of MOSFETs. Oct 2018
Jinlong Hu, Huachao Xu, Tao Du, Guofeng Li, and Yao Chen. A novel 1.03 ppm/°C wide temperature range curvature compensated bandgap voltage reference. In 2018 IEEE 2nd International Conference on Circuits, System and Simulation (ICCSS), 2018
Tan Nguyen, Yao Chen, Kyle Rupnow, Swathi T. Gurumani, and Deming Chen. SoC, NoC and hierarchical bus implementations of applications on FPGAs using the FCUDA flow. In 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2016
Xinheng Liu, Yao Chen, Tan Nguyen, Swathi Gurumani, Kyle Rupnow, and Deming Chen. High level synthesis of complex applications: An H.264 video decoder. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2016
Liwei Yang, Yao Chen, Wei Zuo, Tan Nguyen, Swathi T. Gurumani, Kyle Rupnow, and Deming Chen. System-level design solutions: Enabling the IoT explosion. In 2015 IEEE 11th International Conference on ASIC (ASICON), 2015
Swathi T. Gurumani, Jacob Tolar, Yao Chen, Yun Liang, Kyle Rupnow, and Deming Chen. Integrated CUDA-to-FPGA synthesis with Network-on-Chip. In Proceedings of 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2014
Preprints
Yao Chen, Junhao Pan, Xinheng Liu, Jinjun Xiong, Deming Chen. HiKonv: Maximizing the Throughput of Quantized Convolution With Novel Bit-wise Management and Computation. ArXiv.