I am a Research Assistant Professor at the Department of Computer Science, School of Computing, National University of Singapore (NUS). Before joining NUS, I was a Senior Research Scientist with the Advanced Digital Sciences Center (ADSC), a research center of the University of Illinois at Urbana-Champaign (UIUC) based in Singapore.
2021 Second Place winner (FPGA track), IEEE Design Automation Conference (DAC) System Design Contest
2020 Third Place winner (FPGA track), IEEE Design Automation Conference (DAC) System Design Contest
2019 First Place winner (FPGA track), IEEE Design Automation Conference (DAC) System Design Contest
2019 Best poster award, ICML Workshop 2019
2010 Best Undergraduate Research Award, Nankai University
Software License
Low Loss DNN Quantization Package, SGD 2.5K/year, 2021.
Patents
A GPU acceleration method for spike sorting. R. Cai, K. Zhao, J. He, Y. Chen, Z. Hao, W. Wen, B. Chen. patent No. CN109460785A.
A CUDA based spike sorting acceleration method on GPU. R. Cai, K. Zhao, J. He, Y. Chen, Z. Hao, W. Wen, B. Chen. patent No. CN109376651A.
Bi-CMOS based Fifth Order High Accuracy Temperature Compensated Circuit. C. Lin, Z. Guo, H. Xu, Y. Chen, K. Liang, G. Li. patent No. 201610443932.2.
Fifth Order High Accuracy Temperature Compensated Crystal Oscillator ASIC Design. C. Lin, Z. Guo, H. Xu, K. Liang, Y. Chen, G. Li. patent No. 201610443932.2.
Nandflash Based Data Acquisition System. X. Gao, L. Wang, Y. Chen. patent No. CN20111005119.8.
IC Card Identity Authentication System Used in Public Transportation. Y. Chen, P. Chang, L. Wang, R. Chen, H. Wu. patent No. ZL200810152397.0.
Invited Talk
From Applications to Efficient Architectures on FPGAs. Shenzhen, China, CCF ESTC 2024
Parallel Graph Processing on FPGAs. National Defence University, China, 2024.
From Applications to Efficient Architectures. Wuhan University, China, 2024.
Domain-specific architectures for modern data intensive applications. University of Electronic Science and Technology of China, China, 2024.
Efficient Graph Processing Architectures. National Defence University, China, 2023. (Virtual)
Model, Accelerator and System for Object Detection. Invited talk at Nankai University, 2021. (Virtual)
DNN Acceleration on the Edge. Invited talk at Zhejiang University, 2021. (Virtual)
Machine Learning Acceleration for Cyber Security. Invited talk at Create Webinar, Singapore, 2021. (Virtual)
Deep Neural Networks on FPGAs. Invited talk at Leibniz AI Lab, Leibniz University Hannover, Hannover, Germany, 2021. (Virtual)
DNN Acceleration: A Cloud to Edge Approach. Invited talk at College of Computer Science, Nankai University, Tianjin, China, 2020.
Workshop Presentation
Efficient Architecture for Green Graph Processing on FPGAs. NRF-FRC Green Computing Workshop, Singapore, 2023.
Low Loss DNN Model Quantization. ADSC workshop, Singapore, 2021. (Virtual)
Paper Presentation
YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs. WACV Conference, 2022. (Virtual)
HiKonv: High Throughput Quantized Convolution With Novel Bit-wise Management and Computation. ASP-DAC Conference, 2022. (Virtual)
IEEE Transactions on Parallel and Distributed Systems (TPDS)
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)
IEEE Design and Test (IEEE D&T)
ACM Transactions on Architecture and Code Optimization (TACO)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI)
IEEE Journal on Emerging and Selected Topics in Circuits and Systems
IEEE Transactions on Network Science and Engineering
ACM Transactions on Design Automation of Electronic Systems (TODAES)
CCF Transactions on High Performance Computing
Micromachines
Panel Review
DTC Research Grant Call, Singapore, 2023
Journal Papers (* indicates co-first authors)
[TACO'24] Kunpeng Xie, Ye Lu, Xinyu He, Dezhi Yi, Huijuan Dong, Yao Chen. Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs.
[TPDS'23] Zining Zhang, Yao Chen, Bingsheng He, Zhenjie Zhang. NIOT: A Novel Inference Optimization of Transformers on Modern FPGAs.
[TRETS'22] Xinyu Chen, Feng Cheng, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, Deming Chen. ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS.
[TACL'21] Prakhar Ganesh*, Yao Chen*, Xin Lou, Mohammad Ali Khan, Yin Yang, Deming Chen, Marianne Winslett, Hassan Sajjad, Preslav Nakov. Compressing large-scale transformer-based models: A case study on BERT.
[TSG'20] Prakhar Ganesh, Xin Lou, Yao Chen, Rui Tan, David K.Y. Yau, Deming Chen, Marianne Winslett. Learning-based simultaneous detection and characterization of time delay attack in cyber-physical systems.
[TC'20] Cheng Gong, Yao Chen, Ye Lu, Tao Li, Cong Hao, Deming Chen. VecQ: Minimal loss DNN model compression with vectorized weight quantization.
[TNanoBio'19] Libo Huang, Bingo Wing-Kuen Ling, Ruichu Cai, Yan Zeng, Jiong He, Yao Chen. WMsorting: Wavelet packets decomposition and mutual information based spike sorting method.
[FGCS'18] Jihe Wang, Danghui Wang, Meikang Qiu, Yao Chen, Bing Guo. A locality-aware shuffle optimization on fat-tree data centers.
[TCAD'16] Ying Chen, Tan Nguyen, Yao Chen, Swathi T. Gurumani, Yun Liang, Kyle Rupnow, Jason Cong, Wen-mei Hwu, Deming Chen. FCUDA-HB: Hierarchical and scalable bus architecture generation on FPGAs with the FCUDA flow.
[TVLSI'16] Yao Chen, Swathi Gurumani, Yun Liang, Guofeng Li, Donghui Guo, Kyle Rupnow, Deming Chen. FCUDA-NoC: A scalable and efficient network-on-chip implementation for the CUDA-to-FPGA flow.
Conference Papers (* indicates co-first authors or corresponding authors)
[SIGMOD'25] Jixian Su, Chiyu Hao, Shixuan Sun, Hao Zhang, Sen Gao, Jiaxin Jiang, Yao Chen, Chenyi Zhang, Bingsheng He, and Minyi Guo. Revisiting the Design of In-Memory Dynamic Graph Storage. (Accepted)
[ICDE'25] Yujian Fu, Cheng Chen, Yao Chen, Weng-Fai Wong, Bingsheng He. Vista: Vector Indexing and Search for Large-scale Imbalanced Datasets. (Accepted)
[KDD'25] Borui Xu, Zeyi Wen, Yao Chen*, Weiguo Liu*, and Bingsheng He. ScalaGBM: Memory Efficient GBDT training for high-dimensional data on GPU. (Accepted)
[ECCV'24] Cheng Gong, Yao Chen*, Qiuyang Luo, Ye Lu, Tao Li, Yuzhi Zhang, Yufei Sun*, Le Zhang. Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks.
[SIGMOD'24] Qiange Wang, Yao Chen, Weng-Fai Wong, Bingsheng He. HongTu: Scalable Full-Graph GNN Training on Multiple GPUs.
[MICRO'22] Xinyu Chen, Yao Chen, Feng Cheng, Hongshi Tan, Weng-Fai Wong, Bingsheng He. ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines.
[ASP-DAC'22] Xinheng Liu*, Yao Chen*, Junhao Pan, Jinjun Xiong, Deming Chen. HiKonv: High throughput quantized convolution with novel bit-wise management and computation.
[WACV'22] Prakhar Ganesh*, Yao Chen*, David Yin Yang, Marianne Winslett, Deming Chen. YOLO-ReT: Towards high accuracy real-time object detection on edge GPUs. [pdf]
[ASAP'22] Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, Deming Chen. WinoCNN: Kernel sharing winograd systolic array for efficient convolutional neural network acceleration on FPGAs.
[ICS'21] Hongshi Tan, Xinyu Chen, Yao Chen, Weng-Fai Wong, Bingsheng He. ThundeRiNG: Generating multiple independent random number sequences on FPGAs.
[DAC'21] Lixiang Li, Yao Chen, Zacharie Zirnheld, Pan Li, Cong Hao. MELOPPR: Software/hardware co-design for memory-efficient low-latency personalized pagerank.
[DAC'21] Xinyu Chen, Hongshi Tan, Yao Chen, Weng-Fai Wong, Bingsheng He, Deming Chen. Skew-oblivious data routing for data intensive applications on FPGAs with HLS.
[DAC'20] Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-mei Hwu, Deming Chen. EDD: Efficient differentiable DNN architecture and implementation co-search for embedded AI solutions.
[CIDR'19] Xinyu Chen, Yao Chen, Ronak Bajaj, Jiong He, Bingsheng He, Weng-Fai Wong, Deming Chen. Is FPGA useful for hash joins? Exploring hash joins on coupled CPU-FPGA architecture.
[ICCAD'19] Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, Jinjun Xiong, Wen-mei Hwu, Junli Gu, Deming Chen. NAIS: Neural architecture and implementation search and its applications in autonomous driving. (Invited)
[FPL'19] Xinyu Chen, Ronak Bajaj, Yao Chen, Jiong He, Weng-Fai Wong, Bingsheng He, Deming Chen. On-the-fly parallel data shuffling for graph processing on OpenCL-based FPGAs.
[ODML-CDNNR'19] Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, Jinjun Xiong, Wen-mei Hwu, Deming Chen. A bi-directional co-design approach to enable deep learning on IoT devices. In Proceedings of the ICML 2019 Workshop, Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations.
[ISVLSI'19] Yao Chen, Kai Zhang, Cheng Gong, Cong Hao, Xiaofan Zhang, Tao Li, Deming Chen. T-DLA: An open-source deep learning accelerator for ternarized DNN models on embedded FPGA. [pdf]
[IJCNN'19] Cheng Gong, Tao Li, Ye Lu, Cong Hao, Xiaofan Zhang, Deming Chen, Yao Chen. μL2Q: An ultra-low loss quantization method for DNN compression.
[FPGA'19] Yao Chen, Jiong He, Xiaofan Zhang, Cong Hao, Deming Chen. Cloud-DNN: An open framework for mapping DNN models to cloud FPGAs. [pdf]
[ISCAS'19] Huachao Xu, Jinlong Hu, Yao Chen, Guofeng Li, Chao Lu. Pico-ampere voltage references for IoT systems.
[BIBM'18] Yao Chen, Libo Huang, Jiong He, Kunyao Zhao, Ruichu Cai, Zhifeng Hao. HASS: High accuracy spike sorting with wavelet package decomposition and mutual information.
[ICDCS'18] Jiong He, Yao Chen, Tom Zhengjia Fu, Xin Long, Marianne Winslett, Liang You, Zhenjie Zhang. HaaS: Cloud-based real-time data analytics with heterogeneity-aware scheduling.
[ICSICT'18] Huachao Xu, Yao Chen, Jinlong Hu, Hongkun Cai, Tao Du, Ke Liang, Guofeng Li. 110 pA, 170 ppm/V, -77 dB@100 Hz voltage reference for IoT systems.
[ICSICT'18] Huachao Xu, Yao Chen, Jinlong Hu, Tao Du, Hongkun Cai, Ke Liang, Guofeng Li. 73 pA, 250 ppm/V, -74 dB@100 Hz voltage reference using one type of MOSFETs.
[ICCSS'18] Jinlong Hu, Huachao Xu, Tao Du, Guofeng Li, Yao Chen. A novel 1.03 ppm/°C wide temperature range curvature compensated bandgap voltage reference.
[ISVLSI'16] Tan Nguyen, Yao Chen, Kyle Rupnow, Swathi T. Gurumani, Deming Chen. SoC, NoC and hierarchical bus implementations of applications on FPGAs using the FCUDA flow.
[FPGA'16] Xinheng Liu, Yao Chen, Tan Nguyen, Swathi Gurumani, Kyle Rupnow, Deming Chen. High level synthesis of complex applications: An H.264 video decoder.
[ASICON'15] Liwei Yang, Yao Chen, Wei Zuo, Tan Nguyen, Swathi T. Gurumani, Kyle Rupnow, Deming Chen. System-level design solutions: Enabling the IoT explosion.
[FCCM'14] Swathi T. Gurumani, Jacob Tolar, Yao Chen, Yun Liang, Kyle Rupnow, Deming Chen. Integrated CUDA-to-FPGA synthesis with Network-on-Chip.
Preprints
[ArXiv] Zining Zhang, Yao Chen, Bingsheng He, Zhenjie Zhang. Aggressive Post-Training Compression on Extremely Large Language Models. 2024.
[ArXiv] Yao Chen, Junhao Pan, Xinheng Liu, Jinjun Xiong, Deming Chen. HiKonv: Maximizing the Throughput of Quantized Convolution With Novel Bit-wise Management and Computation. 2023.
Book Chapters
Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li and Deming Chen. "Compilation and Optimizations for Efficient Machine Learning on Embedded Systems," In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, Springer Nature, 2023.
Daisuke Mashima, Yao Chen, Muhammad M. Roomi, Subhash Lakshminarayana, Deming Chen. "Cybersecurity for Modern Smart Grid Against Emerging Threats," In Foundations and Trends in Privacy and Security, 2023.