Journal Papers (* indicates co-first authors)
- [TACO'24] Kunpeng Xie, Ye Lu, Xinyu He, Dezhi Yi, Huijuan Dong, Yao Chen. Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs.
- [TPDS'23] Zining Zhang, Yao Chen, Bingsheng He, Zhenjie Zhang. NIOT: A Novel Inference Optimization of Transformers on Modern FPGAs.
- [TRETS'22] Xinyu Chen, Feng Cheng, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, Deming Chen. ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS.
- [TACL'21] Prakhar Ganesh*, Yao Chen*, Xin Lou, Mohammad Ali Khan, Yin Yang, Deming Chen, Marianne Winslett, Hassan Sajjad, Preslav Nakov. Compressing large-scale transformer-based models: A case study on BERT.
- [TSG'20] Prakhar Ganesh, Xin Lou, Yao Chen, Rui Tan, David K.Y. Yau, Deming Chen, Marianne Winslett. Learning-based simultaneous detection and characterization of time delay attack in cyber-physical systems.
- [TC'20] Cheng Gong, Yao Chen, Ye Lu, Tao Li, Cong Hao, Deming Chen. VecQ: Minimal loss DNN model compression with vectorized weight quantization.
- [TNanoBio'19] Libo Huang, Bingo Wing-Kuen Ling, Ruichu Cai, Yan Zeng, Jiong He, Yao Chen. WMsorting: Wavelet packets decomposition and mutual information based spike sorting method.
- [FGCS'18] Jihe Wang, Danghui Wang, Meikang Qiu, Yao Chen, Bing Guo. A locality-aware shuffle optimization on fat-tree data centers.
- [TCAD'16] Ying Chen, Tan Nguyen, Yao Chen, Swathi T. Gurumani, Yun Liang, Kyle Rupnow, Jason Cong, Wen-mei Hwu, Deming Chen. FCUDA-HB: Hierarchical and scalable bus architecture generation on FPGAs with the FCUDA flow.
- [TVLSI'16] Yao Chen, Swathi Gurumani, Yun Liang, Guofeng Li, Donghui Guo, Kyle Rupnow, Deming Chen. FCUDA-NoC: A scalable and efficient network-on-chip implementation for the CUDA-to-FPGA flow.
Conference Papers (* indicates co-first authors or corresponding authors)
- [SIGMOD'25] Feng Yu, Hongshi Tan, Xinyu Chen, Yao Chen, Weng-Fai Wong, and Bingsheng He. Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing. (Accepted)
- [NAACL'25] Borui Xu, Zeyi Wen, Yao Chen*, Weiguo Liu*, and Bingsheng He. Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance. (Accepted)
- [SIGMOD'25] Jixian Su, Chiyu Hao, Shixuan Sun, Hao Zhang, Sen Gao, Jiaxin Jiang, Yao Chen, Chenyi Zhang, Bingsheng He, and Minyi Guo. Revisiting the Design of In-Memory Dynamic Graph Storage.
- [ICDE'25] Yujian Fu, Cheng Chen, Yao Chen, Weng-Fai Wong, Bingsheng He. Vista: Vector Indexing and Search for Large-scale Imbalanced Datasets.
- [KDD'25] Borui Xu, Zeyi Wen, Yao Chen*, Weiguo Liu*, and Bingsheng He. ScalaGBM: Memory Efficient GBDT training for high-dimensional data on GPU.
- [ECCV'24] Cheng Gong, Yao Chen*, Qiuyang Luo, Ye Lu, Tao Li, Yuzhi Zhang, Yufei Sun*, Le Zhang. Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks.
- [SIGMOD'24] Qiange Wang, Yao Chen, Weng-Fai Wong, Bingsheng He. HongTu: Scalable Full-Graph GNN Training on Multiple GPUs.
- [SIGMOD'23] Hongshi Tan, Xinyu Chen, Yao Chen*, Bingsheng He*, Weng-Fai Wong. LightRW: FPGA Accelerated Graph Dynamic Random Walks. (Passed Artifact Evaluation)
- [MICRO'22] Xinyu Chen, Yao Chen, Feng Cheng, Hongshi Tan, Weng-Fai Wong, Bingsheng He. ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines.
- [ASP-DAC'22] Xinheng Liu*, Yao Chen*, Junhao Pan, Jinjun Xiong, Deming Chen. HiKonv: High throughput quantized convolution with novel bit-wise management and computation.
- [WACV'22] Prakhar Ganesh*, Yao Chen*, David Yin Yang, Marianne Winslett, Deming Chen. YOLO-ReT: Towards high accuracy real-time object detection on edge GPUs.
- [ASAP'22] Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, Deming Chen. WinoCNN: Kernel sharing winograd systolic array for efficient convolutional neural network acceleration on FPGAs.
- [ICS'21] Hongshi Tan, Xinyu Chen, Yao Chen, Weng-Fai Wong, Bingsheng He. ThundeRiNG: Generating multiple independent random number sequences on FPGAs.
- [GLSVLSI'21] Yao Chen, Cole Hawkins, Kaiqi Zhang, Zheng Zhang, Cong Hao. 3U-EdgeAI: Ultra-low memory training, ultra-low bitwidth quantization, and ultra-low latency acceleration.
- [DAC'21] Lixiang Li, Yao Chen, Zacharie Zirnheld, Pan Li, Cong Hao. MELOPPR: Software/hardware co-design for memory-efficient low-latency personalized pagerank.
- [DAC'21] Xinyu Chen, Hongshi Tan, Yao Chen, Weng-Fai Wong, Bingsheng He, Deming Chen. Skew-oblivious data routing for data intensive applications on FPGAs with HLS.
- [FPGA'21] Xinyu Chen, Hongshi Tan, Yao Chen, Weng-Fai Wong, Bingsheng He, Deming Chen. ThunderGP: HLS-based graph processing framework on FPGAs. (Passed Artifact Evaluation)
- [GLSVLSI'20] Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, Jinjun Xiong, Wen-mei Hwu, Deming Chen. Effective algorithm-accelerator co-design for AI solutions on edge devices.
- [ACL'20] Ruichu Cai, Zhihao Liang, Boyan Xu, Zijian Li, Yuexing Hao, Yao Chen. TAG: Type-auxiliary guiding for code comment generation.
- [ICDCS'20] Yao Chen, Xin Long, Jiong He, Yuhang Chen, Hongshi Tan, Zhenxiang Zhang, Marianne Winslett, Deming Chen. HaoCL: Harnessing large-scale heterogeneous processors made easy.
- [DAC'20] Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-mei Hwu, Deming Chen. EDD: Efficient differentiable DNN architecture and implementation co-search for embedded AI solutions.
- [CIDR'19] Xinyu Chen, Yao Chen, Ronak Bajaj, Jiong He, Bingsheng He, Weng-Fai Wong, Deming Chen. Is FPGA useful for hash joins? Exploring hash joins on coupled CPU-FPGA architecture.
- [ICCAD'19] Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, Jinjun Xiong, Wen-mei Hwu, Junli Gu, Deming Chen. NAIS: Neural architecture and implementation search and its applications in autonomous driving. (Invited)
- [FPL'19] Xinyu Chen, Ronak Bajaj, Yao Chen, Jiong He, Weng-Fai Wong, Bingsheng He, Deming Chen. On-the-fly parallel data shuffling for graph processing on OpenCL-based FPGAs.
- [ODML-CDNNR'19] Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, Jinjun Xiong, Wen-mei Hwu, Deming Chen. A bi-directional co-design approach to enable deep learning on IoT devices. In Proceedings of the ICML 2019 Workshop, Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations.
- [ISVLSI'19] Yao Chen, Kai Zhang, Cheng Gong, Cong Hao, Xiaofan Zhang, Tao Li, Deming Chen. T-DLA: An open-source deep learning accelerator for ternarized DNN models on embedded FPGA.
- [IJCNN'19] Cheng Gong, Tao Li, Ye Lu, Cong Hao, Xiaofan Zhang, Deming Chen, Yao Chen. μL2Q: An ultra-low loss quantization method for DNN compression.
- [FPGA'19] Yao Chen, Jiong He, Xiaofan Zhang, Cong Hao, Deming Chen. Cloud-DNN: An open framework for mapping DNN models to cloud FPGAs.
- [ISCAS'19] Huachao Xu, Jinlong Hu, Yao Chen, Guofeng Li, Chao Lu. Pico-ampere voltage references for IoT systems.
- [BIBM'18] Yao Chen, Libo Huang, Jiong He, Kunyao Zhao, Ruichu Cai, Zhifeng Hao. HASS: High accuracy spike sorting with wavelet package decomposition and mutual information.
- [ICDCS'18] Jiong He, Yao Chen, Tom Zhengjia Fu, Xin Long, Marianne Winslett, Liang You, Zhenjie Zhang. HaaS: Cloud-based real-time data analytics with heterogeneity-aware scheduling.
- [ICSICT'18] Huachao Xu, Yao Chen, Jinlong Hu, Hongkun Cai, Tao Du, Ke Liang, Guofeng Li. 110 pA, 170 ppm/V, -77 dB@100 Hz voltage reference for IoT systems.
- [ICSICT'18] Huachao Xu, Yao Chen, Jinlong Hu, Tao Du, Hongkun Cai, Ke Liang, Guofeng Li. 73 pA, 250 ppm/V, -74 dB@100 Hz voltage reference using one type of MOSFETs.
- [ICCSS'18] Jinlong Hu, Huachao Xu, Tao Du, Guofeng Li, Yao Chen. A novel 1.03 ppm/°C wide temperature range curvature compensated bandgap voltage reference.
- [ISVLSI'16] Tan Nguyen, Yao Chen, Kyle Rupnow, Swathi T. Gurumani, Deming Chen. SoC, NoC and hierarchical bus implementations of applications on FPGAs using the FCUDA flow.
- [FPGA'16] Xinheng Liu, Yao Chen, Tan Nguyen, Swathi Gurumani, Kyle Rupnow, Deming Chen. High level synthesis of complex applications: An H.264 video decoder.
- [ASICON'15] Liwei Yang, Yao Chen, Wei Zuo, Tan Nguyen, Swathi T. Gurumani, Kyle Rupnow, Deming Chen. System-level design solutions: Enabling the IoT explosion.
- [FCCM'14] Swathi T. Gurumani, Jacob Tolar, Yao Chen, Yun Liang, Kyle Rupnow, Deming Chen. Integrated CUDA-to-FPGA synthesis with Network-on-Chip.
Preprints
- [ArXiv] Zining Zhang, Yao Chen, Bingsheng He, Zhenjie Zhang. Aggressive Post-Training Compression on Extremely Large Language Models. 2024.
- [ArXiv] Yao Chen, Junhao Pan, Xinheng Liu, Jinjun Xiong, Deming Chen. HiKonv: Maximizing the Throughput of Quantized Convolution With Novel Bit-wise Management and Computation. 2023.
Book Chapters
- Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li and Deming Chen. "Compilation and Optimizations for Efficient Machine Learning on Embedded Systems," In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, Springer Nature, 2023.
- Daisuke Mashima, Yao Chen, Muhammad M. Roomi, Subhash Lakshminarayana, Deming Chen. "Cybersecurity for Modern Smart Grid Against Emerging Threats," In Foundations and Trends in Privacy and Security, 2023.