Autonomous Intelligence and Unmanned Systems Team

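Mission Objectives

In line with major national strategic goals, the team aims to solve core problems in artificial intelligence, produce internationally influential research, and build an internationally leading research team in embodied intelligence and in the theory and algorithms of large models. It aims to design prior-knowledge-guided, hardware-aware compression methods for large models and to propose a software-hardware co-designed multimodal AI theory for physical embodied environments.

Background

Following current trends in artificial intelligence and focusing on the pain points of its development, the team works to solve key problems in both theory and real-world deployment. Autonomous intelligence and unmanned systems are expected to offer major advantages in production activities.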

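Research Directions

Oriented toward national strategy and international trends, the team studies embodied intelligence and the theory and algorithms of large models, tackling challenges in AI such as hardware efficiency, power consumption, and industrial deployment.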

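Research Foundation

Building on the deep technical expertise its members have accumulated over many years, the team brings together several leading scholars, including recipients of the "杰青" (National Science Fund for Distinguished Young Scholars) and "海优" (Excellent Young Scientists Fund, Overseas) awards. It has a solid theoretical foundation and strong software and hardware capabilities, and works on foundational algorithms such as large models as well as frontier applications including embodied intelligence, unmanned systems, and robotics.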

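Team Members

Tao Chen (陈涛)

Energy-efficient embodied intelligence theory; multimodal perception, understanding, and reasoning in embodied environments; software-hardware co-design and optimization of embodied large models

Prof. Zhouchen Lin (林宙辰)

Machine learning, optimization algorithms

Prof. Bin Wang (王斌)

Signal and image processing, pattern recognition, machine learning; information processing and understanding of remote sensing, natural, and millimeter-wave images

Prof. Bo Hu (胡波)

Digital communications, digital image and video processing, digital system design

Assoc. Prof. Tao Yang (杨涛)

AI-empowered wireless communications, distributed intelligence and edge federated learning, air-ground cooperative autonomous unmanned systems

Assoc. Prof. Hui Feng (冯辉)

Graph signal processing, graph machine learning, intelligent electronic systems

Assoc. Prof. Jianjun Yin (尹建君)

Intelligent vision, speech, and language understanding and applications; multimodal knowledge graphs and knowledge processing; development and application of vertical-domain large models

Engineer Qiwei Huang (黄奇伟)

Laboratory courses in microcomputer principles and interfacing, signals and systems, and communication systems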

Research Projects

Project Title	Project Type	Principal Investigator	Duration
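Subclass-Aware Deep Learning Theory and Its Applications	National Natural Science Foundation of China, General Program	Tao Chen (陈涛)	2021 to 2024
Distributed Autonomous Learning Theory and Methods for Networked Intelligent Manufacturing	National Natural Science Foundation of China, Key Program	Tao Chen (陈涛)	2020 to 2023
General Vision Methods and Foundational R&D Platform	National Science and Technology Major Project, sub-project	Tao Chen (陈涛)	2023 to 2025
Multimodal Database of Temporomandibular Joint-Derived Dentomaxillofacial Deformities in Adolescents	Ministry of Industry and Information Technology, key special project under the open-competition ("揭榜挂帅") program for AI medical devices	Tao Chen (陈涛)	2023 to 2025
Noise-Robust, Low-Precision Quantization-Adapted Transformer Models for Optical Matrix Computing	Science and Technology Commission of Shanghai Municipality, "Explorer" Program for original research	Tao Chen (陈涛)	2024 to 2025
Multimodal Fusion and Reasoning under Limited Compute and Data	Shanghai Natural Science Foundation, General Program	Tao Chen (陈涛)	2023 to 2026
Lightweight Collaborative AI Theory for Cross-Media Data Fusion	Shanghai Municipal Science and Technology Major Project on Artificial Intelligence	Tao Chen (陈涛)	2021 to 2025
Embedded Visual Surveillance in Adverse Weather	Fudan-睿派科技 University-Enterprise Joint Research Center	Tao Chen (陈涛)	2022 to 2025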
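Design of Equivariant Deep Networks and Its Applications. Project description: This project aims to systematically build the design theory and algorithms of equivariant deep networks, including the design of equivariant network topologies, more general equivariant and partially equivariant convolutions, equivariant nonlinearities (multivariate equivariant activation functions and nonlinear equivariant layers), and equivariant network architecture search (including search over the groups corresponding to latent equivariances) and lightweighting. The results will be validated on important problems that require equivariance, such as image processing, object detection, and geometric object processing, to address the shortcomings of existing equivariant network design: incomplete theory, a high barrier to entry, networks that are not practical enough, and a narrow range of applications.	National Natural Science Foundation of China, General Program	Zhouchen Lin (林宙辰)	2023-01-01 to 2026-12-31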
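Machine Learning Theory and Methods for General Vision. Project description: General visual intelligence currently faces several bottlenecks, including data, generalization, cognition, efficiency, and security. This project aims to address them systematically from a machine learning perspective. Research is organized at five levels, each with a corresponding sub-project: fundamental machine learning theory, low-level machine learning algorithms, high-level machine learning algorithms, application support, and application validation.	Ministry of Science and Technology of the People's Republic of China, Sci-Tech Innovation 2030 "New Generation Artificial Intelligence (2030)" Major Project	Zhouchen Lin (林宙辰)	2023-01 to 2025-12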
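Key Technologies for Improving the Deep Logical Reasoning Capability of Large Models in Complex Scenarios. Project description: Research on reasoning with large language models has attracted wide attention in recent years, but these models still fall clearly short in logical reasoning, which limits their practical use in complex scenarios. To address weak logical reasoning, insufficient organization and coordination among multiple agents, inefficient knowledge injection, and the lack of evaluation benchmarks covering multiple types of logical reasoning, this project will study: (1) a staged fine-tuning method for large models based on logical-form complexity and reasoning complexity, to improve logical reasoning on complex problems; (2) a mechanism of autonomous debate among multiple agents that partitions, passes on, and summarizes large numbers of logical premises, so that multi-agent collaboration supports complex logical reasoning and autonomous decision-making; (3) a combination of step-by-step reasoning with adaptive retrieval-augmented generation, for efficient injection of physical laws and world knowledge expressed in logical form; and (4) a large-scale evaluation benchmark and system covering multiple logic types, to assess the logical reasoning ability of large models comprehensively. The project is intended to support the deployment of large models in real-world scenarios with high demands for complex reasoning, such as intelligent industrial manufacturing, financial risk assessment, smart judicial services, and assisted autonomous driving, and has significant theoretical value and practical application prospects.	Beijing Natural Science Foundation - Shunyi Innovation Joint Fund for New Energy and Intelligent Connected Vehicles	Zhouchen Lin (林宙辰)	2025-07 to 2028-06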
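Long-Context Modeling and Cross-Session Memory Enhancement for Efficient and Accurate Tool Calling: Research and Demonstration Applications. Project description: This project targets key problems of current large language models in long-dialogue scenarios: inefficient context management, insufficient long-term cross-session memory, and tool calling that lacks personalization and precision. It focuses on three key technologies: long-text selection and compression algorithms that raise the information density carried per unit of text; dynamic retrieval based on long-term memory and external knowledge, with collaborative training and inference; and co-evolution and self-enhancement of models and tools based on memory-driven dynamic updating of tool calls. Demonstration applications will be carried out at no fewer than two enterprises in the AI and internet sectors, covering intelligent customer service, personalized recommendation, and real-time interactive marketing.	Central Government Guiding Local Special Program (中央引导地方专项)	Zhouchen Lin (林宙辰)	2025-10 to 2027-10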
Intelligent Fusion Analysis and Accurate Inference for Multimodal Remote Sensing Big Data	National Key R&D Program of China, "Earth Observation and Navigation" special project (No. 2022YFB3903400)	Bin Wang (王斌)	2022-12 to 2026-11
New Theories, Methods, and Key Technologies for Target Detection in Hyperspectral Remote Sensing Images	National Natural Science Foundation of China, General Program (No. 61971141)	Bin Wang (王斌)	2020-01 to 2023-12
New Theories and Methods for Nonlinear Blind Unmixing of Hyperspectral/Multispectral Remote Sensing Images and Their Application to Suspended Sediment Concentration Retrieval	National Natural Science Foundation of China, General Program (No. 62371140)	Bin Wang (王斌)	2024-01 to 2027-12
Key Technologies for Reconfigurable Adaptive Millimeter-Wave Fast Security Imaging and Automatic Target Recognition	National Natural Science Foundation of China, Key Program (No. 61731021)	Bin Wang (王斌)	2018-01 to 2022-12
Theory, Methods, and Applications of Nonlinear Unmixing of Mixed Pixels in Hyperspectral Remote Sensing Images	National Natural Science Foundation of China, General Program (No. 61572133)	Bin Wang (王斌)	2016-01 to 2019-12
Theory and Methods of Tensor Analysis for Hyperspectral Remote Sensing Images	National Natural Science Foundation of China, General Program (No. 41371337)	Bin Wang (王斌)	2014-01 to 2014-12
Frequency-Domain Selective Visual Attention Models and Their Application to Remote Sensing Images	National Natural Science Foundation of China, General Program (No. 61071134)	Bin Wang (王斌)	2011-01 to 2013-12
Theory and Applications of Blind Unmixing of Mixed Pixels in Hyperspectral Remote Sensing Images	National Natural Science Foundation of China, General Program (No. 60672116)	Bin Wang (王斌)	2007-01 to 2009-12
Analysis and Processing of Magnetoencephalography (MEG) Signals Using Constrained Independent Component Analysis	National Natural Science Foundation of China, General Program (No. 30370392)	Bin Wang (王斌)	2004-01 to 2006-12
Spectral Unmixing of Hyperspectral Remote Sensing Images: Theory, Algorithms, and Implementation	National High-Tech R&D Program of China (863 Program) project (No. 2009AA12Z115)	Bin Wang (王斌)	2009-01 to 2010-12
Hyperspectral Remote Sensing Image Processing Based on Tensor Analysis	Shanghai Municipal Education Commission Scientific Research Innovation Project (Key Project) (No. 13ZZ005)	Bin Wang (王斌)	2013-01 to 2015-12
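Key Technologies for Distributed Networks and Communications Based on Native Intelligence. Project description: The project focuses on key technologies and system validation for distributed networks and communications with native intelligence. Research proceeds in three directions: building a customizable, elastic computing-network foundation; analyzing the performance of federated intelligence systems over mobile, time-varying topologies; and developing native-intelligence service technologies for cooperative sensing and autonomous control. Taking a native-intelligence simulation and validation system for vehicle-road cooperation as the entry point, the project follows a progressive strategy from simulation environment construction, to core algorithm research, to security and deployment considerations, providing key technical support for 6G networks, intelligent transportation, and related fields.	Ministry of Science and Technology Key R&D Project, Strategic Science and Technology Innovation Cooperation special program	Bo Hu (胡波)	2024-07 to 2027-06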
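Development of a Smart Spare-Parts Warehousing System for All-Weather Production and Operations Support. Project description: Bulk and general cargo port warehousing currently suffers from heavy reliance on manual work, low operational efficiency, prominent safety hazards, and coarse spare-parts inventory management. To address this, the project tackles key technologies for smart warehousing and unmanned inspection at bulk and general cargo ports. It focuses on developing a smart spare-parts warehousing system for all-weather production and operations support, establishing a full-process management and control technology framework for warehousing logistics at such ports and developing a complete system, setting a new model for port warehousing logistics and supporting the high-quality development of China's bulk and general cargo ports.	Yangtze River Delta Science and Technology Innovation Community Joint Research Project	Tao Yang (杨涛)	2025-12 to 2028-11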
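Spatial Three-Dimensional Monitoring of Carbon Trajectories and Energy Consumption of Key Microgrid Equipment. Project description: Taking into account real-time weather, campus buildings, and the nature of production and business activities, together with factors of the campus microgrid such as clean energy generation, load forecasting, energy storage, electric vehicles, and hydrogen utilization, the project profiles the data of the microgrid carbon-emission system at scale, gives a theoretical model and performance metrics for characterizing carbon-emission risk, and identifies and avoids potential risks of carbon emissions running out of control. It builds a campus carbon-emission platform covering sensor layout, carbon data collection, modeling, computation, data presentation, and standards development. It also establishes full-life-cycle carbon trajectory models for key microgrid equipment, develops a scheme integrating ground-based big-data modeling with unmanned aerial vehicles, and sets up a three-dimensional carbon verification system.	State Grid Corporation of China headquarters science and technology project	Bo Hu (胡波)	2021-11 to 2023-12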
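Three-Dimensional Interference Avoidance in Multi-UAV Cooperative Image Transmission Systems. Project description: The project aimed at effective avoidance of transmission interference during multi-UAV cooperative image transmission tasks. Using matching game theory as the modeling tool, it analyzed the feasibility of cooperative interference avoidance schemes, studied the construction of specific schemes, and analyzed and compared their performance, providing a technical reference for the safe deployment of low-altitude UAV swarms.	Shanghai Natural Science Foundation	Tao Yang (杨涛)	2019-07 to 2022-06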

Publications

Paper Title	Venue	First/Corresponding Author	Year
PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding. Abstract: While Large Language Models (LLMs) demonstrate strong performance across domains, their long-context capabilities are limited by transient neural activations causing information decay and unstructured feed-forward network (FFN) weights leading to semantic fragmentation. Inspired by the brain’s working memory and cortical modularity, we propose PaceLLM, featuring two innovations: (1) a Persistent Activity (PA) Mechanism that mimics prefrontal cortex (PFC) neurons’ persistent firing by introducing an activation-level memory bank to dynamically retrieve, reuse, and update critical FFN states, addressing contextual decay; and (2) Cortical Expert (CE) Clustering that emulates task-adaptive neural specialization to reorganize FFN weights into semantic modules, establishing cross-token dependencies and mitigating fragmentation. Extensive evaluations show that PaceLLM achieves 6% improvement on LongBench’s Multi-document QA and 12.5–17.5% performance gains on ∞-Bench tasks, while extending measurable context length to 200K tokens in Needle-In-A-Haystack (NIAH) tests. This work pioneers brain-inspired LLM optimization and is complementary to other works. Besides, it can be generalized to any model and enhance their long-context performance and interpretability without structural overhauls.	NeurIPS	Kangcong Li, Tao Chen	2025
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding. Abstract: Multimodal Large Language Models (MLLMs) have shown remarkable capabilities in video content understanding but still struggle with fine-grained motion comprehension. To comprehensively assess the motion understanding ability of existing MLLMs, we introduce FAVOR-Bench, comprising 1,776 videos with structured manual annotations of various motions. Our benchmark includes both close-ended and open-ended tasks. For close-ended evaluation, we carefully design 8,184 multiple-choice question-answer pairs spanning six distinct sub-tasks. For open-ended evaluation, we develop both a novel cost-efficient LLM-free and a GPT-assisted caption assessment method, where the former can enhance benchmarking interpretability and reproducibility. Comprehensive experiments with 21 state-of-the-art MLLMs reveal significant limitations in their ability to comprehend and describe detailed temporal dynamics in video motions. To alleviate this limitation, we further build FAVOR-Train, a dataset consisting of 17,152 videos with fine-grained motion annotations. The results of finetuning Qwen2.5-VL on FAVOR-Train yield consistent improvements on motion-related tasks of TVBench, MotionBench and our FAVOR-Bench. Comprehensive assessment results demonstrate that the proposed FAVOR-Bench and FAVOR-Train provide valuable tools to the community for developing more powerful video understanding models.	NeurIPS	Chongjun Tu, Tao Chen	2025
DOLPHIN: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback. Abstract: The scientific research paradigm is undergoing a profound transformation owing to the development of Artificial Intelligence (AI). Recent works demonstrate that various AI-assisted research methods can largely improve research efficiency by improving data analysis, accelerating computation, and fostering novel idea generation. To further move towards the ultimate goal (i.e., automatic scientific research), in this paper, we introduce Dolphin, a closed-loop LLM-driven framework to enhance the automation level of scientific research. Dolphin first generates novel ideas based on feedback from previous experiments and relevant papers ranked by the topic and task attributes. Then, the generated ideas can be implemented using a code template refined and debugged with the designed exception-traceback-guided local code structure. Finally, Dolphin automatically analyzes the results of each idea and feeds the results back to the next round of idea generation. Experiments are conducted on the benchmark datasets of different topics and a subset of MLE-bench. Results show that Dolphin can continuously improve the performance of the input topic in a loop. We highlight that Dolphin can automatically propose methods that are comparable to the state-of-the-art in some tasks such as 3D point classification.	ACL Main Conference	Jiakang Yuan, Tao Chen	2025
Boost Embodied AI Models with Robust Compression Boundary. Abstract: The rapid improvement of deep learning models with the integration of the physical world has dramatically improved embodied AI capabilities. Meanwhile, the powerful embodied AI models and their scales place an increasing burden on deployment efficiency. The efficiency issue is more apparent on embodied AI platforms than on data centers because they have more limited computational resources and memory bandwidth. Meanwhile, most embodied AI scenarios, like autonomous driving and robotics, are more sensitive to fast responses. Theoretically, the traditional model compression techniques can help embodied AI models with more efficient computation, lower memory and energy consumption, and reduced latency. Because the embodied AI models are expected to interact with the physical world, the corresponding compressed models are also expected to resist natural corruption caused by real-world events such as noise, blur, weather conditions, and even adversarial corruption. This paper explores the novel paradigm to boost the efficiency of the embodied AI models and the robust compression boundary. The efficacy of our method has been proven to find the optimal balance between accuracy, efficiency, and robustness in real-world conditions.	IJCAI	Chong Yu, Tao Chen	2025
Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple Vision-Language Model Variants. Abstract: Vision-language model (VLM) is one of the most important models for multi-modal tasks. Real industrial applications often meet the challenge of adapting VLMs to different scenarios, such as varying hardware platforms or performance requirements. Traditional methods involve training or fine-tuning to adapt multiple unique VLMs or using model compression techniques to create multiple compact models. These approaches are complex and resource-intensive. This paper introduces a novel paradigm called Once-Tuning-Multiple-Variants (OTMV). OTMV requires only a single tuning process to inject dynamic weight expansion capacity into the original VLM structure. This tuned VLM can then be expanded into multiple variants tailored for different scenarios in inference. The tuning mechanism of OTMV is inspired by the mathematical series expansion theorem, which helps to reduce the parameter size and memory requirements while maintaining accuracy for VLM. Experiment results show that OTMV-tuned models achieve comparable accuracy to baseline VLMs across various visual-language tasks. The experiments also demonstrate the dynamic expansion capability of OTMV-tuned VLMs, outperforming traditional model compression and adaptation techniques in terms of accuracy and efficiency.	CVPR	Chong Yu, Tao Chen	2025
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models. Abstract: Upcycled Mixture-of-Experts (MoE) models have shown great potential in various tasks by converting the original Feed-Forward Network (FFN) layers in pre-trained dense models into MoE layers. However, these models still suffer from significant parameter inefficiency due to the introduction of multiple experts. In this work, we propose a novel DeRS (Decompose, Replace, and Synthesis) paradigm to overcome this shortcoming, which is motivated by our observations about the unique redundancy mechanisms of upcycled MoE experts. Specifically, DeRS decomposes the experts into one expert-shared base weight and multiple expert-specific delta weights, and subsequently represents these delta weights in lightweight forms. Our proposed DeRS paradigm can be applied to enhance parameter efficiency in two different scenarios, including: 1) DeRS Compression for inference stage, using sparsification or quantization to compress vanilla upcycled MoE models; and 2) DeRS Upcycling for training stage, employing lightweight sparse or low-rank matrixes to efficiently upcycle dense models into MoE models. Extensive experiments across three different tasks show that the proposed methods can achieve extreme parameter efficiency while maintaining the performance for both training and compression of upcycled MoE models.	CVPR	Yongqi Huang, Tao Chen	2025
Consistency-aware Self-Training for Iterative-based Stereo Matching. Abstract: Iterative-based methods have become mainstream in stereo matching due to their high performance. However, these methods heavily rely on labeled data and face challenges with unlabeled real-world data. To this end, we propose a consistency-aware self-training framework for iterative-based stereo matching for the first time, leveraging real-world unlabeled data in a teacher-student manner. We first observe that regions with larger errors tend to exhibit more pronounced oscillation characteristics during model prediction. Based on this, we introduce a novel consistency-aware soft filtering module to evaluate the reliability of teacher-predicted pseudo-labels, which consists of a multi-resolution prediction consistency filter and an iterative prediction consistency filter to assess the prediction fluctuations of multiple resolutions and iterative optimization respectively. Further, we introduce a consistency-aware soft-weighted loss to adjust the weight of pseudo-labels accordingly, relieving the error accumulation and performance degradation problem due to incorrect pseudo-labels. Extensive experiments demonstrate that our method can improve the performance of various iterative-based stereo matching approaches in various scenarios. In particular, our method can achieve further enhancements over the current SOTA methods on several benchmark datasets.	CVPR	Jingyi Zhou, Tao Chen	2025
HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction. Abstract: Reconstructing 3D scenes from multiple viewpoints is a fundamental task in stereo vision. Recently, advances in generalizable 3D Gaussian Splatting have enabled high-quality novel view synthesis for unseen scenes from sparse input views by feed-forward predicting per-pixel Gaussian parameters without extra optimization. However, existing methods typically generate single-scale 3D Gaussians, which lack representation of both large-scale structure and texture details, resulting in mislocation and artefacts. In this paper, we propose a novel framework, HiSplat, which introduces a hierarchical manner in generalizable 3D Gaussian Splatting to construct hierarchical 3D Gaussians via a coarse-to-fine strategy. Specifically, HiSplat generates large coarse-grained Gaussians to capture large-scale structures, followed by fine-grained Gaussians to enhance delicate texture details. To promote inter-scale interactions, we propose an Error Aware Module for Gaussian compensation and a Modulating Fusion Module for Gaussian repair. Our method achieves joint optimization of hierarchical representations, allowing for novel view synthesis using only two-view reference images. Comprehensive experiments on various datasets demonstrate that HiSplat significantly enhances reconstruction quality and cross-dataset generalization compared to prior single-scale methods. The corresponding ablation study and analysis of different-scale 3D Gaussians reveal the mechanism behind the effectiveness.	ICLR	Shengji Tang, Tao Chen	2025
Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy. Abstract: Diffusion models have recently achieved great success in the synthesis of high-quality images and videos. However, the existing denoising techniques in diffusion models are commonly based on step-by-step noise predictions, which suffers from high computation cost, resulting in a prohibitive latency for interactive applications. In this paper, we propose AdaptiveDiffusion to relieve this bottleneck by adaptively reducing the noise prediction steps during the denoising process. Our method considers the potential of skipping as many noise prediction steps as possible while keeping the final denoised results identical to the original full-step ones. Specifically, the skipping strategy is guided by the third-order latent difference that indicates the stability between timesteps during the denoising process, which benefits the reusing of previous noise prediction results. Extensive experiments on image and video diffusion models demonstrate that our method can significantly speed up the denoising process while generating identical results to the original process, achieving up to an average 2~5x speedup without quality degradation.	NeurIPS	Hancheng Ye, Tao Chen	2024
MeshXL: Neural Coordinate Field for Generative 3D Foundation Models. Abstract: The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, which is widely preferred in various applications. However, given its unstructured graph representation, the direct generation of high-fidelity 3D meshes is challenging. Fortunately, with a pre-defined ordering strategy, 3D meshes can be represented as sequences, and the generation process can be seamlessly treated as an auto-regressive problem. In this paper, we validate the Neural Coordinate Field (NeurCF), an explicit coordinate representation with implicit neural embeddings, is a simple-yet-effective representation for large-scale sequential mesh modeling. After that, we present MeshXL, a family of generative pre-trained auto-regressive models, which addresses the process of 3D mesh generation with modern large language model approaches. Extensive experiments show that MeshXL is able to generate high-quality 3D meshes, and can also serve as foundation models for various down-stream applications.	NeurIPS	Sijin Chen, Tao Chen	2024
M3DBench: Let’s Instruct Large Models with Multi-modal 3D Prompts. Abstract: Recently, 3D understanding has become popular to facilitate autonomous agents to perform further decision-making. However, existing 3D datasets and methods are often limited to specific tasks. On the other hand, recent progress in Large Language Models (LLMs) and Multimodal Language Models (MLMs) have demonstrated exceptional general language and imagery tasking performance. Therefore, it is interesting to unlock MLM's potential to be 3D generalist for wider tasks. However, current MLMs' research has been less focused on 3D tasks due to a lack of large-scale 3D instruction-following datasets. In this work, we introduce a comprehensive 3D instruction-following dataset called M3DBench, which possesses the following characteristics: 1) It supports general multimodal instructions interleaved with text, images, 3D objects, and other visual prompts. 2) It unifies diverse 3D tasks at both region and scene levels, covering a variety of fundamental abilities in real-world 3D environments. 3) It is a large-scale 3D instruction-following dataset with over 320k instruction-response pairs. Furthermore, we establish a new benchmark for assessing the performance of large models in understanding multi-modal 3D prompts. Extensive experiments demonstrate the effectiveness of our dataset and baseline, supporting general 3D-centric tasks, which can inspire future research.	ECCV	Mingsheng Li, Tao Chen	2024
MotionChain: Conversational Motion Controllers via Multimodal Prompts. Abstract: Recent advancements in language models have demonstrated their adeptness in conducting multi-turn dialogues and retaining conversational context. However, this proficiency remains largely unexplored in other multimodal generative models, particularly in human motion models. By integrating multi-turn conversations in controlling continuous virtual human movements, generative human motion models can achieve an intuitive and step-by-step process of human task execution for humanoid robotics, game agents, or other embodied systems. In this work, we present MotionChain, a conversational human motion controller to generate continuous and long-term human motion through multimodal prompts. Specifically, MotionChain consists of multi-modal tokenizers that transform various data types such as text, image, and motion, into discrete tokens, coupled with a Vision-Motion-aware Language model. By leveraging large-scale language, vision-language, and vision-motion data to assist motion-related generation tasks, MotionChain thus comprehends each instruction in multi-turn conversation and generates human motions followed by these prompts. Extensive experiments validate the efficacy of MotionChain, demonstrating state-of-the-art performance in conversational motion generation, as well as more intuitive manners of controlling and interacting with virtual humans.	ECCV	Biao Jiang, Tao Chen	2024
Better Regression Makes Better Test-time Adaptive 3D Object Detection. Abstract: Domain Adaptation (DA) has been widely explored and made significant progress on cross-domain 3D tasks recently. Despite being effective, existing works fail to deal with rapidly changing domains due to the unpredictable test time scenarios and meanwhile fast response time requirement. Thus, we explore a new task named test-time domain adaptive 3D object detection and propose Reg-TTA3D, a pseudo-label based test-time adaptative 3D object detection method. By investigating the factor that limits the detection accuracy, we find that regression is essential in this task. To make better regression, we first design a noise consistency pseudo-label generation process to filter pseudo-labels with instability under noise interference and obtain reliable pseudo-labels. Then, confidence-guided regression refinement is introduced, which uses the box regression results of high-confidence boxes to supervise boxes with relatively low confidence, further making the predicted box size gradually approach the distribution of the target domain. Finally, to better update the regression layer and alleviate the class-imbalance issue, a class-balance EMA updating strategy is proposed. Experimental results on multiple cross-domain scenarios including cross-beam, cross-location, and cross-weather demonstrate that Reg-TTA3D can achieve comparable or even better performance compared to unsupervised domain adaptation works by only updating less than 0.1% parameters within less than 1% time.	ECCV	Jiakang Yuan, Tao Chen	2024
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning. Abstract: Recent advances in Large Multimodal Models (LMM) have made it possible for various applications in human-machine interactions. However, developing LMMs that can comprehend, reason, and plan in complex and diverse 3D environments remains a challenging topic, especially considering the demand for understanding permutation-invariant point cloud 3D representations of the 3D scene. Existing works seek help from multi-view images, and project 2D features to 3D space as 3D scene representations. This, however, leads to huge computational overhead and performance degradation. In this paper, we present LL3DA, a Large Language 3D Assistant that takes point cloud as direct input and responds to both textual instructions and visual prompts. This helps LMMs better comprehend human interactions and further helps to remove the ambiguities in cluttered 3D scenes. Experiments show that LL3DA achieves remarkable results, and surpasses various 3D vision-language models on both 3D Dense Captioning and 3D Question Answering.	CVPR	Sijin Chen, Tao Chen	2024
MotionGPT: Human Motion as a Foreign Language. Abstract: Though the advancement of pre-trained large language models unfolds, the exploration of building a unified model for language and other multi-modal data, such as motion, remains challenging and untouched so far. Fortunately, human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that can enhance the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks. Specifically, we employ the discrete vector quantization for human motion and transfer 3D motion into motion tokens, similar to the generation process of word tokens. Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT with a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performances on multiple motion tasks including text-driven motion generation, motion captioning, motion prediction, and motion in-between.	NeurIPS	Biao Jiang, Tao Chen	2023
Projective Equivariant Networks via Second-order Fundamental Differential Invariants. Abstract: Equivariant networks enhance model efficiency and generalization by embedding symmetry priors into their architectures. However, most existing methods, primarily based on group convolutions and steerable convolutions, face significant limitations when dealing with complex transformation groups, particularly the projective group, which plays a crucial role in vision. In this work, we tackle the challenge by constructing projective equivariant networks based on differential invariants. Using the moving frame method with a carefully selected cross section tailored for multi-dimensional functions, we derive a complete and concise set of second-order fundamental differential invariants of the projective group. We provide a rigorous analysis of the properties and transformation relationships of their underlying components, yielding a further simplified and unified set of fundamental differential invariants, which facilitates both theoretical analysis and practical applications. Building on this foundation, we develop the first deep projective equivariant networks, PDINet, which achieve full projective equivariance without discretizing or sampling the group. Empirical results on the projectively transformed STL-10 and Imagenette datasets show that PDINet achieves improvements of 11.39% and 5.66% in accuracy over the respective standard baselines under out-of-distribution settings, demonstrating its strong generalization to complex geometric transformations.	NeurIPS 2025	Yikang Li, Zhouchen Lin	2025
PaZO: Preconditioned Accelerated Zeroth-Order Optimization for Fine-Tuning LLMs. Abstract: This paper introduces PaZO, a preconditioned accelerated zeroth-order optimization algorithm for fine-tuning large language models (LLMs). First, we theoretically demonstrate the necessity of preconditioning in zeroth-order optimization, proving that zeroth-order stochastic gradient descent (ZO-SGD) alone fails to achieve the ideal convergence rate. Building on this, we propose a Preconditioned Simultaneous Perturbation Stochastic Approximation (PSPSA) and theoretical version of PaZO, and demonstrate that setting the order of preconditioner as -1/2 in PSPSA yields the improved convergence rate for PaZO. Moreover, we design a practical version of PaZO that stabilizes training via diagonal Hessian estimate and moving average technique. Extensive experiments on diverse downstream tasks with models like RoBERTa-large and OPT show PaZO's effectiveness. Compared to other zeroth order baselines, PaZO achieves better performance across models and tasks.	NeurIPS 2025	Hanzhen Zhao, Zhouchen Lin	2025
Stepsize Anything: A Unified Learning Rate Schedule for Budgeted-Iteration Training. Abstract: The expanding computational costs and limited resources underscore the critical need for budgeted-iteration training, which aims to achieve optimal learning within predetermined iteration budgets. While learning rate schedules fundamentally govern the performance of different networks and tasks, particularly in budgeted-iteration scenarios, their design remains largely heuristic, lacking theoretical foundations. In addition, the optimal learning rate schedule requires extensive trial-and-error selection, making the training process inefficient. In this work, we propose the Unified Budget-Aware (UBA) schedule, a theoretically grounded learning rate schedule that consistently outperforms commonly-used schedules among diverse architectures and tasks under different constrained training budgets. First, we bridge the gap by constructing a novel training budget-aware optimization framework, which explicitly accounts for the robustness to landscape curvature variations. From this framework, we derive the UBA schedule, controlled by a single hyper-parameter $\varphi$ that provides a trade-off between flexibility and simplicity, eliminating the need for per-network numerical optimization. Moreover, we establish a theoretical connection between $\varphi$ and the condition number, adding interpretation and justification to our approach. Besides, we prove the convergence for different values of $\varphi$. We offer practical guidelines for its selection via theoretical analysis and empirical results. Extensive experimental results show that UBA consistently surpasses the commonly-used schedules across diverse vision and language tasks, spanning network architectures (e.g., ResNet, OLMo) and scales, under different training-iteration budgets.	NeurIPS 2025	Anda Tang, Zhouchen Lin	2025
Affine Equivariant Networks Based on Differential Invariants. Yikang Li, Zhouchen Lin. CVPR 2024. Abstract: Convolutional neural networks benefit from translation equivariance, achieving tremendous success. Equivariant networks further extend this property to other transformation groups. However, most existing methods require discretization or sampling of groups, leading to increased model sizes for larger groups, such as the affine group. In this paper, we build affine equivariant networks based on differential invariants from the viewpoint of symmetric PDEs, without discretizing or sampling the group. To address the division-by-zero issue arising from fractional differential invariants of the affine group, we construct a new kind of affine invariants by normalizing polynomial relative differential invariants to replace classical differential invariants. For further flexibility, we design an equivariant layer, which can be directly integrated into convolutional networks of various architectures. Moreover, our framework for the affine group is also applicable to its continuous subgroups. We implement equivariant networks for the scale group, the rotation-scale group, and the affine group. Numerical experiments demonstrate the outstanding performance of our framework across classification tasks involving transformations of these groups. Remarkably, under the out-of-distribution setting, our model achieves a 3.37% improvement in accuracy over the main counterpart affConv on the affNIST dataset.	CVPR 2024	Yikang Li, Zhouchen Lin	2024
PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions. Zhengyang Shen, Zhouchen Lin. ICML 2020. Abstract: Recent research has shown that incorporating equivariance into neural network architectures is very helpful, and there have been some works investigating the equivariance of networks under group actions. However, as digital images and feature maps are on the discrete meshgrid, corresponding equivariance-preserving transformation groups are very limited. In this work, we deal with this issue from the connection between convolutions and partial differential operators (PDOs). In theory, assuming inputs to be smooth, we transform PDOs and propose a system which is equivariant to a much more general continuous group, the n-dimension Euclidean group. In implementation, we discretize the system using the numerical schemes of PDOs, deriving approximately equivariant convolutions (PDO-eConvs). Theoretically, the approximation error of PDO-eConvs is of the quadratic order. It is the first time that the error analysis is provided when the equivariance is approximate. Extensive experiments on rotated MNIST and natural image classification show that PDO-eConvs perform competitively yet use parameters much more efficiently. Particularly, compared with Wide ResNets, our methods achieve better results using only 12.6% of the parameters.	ICML 2020	Zhengyang Shen, Zhouchen Lin	2020
Hyperspectral Anomaly Detection Via Deep Prior Mamba Network. Linwei Li, Chonghui Wan, Bin Wang*, and Bo Hu. 2025 IEEE International Geoscience and Remote Sensing Symposium (IGARSS'25).	2025 IEEE International Geoscience and Remote Sensing Symposium (IGARSS'25)	Linwei Li, Chonghui Wan, Bin Wang*, and Bo Hu	pp. 8659-8663, 3-8 August 2025, Brisbane, Australia.
Unsupervised Nonlinear Hyperspectral Unmixing using Kernel-Based Autoencoder Network. Chonghui Wan, Linwei Li, Bin Wang*, and Bo Hu. 2025 IEEE International Geoscience and Remote Sensing Symposium (IGARSS'25).	2025 IEEE International Geoscience and Remote Sensing Symposium (IGARSS'25)	Chonghui Wan, Linwei Li, Bin Wang*, and Bo Hu	pp. 8720-8724, 3-8 August 2025, Brisbane, Australia.
Merging Total Variation into Low-rank Representation for Hyperspectral Anomaly Detection. Linwei Li, Ziyu Wu, Bin Wang*, and Bo Hu. 2024 IEEE International Geoscience and Remote Sensing Symposium (IGARSS'24).	2024 IEEE International Geoscience and Remote Sensing Symposium (IGARSS'24)	Linwei Li, Ziyu Wu, Bin Wang*, and Bo Hu	pp. 7991-7994, 7-12 July 2024, Athens, Greece.
Unsupervised Nonlinear Hyperspectral Unmixing Based on An Extended Multilinear Mixing Model-inspired Dual-stream Network. Minglei Li, Linwei Li, Bin Wang*, and Bo Hu. 2024 IEEE International Geoscience and Remote Sensing Symposium (IGARSS'24).	2024 IEEE International Geoscience and Remote Sensing Symposium (IGARSS'24)	Minglei Li, Linwei Li, Bin Wang*, and Bo Hu	pp. 9033-9036, 7-12 July 2024, Athens, Greece.
An Autoencoder Framework with Transformer Encoder and EMLM Embedded Decoder for Nonlinear Hyperspectral Anomaly Detection. Ziyu Wu, Linwei Li, Bin Wang*, and Bo Hu. 2024 IEEE International Geoscience and Remote Sensing Symposium (IGARSS'24).	2024 IEEE International Geoscience and Remote Sensing Symposium (IGARSS'24)	Ziyu Wu, Linwei Li, Bin Wang*, and Bo Hu	pp. 9015-9018, 7-12 July 2024, Athens, Greece.
EventMG: Efficient Multilevel Mamba-Graph Learning for Spatiotemporal Event Representation. 冯辉. Annual Conference on Neural Information Processing Systems (NeurIPS). Abstract: Event cameras offer unique advantages in scenarios involving high speed, low light, and high dynamic range, yet their asynchronous and sparse nature poses significant challenges to efficient spatiotemporal representation learning. Specifically, despite notable progress in the field, effectively modeling the full spatiotemporal context, selectively attending to salient dynamic regions, and robustly adapting to the variable density and dynamic nature of event data remain key challenges. Motivated by these challenges, this paper proposes EventMG, a lightweight, efficient, multilevel Mamba-Graph architecture designed for learning high-quality spatiotemporal event representations. EventMG employs a multilevel approach, jointly modeling information at the micro (single event) and macro (event cluster) levels to comprehensively capture the multi-scale characteristics of event data. At the micro level, it focuses on spatiotemporal details, employing State Space Model (SSM) based Mamba to precisely capture long-range dependencies among numerous event nodes. Concurrently, at the macro level, Component Graphs are introduced to efficiently encode the local semantics and global topology of dense event regions. Furthermore, to better accommodate the dynamic and sparse characteristics of the data, we propose the Spatiotemporal-aware Event Scanning Technology (SEST), integrating the Adaptive Perturbation Network (APN) and Multidirectional Scanning Module (MSM), which substantially enhances the model's ability to perceive and focus on key spatiotemporal patterns. By employing this novel collaborative paradigm, EventMG demonstrates the ability to effectively capture multi-level spatiotemporal characteristics of event data while maintaining a low parameter count and linear computational complexity, suggesting a promising direction for event representation learning.	Annual Conference on Neural Information Processing Systems (NeurIPS)	冯辉	2025
UAV-Assisted Multi-Task Federated Learning with Task Knowledge Sharing. 杨涛. IEEE International Conference on Communications (ICC). Abstract: The rapid development of unmanned aerial vehicle (UAV) technology has spawned a wide variety of applications, such as emergency communications, regional surveillance, and disaster relief. Due to their limited battery capacity and processing power, multiple UAVs are often required for complex tasks. In such cases, a control center is crucial for coordinating their activities, which fits well with the federated learning (FL) framework. However, conventional FL approaches often focus on a single task, ignoring the potential of training multiple related tasks simultaneously. In this paper, we propose a UAV-assisted multi-task federated learning scheme, in which data collected by multiple UAVs can be used to train multiple related tasks concurrently. The scheme facilitates the training process by sharing feature extractors across related tasks and introduces a task attention mechanism to balance task performance and encourage knowledge sharing. To provide an analytical description of training performance, the convergence analysis of the proposed scheme is performed. Additionally, the optimal bandwidth allocation for UAVs under limited bandwidth conditions is derived to minimize communication time. Meanwhile, a UAV-EV association strategy based on a coalition formation game is proposed. Simulation results validate the effectiveness of the proposed scheme in enhancing multi-task performance and training speed.	IEEE International Conference on Communications (ICC)	杨涛	2025
EGSST: Event-based Graph Spatiotemporal Sensitive Transformer for Object Detection. 冯辉. Annual Conference on Neural Information Processing Systems (NeurIPS). Abstract: Event cameras provide exceptionally high temporal resolution in dynamic vision systems due to their unique event-driven mechanism. However, the sparse and asynchronous nature of event data makes frame-based visual processing methods inappropriate. This study proposes a novel framework, Event-based Graph Spatiotemporal Sensitive Transformer (EGSST), for the exploitation of spatial and temporal properties of event data. Firstly, a well-designed graph structure is employed to model event data, which not only preserves the original temporal data but also captures spatial details. Furthermore, inspired by the phenomenon that human eyes pay more attention to objects that produce significant dynamic changes, we design a Spatiotemporal Sensitivity Module (SSM) and an adaptive Temporal Activation Controller (TAC). Through these two modules, our framework can mimic the response of the human eye in dynamic environments by selectively activating the temporal attention mechanism based on the relative dynamics of event data, thereby effectively conserving computational resources. In addition, the integration of a lightweight, multi-scale Linear Vision Transformer (LViT) markedly enhances processing efficiency. Our research proposes a fully event-driven approach, effectively exploiting the temporal precision of event data and optimising the allocation of computational resources by intelligently distinguishing the dynamics within the event data. The framework provides a lightweight, fast, accurate, and fully event-based solution for object detection tasks in complex dynamic environments, demonstrating significant practicality and potential for application.	Annual Conference on Neural Information Processing Systems (NeurIPS)	冯辉	2024
Hardware Acceleration of Phase and Gain Control for Analog Beamforming. 胡波. 2024 IEEE International Symposium on Circuits and Systems (ISCAS). Abstract: The beamforming technique has been widely used to improve the link budget in wireless communications. Compared with the digital beamformer, the analog beamformer has much lower hardware complexity and is more suitable for low-cost mobile applications. In this paper, we consider element-level phase and gain control of the analog beamformer using two phase shifters only. By setting the phase shifts properly, simultaneous 360° phase and 6-dB gain control (SPGC) can be achieved to form the beam pattern. We first propose a low-complexity SPGC method tailored for massive multiple-input multiple-output (MIMO). Based on the conventional and proposed SPGC methods, we then design the full-featured accelerator (FFA) and the hardware-efficient accelerator (HEA) to accelerate the computing process. These two accelerators are implemented in 28 nm technology. FFA integrates 137 kilogate equivalents (kGE) in a core area of 0.0693 mm² and dissipates 104.4 mW at 2.0 GHz with 16 degrees of parallelism, while HEA can reduce the core area by 44.6% and power consumption by 42.7% without significant performance loss.	2024 IEEE International Symposium on Circuits and Systems (ISCAS)	胡波	2024
Recovery of Graph Signals From Sign Measurements. 冯辉. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Abstract: Sampling and interpolation of continuous graph signals have been extensively studied, in order to reconstruct or estimate the entire graph signal from the signal values on a subset of vertices. However, in many real-world scenarios, only the signs of signals are available. For example, a rating system may only provide simple options such as "like" or "dislike". We are interested in whether it is possible to recover the original signal from such coarse information. In this paper, the reconstruction of bandlimited graph signals based on sign measurements is discussed and a greedy sampling strategy is proposed. Simulation experiments are presented, and the greedy sampling algorithm is compared with the random sampling algorithm, which verifies the feasibility of the proposed approach.	2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)	冯辉	2022
Performance Analysis for Correlated AoI and Energy Efficiency in Heterogeneous CR-IoT System. 杨涛. IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). Abstract: We consider a cognitive radio based Internet of Things (CR-IoT) system where the secondary IoT device (SD) accesses the licensed channel during the transmission vacancies of the primary IoT device (PD). We focus on the impact of the IoT devices' heterogeneous traffic pattern on the energy efficiency and on the age of information (AoI) performance of the SD. We first derive closed-form expressions of the energy efficiency and the average AoI, and subsequently explore their convexity and monotonicity with respect to the transmit power. Following these characterizations, an optimal transmit power optimization algorithm (TPOA) is proposed for the SD to maximize the energy efficiency while maintaining the average AoI under a predefined threshold. Numerical results verify the different preferences of the SD toward different PD traffic patterns, and provide insights into the tradeoff between the energy efficiency and the average AoI.	IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)	杨涛	2021
Regularized Recovery by Multi-Order Partial Hypergraph Total Variation. 冯辉. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Abstract: Capturing complex high-order interactions among data is an important task in many scenarios. A common way to model high-order interactions is to use hypergraphs whose topology can be mathematically represented by tensors. Existing methods use a fixed-order tensor to describe the topology of the whole hypergraph, which ignores the divergence of different-order interactions. In this work, we take this divergence into consideration, and propose a multi-order hypergraph Laplacian and the corresponding total variation. Taking this total variation as a regularization term, we can utilize the topology information contained in it to smooth the hypergraph signal. This can help distinguish different-order interactions and represent high-order interactions accurately.	IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)	冯辉	2021

Title	Publication Venue	First/Corresponding Author	Year
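Alternating Direction Method of Multipliers in Machine Learning (机器学习中的交替方向乘子法). 林宙辰. Science Press. Abstract: This book surveys recent advances in ADMM for machine learning. It gives a comprehensive account of ADMM in various settings, including deterministic and stochastic algorithms, centralized and distributed algorithms, and algorithms for convex and nonconvex problems. It explains the core idea of each algorithm in depth and provides detailed proofs of convergence and convergence rates. The book is intended for researchers in machine learning and optimization, for senior undergraduate and graduate students in artificial intelligence, signal processing, automatic control, network communications, applied mathematics, and related majors, and for engineers developing products in related fields.	Science Press	林宙辰	2023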
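Accelerated First-Order Optimization Algorithms in Machine Learning (机器学习中的加速一阶优化算法). 林宙辰. China Machine Press. Abstract: This book surveys recent advances in accelerated first-order optimization algorithms for machine learning. It gives a comprehensive account of accelerated first-order algorithms in various settings, including deterministic and stochastic algorithms and synchronous and asynchronous algorithms, for constrained and unconstrained problems and for convex and nonconvex problems. It interprets the ideas behind the algorithms in depth and provides detailed proofs of their convergence rates. The book is intended for researchers in machine learning and optimization, including senior undergraduate and graduate students in artificial intelligence, signal processing, and applied mathematics (especially computational mathematics), and for engineers developing products in artificial intelligence and signal processing.	China Machine Press	林宙辰	2021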
Alternating Direction Method of Multipliers for Machine Learning. Zhouchen Lin. Springer. Abstract: Written by experts in machine learning and optimization, this is the first book offering a state-of-the-art review on ADMM under various scenarios, including deterministic and convex optimization, nonconvex optimization, stochastic optimization, and distributed optimization. Offering a rich blend of ideas, theories and proofs, the book is up-to-date and self-contained. It is an excellent reference book for users who are seeking a relatively universal algorithm for constrained problems. Graduate students or researchers can read it to grasp the frontiers of ADMM in machine learning in a short period of time.	Springer	Zhouchen Lin	2022
Accelerated Optimization for Machine Learning: First-Order Algorithms. Zhouchen Lin. Springer. Abstract: Written by experts in machine learning and optimization, this is the first book offering a state-of-the-art review on accelerated first-order optimization algorithms for machine learning. The book provides a comprehensive introduction to the addressed topic. It covers various methods, including both deterministic and stochastic algorithms, where the algorithms can be synchronous or asynchronous, for both unconstrained and constrained problems, where the problems can be convex or non-convex. Offering a rich blend of ideas, theories and proofs, the book is up-to-date and self-contained. It is an excellent reference book for users who are seeking faster optimization algorithms. Graduate students or researchers can read it to grasp the frontiers of optimization in machine learning in a short period of time.	Springer	Zhouchen Lin	2020
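Hyperspectral Remote Sensing Image Unmixing Theory and Methods: From Linear to Nonlinear (高光谱遥感图像解混理论与方法：从线性到非线性). 王斌, 杨斌. Science Press. Abstract: (This monograph was supported by the 2019 National Science and Technology Publication Fund.)	Science Press	王斌, 杨斌	2019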
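Autonomous Driving: Artificial Intelligence Theory and Practice (自动驾驶——人工智能理论与实践). 胡波. Tsinghua University Press. Abstract: Following the basic industrial workflow for developing autonomous driving technology and drawing on the industry's practical R&D experience in the field, this book uses a high-performance smart car and a highly realistic lane sand table as the teaching aids and operating environment. It explains the principles and practical applications of autonomous driving in an accessible way, opening a door to the world of artificial intelligence for beginners. The book is organized around helping beginners build a smart car with autonomous driving capability from scratch, and is divided into six chapters: Seeing the Car (understanding autonomous driving), Building the Car (designing the smart car), Driving the Car (collecting training data), Coding the Car (writing the autonomous driving model), Computing the Car (training and optimizing the autonomous driving model), and Playing with the Car (deploying and validating the autonomous driving model). Beginners can systematically learn the algorithmic theory and application examples of artificial intelligence by combining theoretical study with hands-on practice. The book avoids piling up abstruse formula derivations and strives to explain dry, difficult algorithmic principles and models intuitively, in the hope that readers will come to understand how real-world autonomous driving technology is developing and enjoy using artificial intelligence to solve autonomous driving problems. It is suitable as a textbook for university programs in intelligent science and technology and AI-related majors, and as a reference for AI researchers and developers.	Tsinghua University Press	胡波	2023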

Talent Development

Admissions
Science and Innovation Projects

Admissions: Academic Master's and Academic Doctoral Programs

Program name and code:

080900 Electronic Science and Technology

Admissions: Professional Master's and Professional Doctoral Programs

Program name and code:

085400 Electronic Information