MADTP++: Bridge the Gap between Token and Weight Pruning for Accelerating VLTs
Jianjian Cao, Tao Chen
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
Abstract
Vision-Language Transformers (VLTs) have achieved remarkable success, but their computational costs are high because of the large number of input tokens and extensive model parameters. Existing VLT compression methods primarily rely on single-modality-based token pruning or coarse-grained weight pruning. However, these methods face significant obstacles: token pruning ignores the critical alignment between modalities and lacks the flexibility to compress each layer dynamically; coarse-grained weight pruning causes inevitable performance degradation; and existing methods struggle to compress input tokens and model parameters simultaneously. To address these limitations, we propose MADTP++, a novel approach that integrates custom-made token and weight pruning processes into a unified framework, achieving superior compression in both parameter counts and computational costs.
Year: 2025
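To make the two kinds of pruning concrete, here is a minimal sketch of generic score-based token pruning and unstructured magnitude weight pruning. It is not MADTP++ itself. The abstract does not give its alignment-guided token criterion or its weight criterion, so the importance `scores`, `keep_ratio`, and `sparsity` below are hypothetical inputs, and the function names are ours.

```python
import math


def prune_tokens(tokens, scores, keep_ratio):
    """Keep the ceil(keep_ratio * len(tokens)) highest-scoring tokens (at least one).

    Kept tokens are returned in their original order. `scores` are hypothetical
    per-token importance values; a real VLT would derive them from the model.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    k = max(1, math.ceil(keep_ratio * len(tokens)))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


def prune_weights(weights, sparsity):
    """Zero out the int(sparsity * len(weights)) smallest-magnitude weights."""
    n_prune = int(sparsity * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


tokens = ["v0", "v1", "v2", "v3", "v4", "v5"]
scores = [0.10, 0.90, 0.30, 0.80, 0.05, 0.60]
print(prune_tokens(tokens, scores, keep_ratio=0.5))          # ['v1', 'v3', 'v5']
print(prune_weights([0.5, -0.1, 0.3, -0.8], sparsity=0.5))  # [0.5, 0.0, 0.0, -0.8]
```

The simplest stand-in for the per-layer dynamic compression the abstract calls for is to pass a different `keep_ratio` at each layer.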
Attention Reallocation: Towards Zero-cost and Controllable Hallucination Mitigation of MLLMs
Chongjun Tu, Tao Chen
International Journal of Computer Vision (IJCV)
Abstract
Multi-Modal Large Language Models (MLLMs) stand out in various tasks but still struggle with hallucinations. Whereas recent training-free mitigation methods mostly introduce additional inference overhead through retrospection strategies and contrastive decoding, we propose attention reallocation (AttnReal) to mitigate hallucinations at nearly zero extra cost. Our approach is motivated by a key observation: the unreasonable attention distribution of an MLLM causes its features to be dominated by historical output tokens, which further contributes to hallucinated responses because of the distribution gap between different token types. Based on this observation, AttnReal recycles excessive attention from output tokens and reallocates it to visual tokens, which reduces the MLLM's reliance on language priors and makes the decoding process depend more on the visual inputs. More interestingly, we find that by controlling the intensity of AttnReal, we can achieve a wide range of trade-offs between response faithfulness and overall performance. Comprehensive results on different benchmarks validate the effectiveness of AttnReal across six open-source MLLMs and three decoding strategies.
Year: 2025
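The recycle-and-reallocate idea can be sketched for a single query's attention row. This is a hypothetical illustration, not the paper's exact rule. The token-type labels, the intensity parameter `alpha`, and the choice to give the recycled mass to visual tokens in proportion to the attention they already receive are all our assumptions.

```python
def reallocate_attention(attn, token_types, alpha):
    """Move a fraction `alpha` of the attention on output tokens to visual tokens.

    attn:        attention weights of one query over the context (sums to 1).
    token_types: "visual", "text", or "output" for each key position.
    alpha:       reallocation intensity in [0, 1]; 0 leaves attention unchanged.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    visual = [i for i, t in enumerate(token_types) if t == "visual"]
    visual_mass = sum(attn[i] for i in visual)
    if visual_mass == 0.0:
        return list(attn)  # no visual tokens (or no attention on them) to receive mass
    out = list(attn)
    recycled = 0.0
    for i, t in enumerate(token_types):
        if t == "output":
            recycled += alpha * attn[i]  # attention taken away from output tokens
            out[i] = (1.0 - alpha) * attn[i]
    for i in visual:
        # Hypothetical rule: share the recycled mass in proportion to existing attention.
        out[i] += recycled * attn[i] / visual_mass
    return out


attn = [0.1, 0.1, 0.2, 0.6]
types = ["visual", "visual", "text", "output"]
print([round(a, 3) for a in reallocate_attention(attn, types, alpha=0.5)])
# [0.25, 0.25, 0.2, 0.3]
```

The row still sums to one and needs no extra forward pass. Sweeping `alpha` upward from 0 gives a simple picture of the controllable intensity that the abstract describes.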
Taylor-Series-Expansion-Based Vision Transformer Models
Chong Yu, Tao Chen
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
Abstract
Taylor-Series-Expansion (TSE) is a mathematical theorem stating that, in most cases, the first few terms of a Taylor series are a good approximation of a nonlinear function. Inspired by this theorem, we design a new TSE-based vision transformer. It uses the weight of a shared first-order TSE transformer block (analogous to the first-order Taylor term), a finite number of repeated multiplications by it (analogous to the expanded higher-order terms), and the corresponding learnable TSE coefficients to approximate the naive vision transformer. In this way, the TSE-based vision model reduces the memory burden while keeping accuracy similar to that of its naive counterpart. Because a Taylor skip mechanism is added during training, the TSE-based vision transformer also has good dynamic expansion capability. Experimental results show that TSE-based models improve actual deployment latency by 1.30-1.36× on an A100 GPU and by 1.34-1.45× on AGX Orin, with negligible accuracy degradation on ImageNet classification, COCO detection, and ADE20K segmentation benchmarks. Moreover, TSE-based optimization is orthogonal to model compression. Combined with a state-of-the-art vision transformer compression method, it improves latency by 1.70-1.87× and throughput by 3.29-3.61× on an A100 GPU, and latency by 1.67-1.74× and throughput by 2.76-2.94× on AGX Orin.
Year: 2025
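Here is a toy version of the weight-sharing idea. One shared square matrix `W` stands in for the first-order transformer block, repeated application gives the higher-order terms W^k x, and per-term coefficients weight them. Three things are simplifying assumptions rather than the paper's design: replacing a transformer block with a linear map, the function names, and the random-skip rule used to mimic the Taylor skip mechanism.

```python
import random


def matvec(W, x):
    """Dense matrix-vector product for lists of lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def tse_forward(W, x, coeffs, skip_prob=0.0, rng=None):
    """Return sum_k coeffs[k-1] * W^k x for k = 1..len(coeffs), reusing one shared W.

    W must be square. With skip_prob > 0, each term with k >= 2 is dropped at
    random (a hypothetical stand-in for the Taylor skip mechanism in training).
    """
    rng = rng or random.Random(0)
    h = list(x)
    out = [0.0] * len(x)
    for k, c in enumerate(coeffs, start=1):
        h = matvec(W, h)  # h now holds W^k x
        if k >= 2 and rng.random() < skip_prob:
            continue
        out = [o + c * h_i for o, h_i in zip(out, h)]
    return out


W = [[0.5, 0.0], [0.0, 0.5]]
print(tse_forward(W, [1.0, 2.0], coeffs=[1.0, 1.0, 1.0]))  # [0.875, 1.75]
```

Only `W` and a handful of coefficients need to be stored, which is where the memory saving mentioned in the abstract comes from. In training, the coefficients would be learned rather than fixed.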
Stimulative Training++: Go Beyond The Performance Limits of Residual Networks
Peng Ye, Tao Chen
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
Abstract
Residual networks have shown great success and have become indispensable in recent deep neural network models. In this work, we re-investigate the training process of residual networks from a novel social-psychology perspective, that of loafing, and further propose a new training scheme as well as three improved strategies for boosting residual networks beyond their performance limits. Previous research has suggested that residual networks can be considered ensembles of shallow networks, which implies that the final performance of a residual network is influenced by a group of subnetworks. We identify a previously overlooked problem analogous to social loafing: subnetworks within a residual network are prone to exert less effort when working as part of a group than when working alone. We define this problem as network loafing. Just as social loafing decreases individual productivity and overall performance in society, network loafing inevitably causes sub-par performance. Inspired by solutions from social psychology, we first propose a novel training scheme called stimulative training, which randomly samples a residual subnetwork and calculates the KL divergence loss between the sampled subnetwork and the given residual network as extra supervision. To unleash the potential of stimulative training, we further propose three simple yet effective strategies: a novel KL- loss that only aligns the direction of the network logits, random smaller inputs for subnetworks, and inter-stage sampling rules. Comprehensive experiments and analysis verify the effectiveness of stimulative training and its three improved strategies.
Year: 2025
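The following is a minimal sketch of the stimulative-training loss under stated assumptions:

- The residual network is a list of toy blocks.
- A subnetwork keeps a random non-empty subset of blocks, and skipped blocks reduce to the identity path.
- The raw outputs are treated as logits.
- The loss is KL(full || sub), with the full network as the target.

The KL direction and the block functions are our choices; the abstract does not specify them.

```python
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def kl_div(p, q):
    """KL(p || q) for discrete distributions (q must be > 0 wherever p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def residual_forward(blocks, x, keep):
    """x_{l+1} = x_l + f_l(x_l) for kept blocks; skipped blocks act as identity."""
    for i, f in enumerate(blocks):
        if i in keep:
            x = [a + b for a, b in zip(x, f(x))]
    return x


def stimulative_loss(blocks, x, rng):
    """KL between the full network's output distribution and a random subnetwork's."""
    n = len(blocks)
    full_logits = residual_forward(blocks, x, keep=set(range(n)))
    keep = set(rng.sample(range(n), rng.randint(1, n)))  # random non-empty subnetwork
    sub_logits = residual_forward(blocks, x, keep)
    return kl_div(softmax(full_logits), softmax(sub_logits))


# Toy residual blocks acting on 2-dimensional features (purely illustrative).
blocks = [lambda v: [0.5 * a for a in v],
          lambda v: [1.0, -1.0],
          lambda v: [-0.3 * a for a in v]]
print(stimulative_loss(blocks, [1.0, 0.0], random.Random(0)))
```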
WI3D: Weakly Incremental 3D Detection via Vision Foundation Models
Mingsheng Li, Tao Chen
IEEE Transactions on Multimedia (T-MM)
Abstract
Class-incremental 3D object detection requires a 3D detector to locate and recognize novel categories in a streaming fashion while preserving its base detection ability. However, existing methods require elaborate 3D annotations to learn novel categories, resulting in significant labeling costs. To this end, we explore a label-efficient approach called Weakly Incremental 3D Detection (WI3D), which teaches a 3D detector to learn incrementally with off-the-shelf vision foundation models. We propose a novel dual-teaching framework that incorporates both intra-modal and inter-modal knowledge from pseudo labels and the feature space. Specifically, our framework features a class-agnostic pseudo-label refinement module designed to generate high-quality 3D pseudo labels. This module is built on a lightweight transformer that models the spatial relationships between pseudo labels and their interactions with rich contextual information in point clouds. Additionally, we introduce a cross-modal knowledge transfer module to enhance the representation learning of novel classes, along with a reweighting knowledge distillation strategy that dynamically assesses and distills knowledge from previously learned categories. Extensive experiments show that our approach efficiently learns novel concepts while preserving knowledge of base classes in WI3D scenarios, and surpasses baseline approaches on both SUN-RGBD and ScanNet.
Year: 2024
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
Sijin Chen, Tao Chen
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
Abstract
3D dense captioning requires a model to translate its understanding of an input 3D scene into several captions associated with different object regions. Existing methods adopt a sophisticated "detect-then-describe" pipeline, which builds explicit relation modules on top of a 3D detector with numerous hand-crafted components. While these methods have achieved initial success, the cascaded pipeline tends to accumulate errors because of duplicated and inaccurate box estimations and cluttered 3D scenes. In this paper, we first propose Vote2Cap-DETR, a simple yet effective transformer framework that decouples the decoding processes of caption generation and object localization through parallel decoding. Moreover, we argue that object localization and description generation require different levels of scene understanding, which can be difficult for a shared set of queries to capture. To this end, we propose an advanced version, Vote2Cap-DETR++, which decouples the queries into localization queries and caption queries to capture task-specific features. Additionally, we introduce an iterative spatial refinement strategy for vote queries to achieve faster convergence and better localization performance. We also inject additional spatial information into the caption head for more accurate descriptions. Without bells and whistles, extensive experiments on two commonly used datasets, ScanRefer and Nr3D, demonstrate that Vote2Cap-DETR and Vote2Cap-DETR++ surpass conventional "detect-then-describe" methods by a large margin.
Year: 2024
Performance-aware Approximation of Global Channel Pruning for Multitask CNNs
Hancheng Ye, Tao Chen
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
Abstract
Global channel pruning (GCP) aims to remove a subset of channels (filters) across different layers of a deep model without hurting performance. Previous works focus either on single-task model pruning or on simply adapting it to the multitask scenario, and they still face the following problems when handling multitask pruning: 1) because of task mismatch, a backbone well pruned for classification focuses on preserving filters that extract category-sensitive information, so filters that may be useful for other tasks are pruned during the backbone pruning stage; 2) for multitask prediction, different filters within or between layers are more closely related and interact more strongly than in single-task prediction, which makes multitask pruning more difficult. Therefore, aiming at multitask model compression, we propose a Performance-Aware Global Channel Pruning (PAGCP) framework. We first theoretically present the objective for achieving superior GCP by considering the joint saliency of filters within and across layers. Then a sequential greedy pruning strategy is proposed to optimize the objective, where a performance-aware oracle criterion is developed to evaluate the sensitivity of filters to each task and preserve the globally most task-related filters. Experiments on several multitask datasets show that PAGCP can reduce FLOPs and parameters by over 60% with a minor performance drop, and achieves 1.2×-3.3× acceleration on both cloud and mobile platforms.
Year: 2023
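The sequential greedy strategy with a performance-aware check can be illustrated on a toy problem. This is a simplified sketch in the spirit of PAGCP, not its actual criterion. The oracle `task_perf`, the additive toy importance scores, and the relative-drop threshold `max_drop` are hypothetical, and the sketch does not model the paper's joint intra- and inter-layer saliency objective.

```python
def greedy_channel_prune(layers, task_perf, max_drop):
    """Sequential, layer-by-layer greedy pruning with a per-task performance check.

    layers:    dict mapping layer name -> list of filter ids, visited in order.
    task_perf: hypothetical oracle mapping a set of kept filters to {task: score};
               in practice this would evaluate the pruned multitask model.
    max_drop:  largest relative drop allowed for every task versus the unpruned model.
    """
    kept = {f for filters in layers.values() for f in filters}
    base = task_perf(kept)

    def worst_drop(candidate):
        perf = task_perf(candidate)
        return max((base[t] - perf[t]) / base[t] for t in base)

    for filters in layers.values():
        # Try first the filters whose removal hurts the worst-affected task least.
        for f in sorted(filters, key=lambda c: worst_drop(kept - {c})):
            if worst_drop(kept - {f}) <= max_drop:
                kept = kept - {f}
    return kept


# Toy additive "importance" of each filter for two tasks (made-up numbers).
importance = {"cls": {"a": 5, "b": 1, "c": 1, "d": 3},
              "seg": {"a": 1, "b": 4, "c": 0.5, "d": 3}}


def toy_perf(kept):
    return {t: sum(w[f] for f in kept) for t, w in importance.items()}


layers = {"conv1": ["a", "b"], "conv2": ["c", "d"]}
print(sorted(greedy_channel_prune(layers, toy_perf, max_drop=0.15)))  # ['a', 'b', 'd']
```

Filter `b` matters little for the toy classification task but a lot for segmentation, so the per-task check keeps it. This mirrors the task-mismatch problem described in the abstract.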
High-Rank Irreducible Cartesian Tensor Decomposition and Bases of Equivariant Spaces
Shihao Shao, Zhouchen Lin
Journal of Machine Learning Research (JMLR)
Abstract
Irreducible Cartesian tensors (ICTs) play a crucial role in the design of equivariant graph neural networks, as well as in theoretical chemistry and chemical physics. Meanwhile, the design space of available linear operations on tensors that preserve symmetry presents a significant challenge. The ICT decomposition and a basis of this equivariant space are difficult to obtain for high-rank tensors. After decades of research, Bonvicini (2024) recently achieved an explicit ICT decomposition for n = 5 with factorial time/space complexity. In this work, we obtain, for the first time, decomposition matrices for ICTs up to rank n = 9 with reduced and affordable complexity, by constructing what we call path matrices. The path matrices are obtained by performing chain-like contractions with Clebsch-Gordan matrices following the parentage scheme. We prove, and then leverage, the fact that the concatenation of path matrices is an orthonormal change-of-basis matrix between the Cartesian tensor product space and the spherical direct sum spaces. Furthermore, through this path-matrix technique we identify a complete orthogonal basis for the equivariant space, rather than a spanning set (Pearce-Crump, 2023b). Our method avoids the RREF algorithm and maintains a fully analytical derivation of each ICT decomposition matrix, thereby significantly speeding up the computation of orthogonal ICT decomposition matrices and orthogonal equivariant bases of arbitrary rank. We further extend our result to arbitrary tensor product and direct sum spaces, enabling free design between different spaces while preserving symmetry.
Year: 2025
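For rank n = 2, the ICT decomposition is classical and has a closed form. A 3×3 Cartesian tensor splits into three parts, so that 9 = 1 + 3 + 5:

- an isotropic part (l = 0, 1 component),
- an antisymmetric part (l = 1, 3 components),
- a symmetric traceless part (l = 2, 5 components).

The sketch below computes that split directly. It only illustrates what an ICT decomposition produces. It does not use the paper's path-matrix construction, which is what makes ranks up to n = 9 tractable.

```python
def ict_decompose_rank2(T):
    """Split a 3x3 Cartesian tensor into irreducible parts of weight l = 0, 1, 2."""
    tr = T[0][0] + T[1][1] + T[2][2]
    # l = 0: isotropic part, 1 independent component.
    iso = [[tr / 3 if i == j else 0.0 for j in range(3)] for i in range(3)]
    # l = 1: antisymmetric part, 3 independent components (equivalent to a vector).
    anti = [[(T[i][j] - T[j][i]) / 2 for j in range(3)] for i in range(3)]
    # l = 2: symmetric traceless part, 5 independent components.
    sym0 = [[(T[i][j] + T[j][i]) / 2 - iso[i][j] for j in range(3)] for i in range(3)]
    return iso, anti, sym0


T = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
iso, anti, sym0 = ict_decompose_rank2(T)
print(sum(sym0[i][i] for i in range(3)))  # close to 0: the l = 2 part is traceless
```

Under any rotation R, each part stays within its own subspace. The isotropic part is invariant, and the antisymmetric and symmetric traceless parts keep their symmetry type. This is what makes the split irreducible.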
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Xingyu Xie, Zhouchen Lin
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
Abstract
In deep learning, different kinds of deep networks typically need different optimizers, which have to be chosen after multiple trials, making the training process inefficient. To relieve this issue and consistently improve model training speed across deep networks, we propose the ADAptive Nesterov momentum algorithm, Adan for short. Adan first reformulates vanilla Nesterov acceleration to develop a new Nesterov momentum estimation (NME) method, which avoids the extra overhead of computing the gradient at the extrapolation point. Adan then adopts NME to estimate the gradient's first- and second-order moments in adaptive gradient algorithms for convergence acceleration. Besides, we prove that Adan finds an ε-approximate first-order stationary point within O(ε^{-3.5}) stochastic gradient complexity on non-convex stochastic problems (e.g., deep learning problems), matching the best-known lower bound. Extensive experimental results show that Adan consistently surpasses the corresponding SoTA optimizers on vision, language, and RL tasks and sets new SoTAs for many popular networks and frameworks, e.g., ResNet, ConvNext, ViT, Swin, MAE, DETR, GPT-2, Transformer-XL, and BERT. More surprisingly, Adan can use half the training cost (epochs) of SoTA optimizers to achieve higher or comparable performance on ViT, GPT-2, MAE, etc., and also shows great tolerance to a large range of minibatch sizes, e.g., from 1k to 32k.
Year: 2024
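A scalar sketch of the Adan update shows how NME feeds the gradient difference g_k - g_{k-1} into both moment estimates. The recursions below follow our reading of the published Adan algorithm, with betas written as the weight on the newest term (0.02, 0.08, 0.01). Treat them as an assumption and check the paper before relying on them. Three further choices are our own simplifications: the zero gradient difference at the first step, the absence of bias correction, and the toy objective.

```python
import math


def adan_minimize(grad_fn, theta, lr=0.05, beta1=0.02, beta2=0.08, beta3=0.01,
                  weight_decay=0.0, eps=1e-8, steps=3000):
    """Run a scalar Adan-style loop; betas are the weights on the newest term."""
    m = v = n = 0.0
    g_prev = None
    for _ in range(steps):
        g = grad_fn(theta)
        diff = 0.0 if g_prev is None else g - g_prev  # assumption: no difference at step 0
        m = (1 - beta1) * m + beta1 * g                # moving average of gradients
        v = (1 - beta2) * v + beta2 * diff             # moving average of gradient differences
        nesterov_g = g + (1 - beta2) * diff            # NME-style corrected gradient
        n = (1 - beta3) * n + beta3 * nesterov_g ** 2  # second-moment estimate
        step = lr / (math.sqrt(n) + eps)
        theta = (theta - step * (m + (1 - beta2) * v)) / (1 + weight_decay * lr)
        g_prev = g
    return theta


# Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
print(adan_minimize(lambda t: 2.0 * (t - 3.0), theta=0.0))
```

Starting from theta = 0, the loop should settle close to the minimizer 3. Because the gradient is evaluated only at the current point, no extra gradient is computed at an extrapolated point, which is the overhead the abstract says NME avoids.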
Designing Universally-Approximating Deep Neural Networks: A First-Order Optimization Approach
Zhoutong Wu, Zhouchen Lin
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
Abstract
Universal approximation capability, also referred to as universality, is an important property of deep neural networks, endowing them with the ability to accurately represent the underlying target function in learning tasks. In practice, the architecture of deep neural networks largely influences the performance of the models. However, most existing methodologies for designing neural architectures, such as heuristic manual design or neural architecture search, ignore the universal approximation property, thus losing a potential safeguard on performance. In this paper, we propose a unified framework for designing the architectures of deep neural networks with a universality guarantee based on first-order optimization algorithms, where the forward pass is interpreted as the updates of an optimization algorithm. The (explicit or implicit) network is designed by replacing each gradient term in the algorithm with a learnable module similar to a two-layer network or its derivatives. Specifically, we explore the realm of width-bounded neural networks, a common practical scenario, and show their universality. Moreover, adding normalization, downsampling, and upsampling operations does not hurt universality. To the best of our knowledge, this is the first work showing that width-bounded networks with a universal approximation guarantee can be designed in a principled way. Our framework can inspire a variety of neural architectures, including renowned structures such as ResNet and DenseNet as well as novel ones. Experimental results on image classification problems demonstrate that the newly inspired networks are competitive and surpass the baselines of ResNet and DenseNet, as well as the advanced ConvNeXt and ViT, testifying to the effectiveness of our framework.
Year: 2024
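The key construction can be seen in a single step: read the forward pass as optimizer updates, and replace each gradient term with a learnable two-layer module. A gradient-descent step x - η∇f(x) then becomes x - W2 relu(W1 x + b1), which has exactly the form of a residual block. The toy check below is our own example, not from the paper. It uses f(x) = ||x||^2 / 2, so ∇f(x) = x, and the weights are hand-picked so the module reproduces 0.5·x exactly via relu(x) - relu(-x) = x.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def matvec(W, x):
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def gd_step(x, grad, lr):
    """One gradient-descent update: x - lr * grad(x)."""
    return [a - lr * g for a, g in zip(x, grad(x))]


def learned_step(x, W1, b1, W2):
    """Gradient term replaced by a two-layer module: x - W2 relu(W1 x + b1)."""
    hidden = relu([h + b for h, b in zip(matvec(W1, x), b1)])
    return [a - u for a, u in zip(x, matvec(W2, hidden))]


# Hand-picked weights: W2 relu(W1 x) = 0.5 * (relu(x) - relu(-x)) = 0.5 * x.
W1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b1 = [0.0, 0.0, 0.0, 0.0]
W2 = [[0.5, 0.0, -0.5, 0.0], [0.0, 0.5, 0.0, -0.5]]
x = [2.0, -4.0]
print(gd_step(x, lambda v: v, lr=0.5))  # [1.0, -2.0]
print(learned_step(x, W1, b1, W2))      # [1.0, -2.0]
```

Stacking K such steps, each with its own weights, gives a depth-K residual network whose forward pass is an unrolled optimizer.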
Towards Understanding Convergence and Generalization of AdamW
Pan Zhou, Shuicheng Yan
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
Abstract
AdamW modifies Adam by adding a decoupled weight decay that decays the network weights at each training iteration. For adaptive algorithms, this decoupled weight decay does not affect the specific optimization steps, unlike the widely used L2 regularizer, which changes the optimization steps by changing the first- and second-order gradient moments. Despite AdamW's great practical success, its convergence behavior and its generalization improvement over Adam and L2-regularized Adam (L2-Adam) have not yet been explained. To address this issue, we prove the convergence of AdamW and justify its generalization advantages over Adam and L2-Adam. Specifically, AdamW provably converges but minimizes a dynamically regularized loss that combines the vanilla loss and a dynamical regularization induced by the decoupled weight decay, thus behaving differently from Adam and L2-Adam. Moreover, on both general nonconvex problems and PL-conditioned problems, we establish the stochastic gradient complexity of AdamW for finding a stationary point. This complexity also applies to Adam and L2-Adam, and improves their previously known complexity, especially for over-parameterized networks. Besides, we prove from the Bayesian posterior perspective that AdamW enjoys smaller generalization errors than Adam and L2-Adam. This result, for the first time, explicitly reveals the benefits of decoupled weight decay in AdamW. Experimental results validate our theory.
Year: 2024
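The difference between L2-Adam and AdamW is easiest to see in code. In the scalar sketch below, L2-Adam adds `wd * theta` to the gradient, so the decay passes through the moment normalization. AdamW instead subtracts `lr * wd * theta` from the weights directly. Scaling the decoupled decay by the learning rate follows the common PyTorch-style parameterization; the original AdamW formulation uses a separate schedule multiplier. The function names are ours.

```python
import math


def adam_step(theta, g, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One bias-corrected Adam step on a scalar; `state` is (m, v, t)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps), (m, v, t)


def l2_adam_step(theta, g, state, lr, wd):
    """L2-Adam: the decay term joins the gradient, so it is rescaled by the moments."""
    return adam_step(theta, g + wd * theta, state, lr)


def adamw_step(theta, g, state, lr, wd):
    """AdamW: the decay shrinks the weight directly, outside the moment estimates."""
    new_theta, state = adam_step(theta, g, state, lr)
    return new_theta - lr * wd * theta, state


start = (0.0, 0.0, 0)
l2_theta, _ = l2_adam_step(1.0, 0.5, start, lr=0.1, wd=0.1)
w_theta, _ = adamw_step(1.0, 0.5, start, lr=0.1, wd=0.1)
print(round(l2_theta, 4), round(w_theta, 4))  # 0.9 0.89
```

Consider the first step from theta = 1.0 with g = 0.5. L2-Adam moves by almost exactly `lr`, because the normalized step ignores the size of the decay term. AdamW additionally shrinks the weight by `lr * wd * theta` regardless of the gradient statistics.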
Optimization Induced Equilibrium Networks: An Explicit Optimization Perspective for Understanding Equilibrium Models
Xingyu Xie, Zhouchen Lin
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
Abstract
To reveal the mystery behind deep neural networks (DNNs), optimization may offer a good perspective. There are already some clues showing a strong connection between DNNs and optimization problems; e.g., under a mild condition, a DNN's activation function is indeed a proximal operator. In this paper, we provide a unified optimization-induced interpretability for a special class of networks, equilibrium models, i.e., neural networks defined by fixed-point equations, which have recently become increasingly attractive. To this end, we first decompose DNNs into a new class of unit layer that is the proximal operator of an implicit convex function while keeping its output unchanged. Then we derive the equilibrium model of the unit layer, which we name Optimization Induced Equilibrium Networks (OptEq). The equilibrium point of OptEq can be theoretically connected to the solution of a convex optimization problem with explicit objectives. Based on this, we can flexibly introduce prior properties into the equilibrium points by 1) explicitly modifying the underlying convex problems so as to change the architectures of OptEq; and 2) merging the information into the fixed-point iteration, which guarantees choosing the desired equilibrium point when the fixed-point set is not a singleton. We show that OptEq outperforms previous implicit models even with fewer parameters.
Year: 2023
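The abstract starts from two facts, and a small generic sketch can combine them. First, an activation such as ReLU is a proximal operator: it is the proximal operator of the indicator function of the nonnegative orthant, i.e. the projection max(x, 0). Second, equilibrium models are defined by fixed-point equations, so an equilibrium layer z = relu(Wz + u) can be solved by fixed-point iteration. This is a plain equilibrium-layer solver, not the OptEq construction. It relies on a contraction condition that we impose (maximum absolute row sum of `W` below 1), and `u` stands in for the input injection Ux + b.

```python
def relu(v):
    # prox of the indicator of {z >= 0}: projection onto the nonnegative orthant
    return [max(0.0, a) for a in v]


def equilibrium(W, u, tol=1e-10, max_iter=1000):
    """Solve z = relu(W z + u) by fixed-point iteration.

    relu is nonexpansive, so the map is a contraction in the max-norm whenever
    max_i sum_j |W_ij| < 1, and the iteration then converges to the unique fixed point.
    """
    if max(sum(abs(w) for w in row) for row in W) >= 1.0:
        raise ValueError("this sketch requires max_i sum_j |W_ij| < 1")
    z = [0.0] * len(u)
    for _ in range(max_iter):
        z_new = relu([sum(w * zj for w, zj in zip(row, z)) + ui for row, ui in zip(W, u)])
        if max(abs(a - b) for a, b in zip(z_new, z)) < tol:
            return z_new
        z = z_new
    raise RuntimeError("fixed-point iteration did not converge")


W = [[0.2, -0.3], [0.1, 0.4]]
u = [1.0, -0.5]
z = equilibrium(W, u)
print([round(a, 6) for a in z])  # [1.25, 0.0]
```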
Deep Prior Mamba Network for Hyperspectral Anomaly Detection
Linwei Li and Bin Wang*
IEEE Transactions on Geoscience and Remote Sensing
Year: 2025
DF2RQ: Dynamic Feature Fusion via Region-wise Queries for Semantic Segmentation of Multimodal Remote Sensing Data
Shiyang Feng, Zhaowei Li, Bo Zhang, and Bin Wang*
IEEE Transactions on Geoscience and Remote Sensing
Year: 2025
DSF2-NAS: Dual-Stage Feature Fusion via Network Architecture Search for Classification of Multimodal Remote Sensing Images
Shiyang Feng, Zhaowei Li, Bo Zhang, Tao Chen, and Bin Wang*
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Year: 2025
ASS-CD: Adapting Segment Anything Model and Swin-Transformer for Change Detection in Remote Sensing Images
Chenlong Wei, Xiaofeng Wu, and Bin Wang*
Remote Sensing
Year: 2025
Deep Learning-Based Blind Image Super-Resolution Guided by Sparsity and Self-Similarity Priors (稀疏性和自相似性先验引导的深度学习图像盲超分)
葛孙逸,罗小伟,冯世阳,王斌*
Journal of Infrared and Millimeter Waves (红外与毫米波学报)
Year: 2025
Multimodal Remote Sensing Image Fusion Based on Self-Supervised Pre-training and Cross-Scale Contrastive Learning (基于自监督预训练与跨尺度对比学习的多模态遥感图像融合)
李朝伟,冯世阳,王斌*
Journal of Infrared and Millimeter Waves (红外与毫米波学报)
Year: 2025
Lightweight Remote Sensing Multimodal Large Language Model Based on Knowledge Distillation (基于知识蒸馏的轻量化遥感多模态大语言模型)
张馨月,冯世阳,王斌*
Journal of Infrared and Millimeter Waves (红外与毫米波学报)
Year: 2025
EMLM-Net: An Extended Multilinear Mixing Model-Inspired Dual-Stream Network for Unsupervised Nonlinear Hyperspectral Unmixing
Minglei Li, Bin Yang, and Bin Wang*
IEEE Transactions on Geoscience and Remote Sensing
Year: 2024
Transformer-Based Autoencoder Framework for Nonlinear Hyperspectral Anomaly Detection
Ziyu Wu and Bin Wang*
IEEE Transactions on Geoscience and Remote Sensing
Year: 2024
Joint Distribution Adaptive-Alignment for Cross-Domain Segmentation of High-Resolution Remote Sensing Images
Haitao Huang, Baopu Li, Yuchen Zhang, Tao Chen, and Bin Wang*
IEEE Transactions on Geoscience and Remote Sensing
Year: 2024
SD-SQ: Point Set Decoding Based on Semantic Query for Object Detection in Remote Sensing Images
Shiyang Feng and Bin Wang*
IEEE Transactions on Geoscience and Remote Sensing
Year: 2024
Progressive Feature Fusion Framework Based on Graph Convolutional Network for Remote Sensing Scene Classification
Chongyang Zhang and Bin Wang*
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Year: 2024
Hyperspectral Anomaly Detection via Merging Total Variation into Low-Rank Representation
Linwei Li, Ziyu Wu, and Bin Wang*
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Year: 2024
Memory-adaptive Supervised LSTM Networks for Deep Soft Sensor Development of Industrial Processes
Feifan Shen, Jiayang Wu, Lingjian Ye, and Bin Wang
IEEE Sensors Journal
Year: 2024
Exploring Multi-Timestep Multi-Stage Diffusion Features for Hyperspectral Image Classification
Jingyi Zhou, Jiamu Sheng, Peng Ye, Jiayuan Fan, Tong He, Bin Wang, and Tao Chen
IEEE Transactions on Geoscience and Remote Sensing
Year: 2024
Lightweight Remote Sensing Image Scene Classification Based on Knowledge Distillation (基于知识蒸馏的轻量化遥感图像场景分类)
张重阳,王斌*
Journal of Infrared and Millimeter Waves (红外与毫米波学报)
Year: 2024
Subset Random Sampling and Reconstruction of Finite Time-Vertex Graph Signals
冯辉
IEEE Transactions on Signal and Information Processing over Networks
Abstract
Finite time-vertex graph signals (FTVGS) provide an efficient representation for capturing spatio-temporal correlations across multiple data sources on irregular structures. Although sampling and reconstruction of FTVGS with known spectral support have been extensively studied, the case of unknown spectral support requires further investigation. Existing random sampling methods may extract samples from any vertex at any time, but such strategies are impractical, since sampling is typically limited to a subset of vertices and time instants. To address this requirement, we propose a subset random sampling scheme for FTVGS. Specifically, we first randomly select a subset of rows and columns to form a submatrix, and then sample randomly within that submatrix. Theoretically, we provide sufficient conditions for reconstructing the original FTVGS with high probability. Additionally, we introduce a reconstruction framework incorporating low-rank, sparsity, and smoothness priors (LSSP), and verify through experiments the feasibility of the reconstruction and the effectiveness of the framework.
Year: 2025
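The two-stage scheme described in the abstract can be sketched on index pairs of an N × T signal matrix: pick a random submatrix of vertices and time instants, then sample randomly inside it. Uniform selection without replacement at both stages and the parameter names are our assumptions. The paper's sufficient conditions for reconstruction and its LSSP reconstruction step are not covered.

```python
import random


def subset_random_sample(n_vertices, n_times, n_rows, n_cols, n_samples, seed=None):
    """Two-stage random sampling of an n_vertices x n_times time-vertex signal.

    Stage 1: choose n_rows vertices and n_cols time instants uniformly at random.
    Stage 2: choose n_samples distinct entries uniformly from that submatrix.
    Returns sorted (vertex, time) index pairs.
    """
    if n_samples > n_rows * n_cols:
        raise ValueError("cannot draw more samples than the submatrix holds")
    rng = random.Random(seed)
    rows = rng.sample(range(n_vertices), n_rows)
    cols = rng.sample(range(n_times), n_cols)
    cells = [(r, c) for r in rows for c in cols]
    return sorted(rng.sample(cells, n_samples))


samples = subset_random_sample(n_vertices=20, n_times=50, n_rows=5, n_cols=10,
                               n_samples=30, seed=1)
print(len(samples), samples[:3])
```

Every sample touches only the chosen vertices and time instants. That is the practical constraint that motivates the scheme, as opposed to sampling any vertex at any time.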
Sampling theory of jointly bandlimited time-vertex graph signals
冯辉
Abstract
Time-vertex graph signal (TVGS) models describe time-varying data with irregular structures. Bandlimitedness in the joint time-vertex Fourier spectral domain reflects smoothness in both the temporal domain and the graph topology. In this paper, we study the critical sampling of three types of TVGS, in which the signal at each vertex of the graph is a continuous-time signal, an infinite-length sequence, or a finite-length sequence in the time domain. For a jointly bandlimited TVGS, we prove a lower bound on the sampling density or sampling ratio, which depends on the measure of the spectral support in the joint time-vertex Fourier spectral domain. We also provide a lower bound on the sampling density or sampling ratio of each vertex on sampling sets for perfect recovery. To demonstrate that critical sampling is achievable, we propose sampling and reconstruction procedures for the different types of TVGS. Finally, we show how the proposed sampling schemes can be applied to numerical as well as real datasets.
Signal Processing
Year: 2024
Online Signed Sampling of Bandlimited Graph Signals
冯辉
IEEE Transactions on Signal and Information Processing over Networks
Abstract
The theory of sampling and recovery of bandlimited graph signals has been extensively studied. However, in many cases, the observation of a signal is quite coarse. For example, users only provide simple comments such as “like” or “dislike” for a product on an e-commerce platform. This is a particular scenario in which only the sign information of a graph signal can be measured. In this paper, we study how to sample based on sign information in an online manner so that the direction of the original graph signal can be estimated. The online signed sampling problem of a graph signal can be formulated as a finite-horizon Markov decision process. Unfortunately, it is intractable for large graphs. We propose a low-complexity greedy signed sampling algorithm (GSS) as well as a stopping criterion. Meanwhile, we prove that the objective function is adaptive monotone and adaptive submodular, so that the performance of GSS has a guaranteed lower bound relative to the global optimum. Finally, we demonstrate the effectiveness of the GSS algorithm on both synthetic and real-world data.
Year: 2024
Angular Parameter Estimation for Incoherently Distributed Sources with Single RF Chain
胡波
IEEE Transactions on Signal Processing
Abstract
In this paper, we consider the problem of estimating the angular parameters, i.e., the nominal angles of arrival (AoAs) and angular spreads, of incoherently distributed sources using a phased array equipped with a single RF chain. We first derive an approximate Fourier series of the received power, whose coefficients can be expressed in closed form in terms of the angular parameters. In the single-source case, this finding directly suggests a low-complexity algorithm that performs spatial sampling and a discrete Fourier transform to estimate the Fourier series coefficients, from which the nominal AoA and angular spread can be obtained successively. In the multiple-source case, we focus on one source at a time and handle the sources one by one. Based on the Fourier series expression, a power-fitting approach is proposed to formulate a nonlinear least-squares problem. Then, a semi-exhaustive search algorithm is developed to find the solution, which gives the angular parameters of the target source. Additionally, an approximate Cramér-Rao bound is derived as a benchmark. Numerical results demonstrate that in certain cases, the proposed methods can even outperform the existing method that uses a fully digital array.
2024
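For the single-source case, the abstract above describes spatial sampling followed by a DFT to estimate the Fourier-series coefficients of the received power. The sketch below shows only that DFT step. It assumes the power is a 2π-periodic function of some scanning variable φ, sampled uniformly over one period.

Two things are not given in the abstract and are not reproduced here:

- how φ maps to the array configuration;
- the closed-form expressions that turn the coefficients into the nominal AoA and angular spread.

The function name is hypothetical.

```python
import cmath
import math

def fourier_coefficients(samples, max_order):
    """Estimate complex Fourier-series coefficients c_m, m = 0..max_order, of a
    2π-periodic function from N uniform samples over one period via the DFT."""
    n = len(samples)
    coeffs = []
    for m in range(max_order + 1):
        # c_m ≈ (1/N) Σ_k p(2πk/N) e^{-j m 2πk/N}; exact when the function only
        # contains harmonics of order below N/2 (no aliasing).
        c = sum(samples[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)) / n
        coeffs.append(c)
    return coeffs
```

For example, consider 16 uniform samples of p(φ) = 3 + 2 cos φ + 0.5 sin 2φ. The function returns c0 = 3, c1 = 1 and c2 = -0.25j, up to rounding.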
Knowledge Distillation-Based Edge-Decision Hierarchies for Interactive Behavior-Aware Planning in Autonomous Driving System
胡波
IEEE Transactions on Intelligent Transportation Systems
Abstract
Interactive behavior-aware planning benefits from hierarchical learning when adapting to dense traffic. A key difficulty in Intelligent Transportation Systems (ITS), however, is that an autonomous vehicle cannot respond in real time because it can hardly perceive dynamic objects beyond its visual range. This problem can be tackled through vehicle-road-cloud cooperation, which synchronously collects global perception information and makes strategic policies for deployment. Here we propose a hierarchical edge-decision framework that learns real-time motion skills distilled from analogical reasoning over spatial-temporal events. The first step establishes a goal-conditioned motion library from centralized edge-cloud perception, which is used to compose a belief-based best response with collision avoidance. In addition, a novel latent-space perspective is presented to promote motion rehearsal in the cloud; it can generate credible prior trajectories through a policy distillation procedure that extracts informative actions from a thorough exploration of changing events. Moreover, a two-stage hierarchical decision process is developed to make advanced policy modification more efficient by evaluating hierarchical judgment matrices under conditional criteria, thereby producing an optimal autonomous-driving motion within the vehicle-road-cloud collaborative system. Extensive validation on challenging autonomous driving scenarios demonstrates the superior performance of our edge-decision method, which significantly improves adaptation to complex, time-varying ITS environments in a smooth and sustainable manner.
2024
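The abstract above mentions evaluating hierarchical judgment matrices. Suppose these are Saaty-style AHP pairwise-comparison matrices; this is our assumption, since the paper's criteria and aggregation are not given here. Under that assumption, the sketch below derives criterion weights as the principal eigenvector of one matrix (via power iteration) and checks its consistency ratio. The function names and any example criteria are hypothetical.

```python
def ahp_priorities(matrix, iterations=100):
    """Priority weights of a positive reciprocal pairwise-comparison matrix, taken as its
    principal eigenvector (power iteration), plus the estimated principal eigenvalue."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]  # keep weights summing to 1
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max

# Saaty's random consistency index for matrix orders 1..6.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}

def consistency_ratio(lambda_max, n):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); CR < 0.1 is the usual acceptance rule."""
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]
```

As a usage example, comparing three hypothetical criteria (safety, efficiency, comfort) with a perfectly consistent matrix built from weights 0.5, 0.3 and 0.2 recovers those weights, with λ_max = 3 and a consistency ratio of 0.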
The Data Value based Asynchronous Federated Learning for UAV Swarm under Unstable Communication Scenarios
杨涛
IEEE Transactions on Mobile Computing
Abstract
Federated learning provides a new approach to coordinating a group of clients to train a machine learning model collaboratively, and it can be readily embedded into Unmanned Aerial Vehicle (UAV) swarms. Compared with terrestrial wireless networks, a UAV swarm faces more precarious communication conditions, which makes synchronous aggregation untenable. In addition, the data collected by UAVs tend to be heterogeneous because of differing deployment regions or requirements. To overcome these limitations, this article proposes a novel two-stage asynchronous federated learning scheme for UAV swarms. First, the convergence properties of both convex and non-convex models trained with the proposed scheme are analyzed. In the pre-training stage, we model the learning process as a cooperative game and demonstrate its monotonicity and submodularity. The Shapley value is then introduced to quantify the data value of each UAV, and an upper bound on its estimation error rate is derived. In the training stage, a new metric named Network Age of Updates (AoU) is proposed to address the fairness issue; it quantifies the model’s generalization capability while accounting for data value. Sequential UAV selection scheduling is then performed by minimizing the AoU with the Whittle index method. Finally, the system performance is validated through both theoretical analysis and simulations.
2024
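The abstract above uses the Shapley value to quantify each UAV's data value but does not state the estimator. A common choice is the permutation-sampling Monte Carlo estimator, sketched below. It averages each player's marginal contribution over random orderings.

The coalition value function (for example, validation accuracy of a model trained on the coalition's data) is a hypothetical placeholder passed in by the caller, and the sketch is not presented as the paper's method.

```python
import random

def shapley_monte_carlo(players, value, num_permutations=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions over random permutations.
    `value` maps a frozenset of players to a real number."""
    rng = random.Random(seed)
    estimates = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = frozenset()
        prev = value(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = value(coalition)
            estimates[p] += cur - prev  # marginal contribution of p in this ordering
            prev = cur
    return {p: total / num_permutations for p, total in estimates.items()}
```

Within every sampled ordering the marginal contributions telescope, so the estimates always sum to the value of the grand coalition (efficiency). For an additive game, every ordering gives each player exactly its own contribution.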
An Air-Ground Coordinated Sensing, Relay and Offloading for Emergency Disposal in ITS System
杨涛
IEEE Transactions on Intelligent Transportation Systems
Abstract
Unmanned Aerial Vehicles (UAVs) have emerged as a powerful platform for diverse applications. In traffic scenarios, particularly the much-anticipated Intelligent Transportation System (ITS), UAVs can coordinate with ground Internet of Things infrastructure such as Road Side Units (RSUs) to perform 3D data collection and processing, improving traffic safety and transportation efficiency. Decision-making in ITS is crucial and relies heavily on the proper and timely processing and transmission of the massive amounts of data generated by ubiquitous sensors, especially in emergency disposal scenarios. In this paper, the latency incurred by RSU-aided data offloading and UAV-aided data relaying plays a decisive role. It is characterized both by its average and by a risk metric, the Conditional Value at Risk (CVaR), which quantifies the risk of the latency exceeding a given threshold with a certain probability. Specifically, the peak Age of Information (AoI) and its distribution underlie the CVaR analysis of relay latency. In addition, a matching algorithm is proposed to coordinate transmissions between UAVs and RSUs, achieving low computational complexity and minimum risk for both sides. The system performance is validated through analysis, simulations, and field experiments.
2023
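The abstract above characterizes latency risk with CVaR. As a hedged sketch that assumes only a sample of i.i.d. latency measurements (not the paper's peak-AoI model), the functions below compute the empirical VaR and CVaR at level α using the Rockafellar-Uryasev formula. They also compute the empirical probability of exceeding a threshold. The function names are hypothetical.

```python
import math

def empirical_var_cvar(samples, alpha):
    """Empirical VaR and CVaR of a latency sample at confidence level alpha (e.g. 0.9).

    Uses the Rockafellar-Uryasev form CVaR = t + E[(X - t)^+] / (1 - alpha) evaluated at
    t = VaR (the empirical alpha-quantile). When alpha * n is an integer this equals the
    mean of the worst (1 - alpha) * n latencies."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    xs = sorted(samples)
    n = len(xs)
    k = min(n, max(1, math.ceil(alpha * n)))
    var = xs[k - 1]
    excess = sum(max(x - var, 0.0) for x in xs) / n  # E[(X - VaR)^+]
    return var, var + excess / (1 - alpha)

def violation_probability(samples, threshold):
    """Fraction of samples whose latency exceeds the threshold."""
    return sum(1 for x in samples if x > threshold) / len(samples)
```

For latencies 1, 2, ..., 100 and α = 0.9, the VaR is 90 and the CVaR is 95.5, the mean of the ten worst values. Unlike the violation probability, CVaR also reflects how large the violations are.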
An Adaptive Matching Bridged Resource Allocation Over Correlated Energy Efficiency and AoI in CR-IoT System
杨涛
IEEE Transactions on Green Communications and Networking
Abstract
We consider a Cognitive Radio based Internet of Things (CR-IoT) system consisting of multiple Primary IoT Devices (PDs) and Secondary IoT Devices (SDs). We focus on the impact of the IoT devices’ heterogeneous traffic patterns on energy efficiency and age of information (AoI) performance. Our aim is to maximize the sum energy efficiency of the SDs, subject to their individual average-AoI constraints, by optimally deciding the transmit power and channel allocation. We first derive closed-form expressions for the energy efficiency and the average AoI, and then characterize their convexity and monotonicity with respect to the transmit power. Building on these characterizations, an optimal transmit power optimization algorithm (TPOA) is proposed for the SDs, and the channel allocation problem is reformulated as a matching game based on the utilities computed by TPOA. To improve system performance, we introduce a virtual price charged on each channel. Exploiting the correlation between two adjacent matchings, we propose a semi-distributed, low-complexity adaptive matching algorithm that constructs an ε-stable matching; its effectiveness is verified through Monte Carlo simulations.
2022
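The abstract above does not describe TPOA in detail. As a swapped-in standard technique for a single device's power subproblem, the sketch below maximizes energy efficiency with Dinkelbach's fractional-programming method. It rests on three hypothetical assumptions:

- a rate of log2(1 + g·p);
- a constant circuit power;
- an average-AoI constraint that reduces to a power interval [p_min, p_max], which is plausible since the abstract states AoI is monotone in transmit power.

```python
import math

def energy_efficiency(p, gain, circuit_power):
    """Spectral efficiency per unit power: log2(1 + gain * p) / (p + circuit_power)."""
    return math.log2(1 + gain * p) / (p + circuit_power)

def dinkelbach_ee(gain, circuit_power, p_min, p_max, tol=1e-10, max_iter=100):
    """Maximize energy efficiency over p in [p_min, p_max] with Dinkelbach's method.
    Returns (optimal power, optimal energy efficiency)."""
    lam, p = 0.0, p_max
    for _ in range(max_iter):
        # Inner problem: maximize log2(1 + g p) - lam * (p + Pc), which is concave in p.
        # Its stationary point solves g / ((1 + g p) ln 2) = lam; clip it to the interval.
        if lam <= 0:
            p = p_max
        else:
            p = min(max(1.0 / (lam * math.log(2)) - 1.0 / gain, p_min), p_max)
        f = math.log2(1 + gain * p) - lam * (p + circuit_power)  # F(lam) >= 0, zero at optimum
        lam = energy_efficiency(p, gain, circuit_power)
        if abs(f) < tol:
            break
    return p, lam
```

Each iteration solves the concave inner problem in closed form and then updates λ to the achieved efficiency. This typically converges in a handful of iterations.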
Regret of Age-of-Information Bandits in Nonstationary Wireless Networks
胡波
IEEE Wireless Communications Letters
Abstract
We consider a wireless network in which a source periodically generates time-sensitive information and transmits it to a destination via one of N non-stationary orthogonal wireless channels. The scheduling policy aims to keep the information at the destination fresh, as captured by the Age of Information (AoI) metric. Because an accurate analytical characterization of AoI performance over non-stationary wireless channels is usually intractable, we resort to multi-armed bandits (MAB), treating the non-stationary channels as arms and the AoI as rewards. We consider three special classes of non-stationary channels and, for each, derive a lower bound on the AoI regret achievable by any policy. In addition, upper bounds are presented for the Exp3.S, Active Arm Elimination (AAE), and Cumulative Sum Upper Confidence Bound (CUSUM-UCB) policies in the three corresponding settings. Furthermore, variants of AAE and CUSUM-UCB are proposed and shown via simulations to be more effective than the original policies.
2022
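The abstract above names Exp3.S as one of the analyzed policies. The sketch below is a minimal Python version of Exp3.S as defined by Auer et al. (2002): exponential weights, uniform exploration γ, and weight sharing α, which lets it track a best arm that switches over time.

The AoI environment is a hypothetical simplification of our own. The reward is 1 when the chosen channel delivers the update (the age resets), rather than the paper's AoI-based reward. The helper `run_aoi_bandit` is illustrative only.

```python
import math
import random

class Exp3S:
    """Exp3.S: exponential weights with uniform exploration (gamma) and weight sharing
    (alpha) so the learner can follow a changing best arm. Rewards must lie in [0, 1]."""

    def __init__(self, num_arms, gamma, alpha, seed=0):
        self.k = num_arms
        self.gamma = gamma
        self.alpha = alpha
        self.weights = [1.0] * num_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.k for w in self.weights]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.k), weights=probs)[0]
        return arm, probs

    def update(self, arm, reward, probs):
        total = sum(self.weights)
        share = math.e * self.alpha / self.k * total  # weight sharing across arms
        new_weights = []
        for j, w in enumerate(self.weights):
            x_hat = reward / probs[j] if j == arm else 0.0  # importance-weighted reward estimate
            new_weights.append(w * math.exp(self.gamma * x_hat / self.k) + share)
        norm = sum(new_weights)
        # The update is scale-invariant, so rescaling only prevents overflow.
        self.weights = [w / norm for w in new_weights]

def run_aoi_bandit(channel_probs, horizon, seed=0):
    """Hypothetical AoI setup: channel_probs(t) gives each channel's delivery probability in
    slot t. Reward is 1 when the update is delivered (age resets to 1), else 0 (age grows)."""
    env = random.Random(seed + 1)
    agent = Exp3S(len(channel_probs(0)), gamma=0.1, alpha=1.0 / horizon, seed=seed)
    age, total_age = 1, 0
    for t in range(horizon):
        arm, probs = agent.select()
        delivered = env.random() < channel_probs(t)[arm]
        age = 1 if delivered else age + 1
        total_age += age
        agent.update(arm, 1.0 if delivered else 0.0, probs)
    return agent, total_age / horizon
```

With a channel pair whose delivery probabilities swap halfway through the horizon, the weight-sharing term keeps the losing arm's weight from vanishing. This is what allows the agent to shift back once the channels change.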
Average Rate Approximation and Maximization for RIS-Assisted Multi-User MISO System
杨涛
IEEE Wireless Communications Letters
Abstract
In this letter, we investigate a reconfigurable intelligent surface (RIS) assisted downlink multi-user multiple-input single-output (MISO) system. Under appropriate assumptions, we derive approximations of the average rates of user equipments (UEs) and reveal how the average rates depend on system parameters. Furthermore, a low-complexity RIS configuration algorithm is proposed that improves the average sum rate by exploiting the average-rate approximation and statistical channel state information (CSI). Numerical results validate the rationality of the assumptions, the tightness of the approximations, and the effectiveness of the proposed algorithm.
2022
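The abstract above combines an average-rate approximation with statistical CSI. The sketch below is a deliberately simplified single-user, single-antenna illustration, not the paper's multi-user MISO derivation. It rests on hypothetical assumptions:

- channels are known line-of-sight (LoS) means plus circular Gaussian scattering;
- RIS phases are aligned using the means only, i.e., statistical CSI;
- the average rate is approximated by log2(1 + SNR·E|h|²), which by Jensen's inequality upper-bounds the true average.

The sketch then compares the approximation with a Monte Carlo estimate. All names are hypothetical.

```python
import cmath
import math
import random

def make_channel_stats(num_elements, seed=0):
    """Hypothetical statistics: known LoS means (statistical CSI) and scattering variances."""
    rng = random.Random(seed)
    direct_mean = cmath.rect(1.0, rng.uniform(0, 2 * math.pi))
    cascaded_means = [cmath.rect(0.5, rng.uniform(0, 2 * math.pi)) for _ in range(num_elements)]
    direct_var, element_var = 0.1, 0.1
    return direct_mean, cascaded_means, direct_var, element_var

def align_phases(direct_mean, cascaded_means):
    """RIS phase shifts chosen from the LoS means only, so every mean cascaded term adds
    coherently with the mean direct path."""
    return [cmath.phase(direct_mean) - cmath.phase(c) for c in cascaded_means]

def approx_average_rate(stats, phases, snr):
    """Jensen-type approximation (an upper bound): log2(1 + snr * E|h|^2)."""
    direct_mean, cascaded_means, direct_var, element_var = stats
    mean = direct_mean + sum(cmath.rect(1.0, t) * c for t, c in zip(phases, cascaded_means))
    second_moment = abs(mean) ** 2 + direct_var + len(cascaded_means) * element_var
    return math.log2(1 + snr * second_moment)

def monte_carlo_average_rate(stats, phases, snr, trials=2000, seed=1):
    """Sample-average rate over random scattering realizations."""
    direct_mean, cascaded_means, direct_var, element_var = stats
    rng = random.Random(seed)

    def cn(var):  # circularly symmetric complex Gaussian with E|z|^2 = var
        s = math.sqrt(var / 2)
        return complex(rng.gauss(0, s), rng.gauss(0, s))

    total = 0.0
    for _ in range(trials):
        h = direct_mean + cn(direct_var)
        for t, c in zip(phases, cascaded_means):
            h += cmath.rect(1.0, t) * (c + cn(element_var))
        total += math.log2(1 + snr * abs(h) ** 2)
    return total / trials
```

When the LoS components dominate, the approximation is tight. Aligning phases to the means then yields a much higher average rate than an unconfigured surface.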
Two-Timescale Resource Allocation for Cooperative D2D Communication: A Matching Game Approach
杨涛
IEEE Transactions on Vehicular Technology
Abstract
In this paper, we consider a cooperative device-to-device (D2D) communication system in which D2D transmitters (DTs) act as relays to assist cellular users (CUs) in a densified network and improve their transmission quality of service (QoS). The proposed system achieves a win-win situation, i.e., it improves the spectral efficiency of CUs that cannot meet their rate requirements while providing spectrum access for D2D pairs. Unlike previous works, we design a novel two-timescale resource allocation scheme to reduce overhead: the pairing between CUs and D2D pairs is decided at a long timescale, while the transmission times of each CU and D2D pair are determined at a short timescale. Specifically, to characterize the long-term payoff of each potential CU-D2D pair, we investigate the optimal cooperation policy that decides the transmission time based on instantaneous channel state information (CSI). We prove that the optimal policy is a threshold policy, which can be obtained via binary search. Since CUs and D2D pairs are self-interested, they are paired only when both agree to cooperate. Therefore, to study their cooperation behaviors, we formulate the pairing problem as a matching game based on the long-term payoff achieved by the optimal cooperation policy of each possible pairing. Furthermore, unlike most previous matching models in D2D networks, we allow transfers between CUs and D2D pairs to improve performance. To solve the pairing problem, a distributed algorithm is proposed that converges to an ε-stable matching. We show that there is a trade-off between the optimality and the computational complexity of the algorithm, and we also analyze its robustness to unilateral deviations by D2D pairs. Finally, simulation results verify the efficiency of the proposed matching algorithm.
2021
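The abstract above states that the optimal cooperation policy is a threshold policy obtainable via binary search, but it does not give the condition that defines the threshold. The sketch below shows only the bisection mechanics on a hypothetical monotone condition. It assumes Rayleigh fading, so the channel gain g ~ Exp(1) and P(g ≥ τ) = exp(-τ), and picks τ so that cooperation occurs in a target fraction of slots. Both the condition and the names are illustrative assumptions.

```python
import math

def bisect_threshold(f, lo, hi, tol=1e-12, max_iter=200):
    """Bisection for the root of an increasing function f on [lo, hi] (f(lo) <= 0 <= f(hi))."""
    if f(lo) > 0 or f(hi) < 0:
        raise ValueError("f must change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def threshold_policy(tau):
    """Cooperate (e.g. let the D2D transmitter relay) only when the instantaneous gain reaches tau."""
    return lambda gain: gain >= tau

# Hypothetical condition: cooperate in 30% of slots under Exp(1) channel gains.
target_fraction = 0.3
tau = bisect_threshold(lambda t: target_fraction - math.exp(-t), 0.0, 50.0)
```

This toy condition has the closed-form answer τ = -ln 0.3 ≈ 1.204, which makes it easy to check the search. In the paper's setting, f would instead come from the cooperation payoff.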