Laplacian-Former: Overcoming the Limitations of Vision Transformers in Local Texture Detection (MICCAI 2023)
Vision Transformer (ViT) models have demonstrated a breakthrough in a wide range of computer vision tasks. However, compared to Convolutional Neural Network (CNN) models, ViT models have been observed to struggle to capture the high-frequency components of images, which can limit their ability to detect local textures and edge information. Because abnormalities in human tissue, such as tumors and lesions, may vary greatly in structure, texture, and shape, high-frequency information such as texture is crucial for effective semantic segmentation. To address this limitation in ViT models, we propose a new technique, Laplacian-Former, that enhances the self-attention map by adaptively re-calibrating the frequency information in a Laplacian pyramid. More specifically, our proposed method uses a dual attention mechanism made up of efficient attention and frequency attention: the efficient attention mechanism reduces the complexity of self-attention to linear while producing the same output, and the frequency attention selectively intensifies the contribution of shape and texture features. Furthermore, we introduce a novel efficient enhancement multi-scale bridge that effectively transfers spatial information from the encoder to the decoder while preserving the fundamental features.
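The claim that efficient attention is linear in the number of tokens "while producing the same output" can be illustrated with a small NumPy sketch. It assumes the commonly used formulation in which queries are softmax-normalised over channels and keys over token positions; the function names, shapes, and random inputs below are illustrative and are not taken from this repository. Under that normalisation, multiplying the keys with the values first gives the same result as building the full n x n attention map, by associativity of matrix multiplication.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def efficient_attention(q, k, v):
    """Linear-complexity ordering. q, k: (n, d_k); v: (n, d_v)."""
    q_norm = softmax(q, axis=1)  # each query normalised over its channels
    k_norm = softmax(k, axis=0)  # each key channel normalised over the n tokens
    context = k_norm.T @ v       # (d_k, d_v) global context, cost O(n * d_k * d_v)
    return q_norm @ context      # (n, d_v), no n x n matrix is ever formed


def quadratic_reference(q, k, v):
    """Same normalisation, but with the explicit (n, n) attention map."""
    q_norm = softmax(q, axis=1)
    k_norm = softmax(k, axis=0)
    attn = q_norm @ k_norm.T     # (n, n), cost O(n^2 * d_k)
    return attn, attn @ v


# Illustrative sizes (assumptions): 64 tokens, 8 key channels, 16 value channels.
rng = np.random.default_rng(0)
n, d_k, d_v = 64, 8, 16
q = rng.standard_normal((n, d_k))
k = rng.standard_normal((n, d_k))
v = rng.standard_normal((n, d_v))

out_fast = efficient_attention(q, k, v)
attn_map, out_ref = quadratic_reference(q, k, v)
print(np.allclose(out_fast, out_ref))
```

The two orderings agree up to floating-point error, but only the second one materialises a matrix whose size grows quadratically with the number of tokens. Note that this equivalence holds for the normalisation assumed here; it is not a statement about standard softmax(QK^T) attention.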
Citation
@inproceedings{azad2023laplacian,
title={Laplacian-Former: Overcoming the Limitations of Vision Transformers in Local Texture Detection},
author={Azad, Reza and Kazerouni, Amirhossein and Azad, Babak and Khodapanah Aghdam, Ehsan and Velichko, Yury and Bagci, Ulas and Merhof, Dorit},
booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
pages={736--746},
year={2023},
organization={Springer}
}
Training arguments:
--root_path [Train data path]
--test_path [Test data path]
--eval_interval [Evaluation epoch]
--dst_fast [Optional] [Load all data into RAM for faster training]
--resume [Optional] [Resume from checkpoint]
--model_path [Optional] [Provide the path to the latest checkpoint file for loading the model.]
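As a rough illustration of how the flags listed above could be declared and parsed, here is a minimal `argparse` sketch. It is not the repository's training script: the argument types, the default for `--eval_interval`, the choice to treat `--dst_fast` and `--resume` as boolean switches, and the example paths are all assumptions made for the example.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented flags; types and defaults are assumed.
    parser = argparse.ArgumentParser(description="Sketch of a training command line")
    parser.add_argument("--root_path", type=str, required=True, help="Train data path")
    parser.add_argument("--test_path", type=str, required=True, help="Test data path")
    parser.add_argument("--eval_interval", type=int, default=1,
                        help="Evaluation epoch (default is an assumption)")
    parser.add_argument("--dst_fast", action="store_true",
                        help="Load all data into RAM for faster training")
    parser.add_argument("--resume", action="store_true", help="Resume from checkpoint")
    parser.add_argument("--model_path", type=str, default=None,
                        help="Path to the latest checkpoint file")
    return parser


# Placeholder paths, used only to show how the flags combine.
example_argv = [
    "--root_path", "data/train",
    "--test_path", "data/test",
    "--eval_interval", "10",
    "--resume",
    "--model_path", "checkpoints/latest.pth",
]
args = build_parser().parse_args(example_argv)
print(args.root_path, args.eval_interval, args.resume, args.dst_fast)
```

Here the optional flags are simply omitted when not needed, so `--dst_fast` stays off in the example while `--resume` and `--model_path` are supplied together.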
For information regarding training the skin dataset, please refer to this link.
Inference arguments:
--test_path [Test data path]
--is_savenii [Whether to save results during inference]
--pretrained_path [Pretrained model path]
Experiments
To evaluate the proposed method, we consider two challenging medical image segmentation benchmarks: the Synapse dataset and the ISIC 2018 dataset. The proposed Laplacian-Former achieves superior segmentation performance.
The results in the table have been updated to match the model weights.