Delve into the transformative world of Transformer models with our curated list of top research papers. Whether you're a novice or an expert, these papers provide valuable insights and advancements in deep learning technology. Discover the latest trends and findings in Transformer research here.
It is pointed out that the attention inside these local patches is also essential for building visual transformers with high performance, and a new architecture, namely Transformer iN Transformer (TNT), is explored.
This course explores Transformational Leadership as it relates to workforce dynamics and practices and investigates the history of this theory, including the variety of approaches and salient cultural, gender, and business forces influencing its development over time.
Ze Liu, Yutong Lin, Yue Cao + 5 more
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
A hierarchical Transformer whose representation is computed with Shifted windows, which has the flexibility to model at various scales and has linear computational complexity with respect to image size and will prove beneficial for all-MLP architectures.
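To make the linear-complexity claim concrete, here is a minimal NumPy sketch of window-based self-attention with an optional cyclic shift; it illustrates the general shifted-window idea rather than the paper's implementation, and it omits projections, multiple heads, and the attention masks the real shifted-window scheme uses. Window size and shift are illustrative choices.

    import numpy as np

    def window_attention(x, window=4, shift=0):
        """Self-attention restricted to non-overlapping windows of a feature map.

        x: (H, W, C) feature map. Cost is O(H*W * window**2), i.e. linear in the
        number of pixels for a fixed window size, instead of quadratic global attention.
        """
        H, W, C = x.shape
        if shift:
            # cyclic shift so the new windows straddle the previous window borders
            # (the real scheme also applies attention masks, omitted here)
            x = np.roll(x, (-shift, -shift), axis=(0, 1))
        # partition into (H//window * W//window) windows of window*window tokens
        xw = x.reshape(H // window, window, W // window, window, C)
        xw = xw.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, C)
        # single-head scaled dot-product attention inside each window
        q = k = v = xw                                   # identity projections for brevity
        attn = q @ k.transpose(0, 2, 1) / np.sqrt(C)
        attn = np.exp(attn - attn.max(-1, keepdims=True))
        attn /= attn.sum(-1, keepdims=True)
        out = attn @ v
        # undo the window partition (and the shift)
        out = out.reshape(H // window, W // window, window, window, C)
        out = out.transpose(0, 2, 1, 3, 4).reshape(H, W, C)
        if shift:
            out = np.roll(out, (shift, shift), axis=(0, 1))
        return out

    feat = np.random.randn(8, 8, 16)
    y = window_attention(feat, window=4, shift=2)        # shifted-window pass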
Yinhao Zhu, Yang Yang, Taco Cohen
journal unavailable
It is shown that nonlinear transforms built on Swin-transformers can achieve better compression efficiency than transforms built on convolutional neural networks (ConvNets), while requiring fewer parameters and shorter decoding time.
A. Panigrahi, Sadhika Malladi, Mengzhou Xia + 1 more
ArXiv
This work proposes an efficient construction, Transformer in Transformer (in short, TinT), that allows a transformer to simulate and fine-tune complex models internally during inference (e.g., pre-trained language models), and introduces innovative approximation techniques that allow a TinT model with less than 2 billion parameters to simulate and fine-tune a 125 million parameter transformer model within a single forward pass.
The Absorption Index (AI) remains valid for aged, unsealed transformers as a simple and effective method of non-destructive insulation control. The reasons for a decrease in AI during transformer operation are insulation moistening and contamination. Seven gradation levels of insulation condition and an algorithm of operating procedures are proposed, depending on the value of the measured AI and its variation over time. Along with AI, it is recommended to measure the polarisation index (PI) and PI-2 (the R_600/R_15 ratio).
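For reference, these indices are ratios of insulation-resistance readings taken at fixed times after the test voltage is applied. The PI-2 ratio is defined in the text above; the AI and PI expressions below follow the commonly used definitions and should be checked against the standard actually applied:

    \mathrm{AI} = \frac{R_{60}}{R_{15}}, \qquad \mathrm{PI} = \frac{R_{600}}{R_{60}}, \qquad \mathrm{PI\text{-}2} = \frac{R_{600}}{R_{15}}

where R_t denotes the insulation resistance measured t seconds after energization.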
K. Apel, G. Adey, D. Frisby
journal unavailable
As Apel himself notes in his preface, the expression "Transformation of Philosophy" bears an ambiguity, naming both a change that took place in the development of philosophy as well as Apel's own systematic project. As a historical approach the title characterizes the transformation that philosophy has undergone in 20th century philosophy through an emphasis on the mediation and the configuring power of language. Apel focuses on three main currents, represented by Wittgenstein, Heidegger, and Peirce.
Xin Chen, Bin Yan, Jiawen Zhu + 3 more
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This work presents a novel attention-based feature fusion network, which effectively combines the template and search region features solely using attention, and presents a Transformer tracking method based on the Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and the classification and regression head.
Dongchen Han, Xuran Pan, Yizeng Han + 2 more
2023 IEEE/CVF International Conference on Computer Vision (ICCV)
This paper proposes a novel Focused Linear Attention module, which introduces a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computation complexity.
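For context, linear attention replaces the N×N softmax with a kernel feature map phi applied to queries and keys, so attention can be computed as phi(Q)(phi(K)^T V) in time linear in sequence length. The sketch below uses a plain ReLU-based map as a stand-in for the paper's focused mapping function and omits the rank restoration module.

    import numpy as np

    def linear_attention(q, k, v, phi=lambda t: np.maximum(t, 0) + 1e-6):
        """Kernelized linear attention: O(N * d^2) instead of O(N^2 * d).

        q, k, v: (N, d). phi is a non-negative feature map; a plain ReLU is used
        here as a placeholder for the paper's focused mapping function.
        """
        qf, kf = phi(q), phi(k)               # (N, d)
        kv = kf.T @ v                         # (d, d) summary of keys and values
        z = qf @ kf.sum(axis=0)               # (N,) normalization terms
        return (qf @ kv) / z[:, None]         # (N, d)

    q = np.random.randn(1024, 64)
    k = np.random.randn(1024, 64)
    v = np.random.randn(1024, 64)
    out = linear_attention(q, k, v)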
Nihal Özdoğan
Journal of Innovative Science and Engineering (JISE)
Investigating solutions of differential equations has long been an important issue for scientists, and researchers around the world have proposed different methods for solving them. The type and order of a differential equation determine which method can be chosen to find its solution. One of these methods is the integral transform: the conversion of a real- or complex-valued function into another function by certain algebraic operations. Integral transforms are used to solve many problems in mathematics and engi...
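In general, an integral transform maps a function f into a new function by integrating it against a kernel K(s, t):

    (Tf)(s) = \int_{a}^{b} K(s, t)\, f(t)\, dt

Choosing K(s, t) = e^{-st} with limits 0 and infinity gives the Laplace transform, the classic example of turning a linear ODE with constant coefficients into an algebraic equation.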
Alexander Yom Din, Taelin Karidi, Leshem Choshen + 1 more
ArXiv
A simple method is suggested for casting hidden representations as final representations, bypassing the transformer computation in between using linear transformations; it far exceeds the prevailing practice of inspecting hidden representations from all layers in the space of the final layer.
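A minimal sketch of the idea, under illustrative assumptions (random placeholder activations, a least-squares fit): collect matching intermediate-layer and final-layer hidden states, fit a linear map between them, and use that map to "jump" from an intermediate layer straight to final-layer space.

    import numpy as np

    # Suppose h_mid holds hidden states from an intermediate layer and h_final the
    # matching final-layer states, collected over a calibration set (n tokens, d dims).
    n, d = 5000, 768
    h_mid = np.random.randn(n, d)
    h_final = np.random.randn(n, d)

    # Fit a linear map (plus bias) so that h_mid @ W ~= h_final.
    X = np.hstack([h_mid, np.ones((n, 1))])             # add bias column
    W, *_ = np.linalg.lstsq(X, h_final, rcond=None)     # (d+1, d)

    def jump_to_final(h):
        """Cast intermediate hidden states directly into final-layer space,
        skipping the transformer layers in between."""
        return np.hstack([h, np.ones((len(h), 1))]) @ W

    approx_final = jump_to_final(h_mid[:10])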
Salman Hameed Khan, Muzammal Naseer, Munawar Hayat + 3 more
ACM Computing Surveys (CSUR)
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline with an introduction to fundamental concepts behind the success of Transformers, i.e., self-attention, large-scale pre-training, and bidirectional feature encoding.
Cong Wang, Jinshan Pan, Wei Wang + 5 more
ArXiv
Experimental results show that the UHDformer reduces model size by about ninety-seven percent compared with most state-of-the-art methods while significantly improving performance under different training sets on three UHD image restoration tasks, including low-light image enhancement, image dehazing, and image deblurring.
A protein language model which takes as input a set of sequences in the form of a multiple sequence alignment and is trained with a variant of the masked language modeling objective across many protein families surpasses current state-of-the-art unsupervised structure learning methods by a wide margin.
Ze Liu, Jia Ning, Yue Cao + 4 more
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This paper advocates an inductive bias of locality in video Transformers, which leads to a better speed-accuracy trade-off compared to previous approaches which compute self-attention globally even with spatial-temporal factorization.
Nessrine Omrani, Nada Rejeb, A. Maalaoui + 2 more
IEEE Transactions on Engineering Management
The empirical results show that the technology context (IT infrastructure and digital tools) along with the existing level of innovation are the main drivers that act as stepping stones in digital technology adoption.
It is demonstrated that an approximate kNN lookup into a non-differentiable memory of recent (key, value) pairs improves language modeling across various benchmarks and tasks, including generic webtext, math papers, books, code, as well as formal theorems (Isabelle).
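As a sketch of the retrieval idea: past (key, value) pairs are stored in an external memory and, for each query, the top-k most similar keys are looked up and attended over. The brute-force dot-product search below stands in for the approximate kNN index used in practice, and all shapes and the top-k value are illustrative.

    import numpy as np

    class KVMemory:
        """External memory of past (key, value) pairs with a simple kNN lookup.

        A brute-force dot-product search stands in for an approximate index
        (e.g. a quantized or graph-based ANN structure).
        """
        def __init__(self):
            self.keys, self.values = [], []

        def add(self, k, v):                       # store keys/values from past segments
            self.keys.append(k); self.values.append(v)

        def lookup(self, q, topk=32):
            K = np.concatenate(self.keys)          # (M, d)
            V = np.concatenate(self.values)        # (M, d)
            scores = q @ K.T                       # (n, M) similarity to every stored key
            idx = np.argsort(-scores, axis=-1)[:, :topk]
            return K[idx], V[idx]                  # (n, topk, d) retrieved pairs

    def retrieval_attention(q, mem, topk=32):
        """Attend over the top-k retrieved (key, value) pairs for each query."""
        Kr, Vr = mem.lookup(q, topk)
        attn = np.einsum('nd,nkd->nk', q, Kr) / np.sqrt(q.shape[-1])
        attn = np.exp(attn - attn.max(-1, keepdims=True))
        attn /= attn.sum(-1, keepdims=True)
        return np.einsum('nk,nkd->nd', attn, Vr)

    mem = KVMemory()
    mem.add(np.random.randn(4096, 64), np.random.randn(4096, 64))
    out = retrieval_attention(np.random.randn(8, 64), mem)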
William S. Peebles, Saining Xie
2023 IEEE/CVF International Conference on Computer Vision (ICCV)
A new class of diffusion models based on the transformer architecture is explored, replacing the commonly used U-Net backbone with a transformer that operates on latent patches; these models outperform all prior diffusion models on the class-conditional ImageNet 512×512 and 256×256 benchmarks.
This work designs an Inception mixer to explicitly graft the advantages of convolution and max-pooling for capturing the high-frequency information to Transformers, and introduces a frequency ramp structure, which can effectively trade-off high- and low-frequency components across different layers.
Haoqi Fan, Bo Xiong, K. Mangalam + 4 more
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
This fundamental architectural prior for modeling the dense nature of visual signals is evaluated on a variety of video recognition tasks, where it outperforms concurrent vision transformers that rely on large-scale external pre-training and are 5-10× more costly in computation and parameters.
Xiaoyi Dong, Jianmin Bao, Dongdong Chen + 5 more
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
The Cross-Shaped Window self-attention mechanism is developed for computing self-attention in horizontal and vertical stripes in parallel that together form a cross-shaped window, with each stripe obtained by splitting the input feature into stripes of equal width.
Changyeon Kim, Jongjin Park, Jinwoo Shin + 3 more
ArXiv
This paper introduces a new preference model based on the weighted sum of non-Markovian rewards, realizes it with Preference Transformer, a neural architecture that models human preferences using transformers, and demonstrates that Preference Transformer can solve a variety of control tasks using real human preferences, while prior approaches fail to work.
Bo Peng, Eric Alcaide, Quentin G. Anthony + 29 more
ArXiv
This work proposes a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs, and presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.
Xiaohua Zhai, Alexander Kolesnikov, N. Houlsby + 1 more
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
A ViT model with two billion parameters is successfully trained, which attains a new state-of-the-art on ImageNet of 90.45% top-1 accuracy and performs well for few-shot transfer.
Lauri Wessel, Abayomi Baiyere, Roxana Ologeanu-Taddeï + 2 more
J. Assoc. Inf. Syst.
An empirically grounded conceptualization is developed that sets these two phenomena apart, finding that there are two distinctive differences: digital transformation activities leverage digital technology in (re)defining an organization’s value proposition, while IT-enabled organizational transformation activities leverage digital technology in supporting the value proposition.
S. Khakale, Dinkar P. Patil
SSRN Electronic Journal
In this paper a new integral transform, namely the Soham transform, is developed and applied to solve linear ordinary differential equations with constant coefficients.
Timothée Darcet, Maxime Oquab, J. Mairal + 1 more
ArXiv
This paper identifies and characterize artifacts in feature maps of both supervised and self-supervised ViT networks, and proposes a simple yet effective solution based on providing additional tokens to the input sequence of the Vision Transformer to fill that role.
Luis Muller, Mikhail Galkin, Christopher Morris + 1 more
ArXiv
A taxonomy of graph transformer architectures is derived, bringing some order to this emerging field by probing how well graph transformers can recover various graph properties, how well they can deal with heterophilic graphs, and to what extent they prevent over-squashing.
Noël Carroll, N. Hassan, I. Junglas + 2 more
European Journal of Information Systems
Some of the key challenges associated with researching digital transformations within the information systems (IS) field are outlined and the importance of shifting the focus on how digital transformations are managed and sustained is stressed.
Zilong Huang, Youcheng Ben, Guozhong Luo + 3 more
ArXiv
A new vision transformer is proposed, named Shuffle Transformer, which is highly efficient and easy to implement by modifying two lines of code; depth-wise convolution is also introduced to complement the spatial shuffle for enhancing neighbor-window connections.
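A hedged reconstruction of what a "spatial shuffle" between window-attention blocks can look like: a fixed permutation that regroups tokens from different windows into the same window, implemented as a reshape and transpose (which is why only a couple of lines change relative to plain window partitioning). Window size and layout are illustrative.

    import numpy as np

    def spatial_shuffle(x, window=4):
        """Spatial shuffle across windows, analogous to channel shuffle: tokens at
        the same relative position in different windows are regrouped into one
        window, so the following window attention mixes information across windows.
        Illustrative reconstruction, not the paper's exact code."""
        H, W, C = x.shape
        g_h, g_w = H // window, W // window
        x = x.reshape(g_h, window, g_w, window, C)
        # swap the "which window" and "where inside the window" axes
        x = x.transpose(1, 0, 3, 2, 4)
        return x.reshape(H, W, C)

    y = spatial_shuffle(np.random.randn(8, 8, 16), window=4)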
This work proposes a novel architecture, called the Energy Transformer, that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function, which is responsible for representing the relationships between the tokens.
Xuran Pan, Tianzhu Ye, Zhuofan Xia + 2 more
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
A novel local attention module, Slide Attention, which leverages common convolution operations to achieve high efficiency, flexibility and generalizability and is applicable to a variety of advanced Vision Transformer models and compatible with various hardware devices, and achieves consistently improved performances on comprehensive benchmarks.
Rosena Shintabella, Catur Edi Widodo, Adi Wibowo
International Journal of Innovative Science and Research Technology (IJISRT)
An innovative model is proposed to improve the accuracy of loss-of-life transformer prediction using stacking ensembles enhanced with a genetic algorithm (GA), and the developed framework presents a promising solution for accurate and reliable transformer life prediction.
Anurag Arnab, Mostafa Dehghani, G. Heigold + 3 more
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
This work shows how to effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets, and achieves state-of-the-art results on multiple video classification benchmarks.
Bernard Moussad, Rahmatullah Roche, Debswapna Bhattacharya
Proceedings of the National Academy of Sciences of the United States of America
The predictive modeling performance of the state-of-the-art protein structure prediction methods built on transformers for 69 protein targets from the recently concluded Critical Assessment of Structure Prediction (CASP15) challenge is reported.
Deepali Jain, K. Choromanski, Sumeet Singh + 4 more
ArXiv
Mnemosyne is a new class of learnable optimizers based on the novel spatio-temporal low-rank implicit attention Transformers that can learn to train entire neural network architectures, including other Transformers, without any task-specific optimizer tuning.
Qingsong Wen, Tian Zhou, Chao Zhang + 4 more
journal unavailable
This paper systematically reviews Transformer schemes for time series modeling by highlighting their strengths as well as limitations, and categorizes time series Transformers based on common tasks including forecasting, anomaly detection, and classification.
P. Faratin, Ray Garcia, Jacomo Corbo
ArXiv
The goal of this article is to offer an organizational framework for making rational choices as enterprises start their transformation journey towards an AI first organization.
Nouha Dziri, Ximing Lu, Melanie Sclar + 13 more
ArXiv
The empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching, without necessarily developing systematic problem-solving skills.
Patrick Esser, Sumith Kulal, A. Blattmann + 14 more
ArXiv
This work improves existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales and presents a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens.
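For context on the rectified-flow objective being referred to (stated here in the commonly used formulation, not quoted from the paper): a noise sample and a data sample are linearly interpolated, and a velocity field is regressed toward the straight-line direction,

    x_t = (1 - t)\,x_0 + t\,\varepsilon, \qquad \mathcal{L} = \mathbb{E}_{t,\,x_0,\,\varepsilon}\,\bigl\| v_\theta(x_t, t) - (\varepsilon - x_0) \bigr\|^2 ,

so changing the noise sampling amounts to reweighting which timesteps t are drawn during training.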
A novel Spike-Driven Self-Attention (SDSA) is proposed, which exploits only mask and addition operations without any multiplication, and thus has up to 87.2× lower computation energy than vanilla self-attention.
Chen Zhu, Wei Ping, Chaowei Xiao + 4 more
journal unavailable
This paper proposes Long-Short Transformer (Transformer-LS), an efficient self-attention mechanism for modeling long sequences with linear complexity for both language and vision tasks, and proposes a dual normalization strategy to account for the scale mismatch between the two attention mechanisms.
Opher Lieber, Barak Lenz, Hofit Bata + 19 more
ArXiv
Jamba is presented, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture that provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations.
Mostafa Dehghani, Josip Djolonga, Basil Mustafa + 39 more
ArXiv
A recipe is presented for highly efficient and stable training of a 22B-parameter ViT (ViT-22B), together with a wide variety of experiments on the resulting model, demonstrating the potential for "LLM-like" scaling in vision and providing key steps towards getting there.
René Ranftl, Alexey Bochkovskiy, V. Koltun
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Dense prediction transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks, can be fine-tuned on smaller datasets such as NYUv2, KITTI, and Pascal Context, where the architecture also sets the new state of the art.
The Colorization Transformer is presented, a novel approach for diverse high-fidelity image colorization based on self-attention that outperforms the previous state-of-the-art on colorizing ImageNet based on FID results and on a human evaluation in a Mechanical Turk test.
William Merrill, Ashish Sabharwal
ArXiv
This paper aims to demonstrate how transformers’ reasoning can be improved by allowing them to use a “chain of thought” or “scratchpad”, i.e., generate and condition on a sequence of intermediate tokens before answering.
Ali Hassani, Steven Walton, Jiacheng Li + 2 more
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
NA is a pixel-wise operation, localizing self-attention to the nearest neighboring pixels; it therefore enjoys linear time and space complexity compared to the quadratic complexity of SA, and is the first efficient and scalable sliding-window attention mechanism for vision.
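As a rough sketch of the pixel-wise neighborhood idea, the NumPy snippet below lets every pixel attend only to its k×k surrounding pixels, which keeps cost linear in the number of pixels for a fixed k; the identity q/k/v projections, edge padding, and explicit neighbor gathering are simplifications for illustration rather than the paper's kernel.

    import numpy as np

    def neighborhood_attention(x, k=3):
        """Self-attention where each pixel attends only to its k x k neighborhood,
        giving linear (in H*W) time and memory for a fixed k."""
        H, W, C = x.shape
        r = k // 2
        xp = np.pad(x, ((r, r), (r, r), (0, 0)), mode="edge")
        # gather the k*k neighbors of every pixel: (H, W, k*k, C)
        nbrs = np.stack([xp[i:i + H, j:j + W] for i in range(k) for j in range(k)], axis=2)
        q = x[:, :, None, :]                              # (H, W, 1, C)
        attn = (q * nbrs).sum(-1) / np.sqrt(C)            # (H, W, k*k)
        attn = np.exp(attn - attn.max(-1, keepdims=True))
        attn /= attn.sum(-1, keepdims=True)
        return (attn[..., None] * nbrs).sum(2)            # (H, W, C)

    y = neighborhood_attention(np.random.randn(8, 8, 16), k=3)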
Ailing Zeng, Mu-Hwa Chen, L. Zhang + 1 more
journal unavailable
Experimental results on nine real-life datasets show that LTSF-Linear surprisingly outperforms existing sophisticated Transformer-based LTSF models in all cases, and often by a large margin.
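The linear models referred to here are, as commonly described, single linear layers mapping the length-L lookback window directly to the length-T forecast horizon, fitted per variate; the least-squares sketch below is an assumption about that model family, not the paper's training code.

    import numpy as np

    L, T = 96, 24          # lookback length and forecast horizon (illustrative)

    def fit_linear_forecaster(series):
        """Fit W (L x T) so that the next T values ~= last L values @ W."""
        X = np.stack([series[i:i + L] for i in range(len(series) - L - T + 1)])
        Y = np.stack([series[i + L:i + L + T] for i in range(len(series) - L - T + 1)])
        W, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return W

    def forecast(series, W):
        return series[-L:] @ W        # map the most recent window to T future steps

    train = np.sin(np.arange(2000) / 25.0) + 0.1 * np.random.randn(2000)
    W = fit_linear_forecaster(train)
    pred = forecast(train, W)         # length-T forecast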
Qinqing Zheng, Amy Zhang, Aditya Grover
journal unavailable
This work proposes Online Decision Transformers (ODT), an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning in a unified framework; it is competitive with the state-of-the-art in absolute performance on the D4RL benchmark but shows much more significant gains during the finetuning procedure.