
Vaswani et al. (Public Board)
: Vaswani et al. (2017)
You're referring to the seminal paper "Attention Is All You Need" by Vaswani et al., published in 2017!
This paper introduced the Transformer model, a groundbreaking neural network architecture that revolutionized the field of natural language processing (NLP) and has since become a standard building block in many state-of-the-art models.
The Transformer architecture is particularly notable for its use of self-attention mechanisms, which allow the model to focus on specific parts of the input sequence and weigh their importance. This approach removed the recurrence of RNN-based models, whose step-by-step sequential processing limits parallelisation, making it possible to process every position of an input sequence in parallel.
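To make that concrete, here is a minimal numpy sketch of the scaled dot-product attention the paper defines, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The function and variable names are illustrative, not taken from any reference implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projections.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Every position attends to every other position in one matrix
    # product; this is what removes the RNN's sequential dependency.
    scores = Q @ K.T / np.sqrt(d_k)         # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ V                      # (seq_len, d_k)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                # 5 tokens, d_model = 16
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 8)
```

The full attention-weight matrix is computed in one shot for all positions, which is exactly the property that makes the computation parallelisable on GPUs.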
The paper demonstrated the effectiveness of the Transformer on several NLP tasks, most prominently machine translation, showing significant improvements over the RNN-based models of the time. The architecture has since been widely adopted across NLP for machine translation, text summarization, question answering, and more.
Some key features of the Transformer architecture include:
1. Self-attention mechanism: allows the model to focus on specific parts of the input sequence.
2. Encoder-decoder structure: an encoder maps the input sequence to a set of contextualized representations, and a decoder generates the output sequence from them.
3. Multi-head attention: allows the model to jointly attend to information from different representation subspaces at different positions (see the first sketch after this list).
4. Positional encoding: adds sinusoidal position signals to the input embeddings so the model can make use of token order (see the second sketch after this list).
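A rough sketch of the multi-head idea in item 3, with random matrices standing in for learned projections (the names are mine, not the paper's code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, heads=4):
    # Split d_model across `heads` independent attention heads, each
    # with its own projections, then concatenate and project back.
    rng = np.random.default_rng(1)
    d_model = X.shape[-1]
    d_k = d_model // heads                    # assumes heads divides d_model
    outputs = []
    for _ in range(heads):
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
        outputs.append(A @ V)                 # (seq_len, d_k) per head
    Wo = rng.normal(size=(d_model, d_model))  # output projection
    return np.concatenate(outputs, axis=-1) @ Wo

X = np.random.default_rng(0).normal(size=(5, 16))
print(multi_head_attention(X).shape)          # (5, 16)
```

Each head can attend to a different representation subspace, which is the paper's motivation for using several smaller heads rather than a single large one.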
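And for item 4, the sinusoidal scheme as defined in the paper: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). This minimal sketch assumes an even d_model:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]  # even dimension indices
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe   # added to the token embeddings before the first layer

print(positional_encoding(50, 512).shape)      # (50, 512)
```

The paper notes that for any fixed offset k, PE(pos+k) is a linear function of PE(pos), which should make it easy for the model to attend by relative position.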
The Transformer architecture has had a profound impact on the field of NLP and has paved the way for many subsequent advancements in areas such as language modeling, machine translation, and text generation.
----------------------------------------------------------------
: what is the url
The original paper "Attention Is All You Need" by Vaswani et al. (2017) is published on the arXiv preprint server. The URL is:
https://arxiv.org/abs/1706.03762
The paper is freely accessible; you can download the PDF or read it directly on the arXiv website.
----------------------------------------------------------------
One of the subjects I did in my Master's degree was Neural Networks, so this is all pretty interesting. In essence, the Vaswani paper made training parallelisable. You can imagine the speedup. They tested it on English-German and English-French translation.
https://arxiv.org/abs/1706.03762 abstract
https://arxiv.org/pdf/1706.03762 pdf
The PDF was last revised in August 2023.