Foundational Transformer Architecture
August 02, 2024 – October 17, 2024
- Theoretical and critical analysis of the Transformer architecture;
- Development of new Transformer architectures, including new layers (Gated Residual Kolmogorov-Arnold Networks, Feature Pyramid Network), positional encoders (Attention-Leveraged Rotary Positional Embeddings), and attention mechanisms (multi-scale attention, attention with uncertainty estimation); a minimal sketch of one such layer follows this list;
- Implementation of the new architectures in an LLM and performance evaluation through experimental testing;
- Statistical analysis of the obtained performance and selection of the best architectures;
- Development of a framework for optimal selection of architectures and parameters.
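For illustration only, the sketch below shows one way a gated residual Kolmogorov-Arnold-style layer could slot into a Transformer block as a feed-forward replacement. It is a minimal PyTorch sketch under assumptions not taken from this summary: the sigmoid gating scheme, the fixed Gaussian basis functions standing in for learnable splines, and all names and hyperparameters (`GatedResidual`, `SimpleKANLayer`, `n_basis`, `width`) are hypothetical, not the project's actual design.

```python
import torch
import torch.nn as nn


class SimpleKANLayer(nn.Module):
    """Simplified KAN-style layer (illustrative assumption): each output is a
    learned linear mix of fixed Gaussian basis functions applied to every input
    feature, standing in for the learnable splines of a full KAN."""

    def __init__(self, dim: int, n_basis: int = 8):
        super().__init__()
        # Fixed grid of basis-function centers over a normalized input range.
        self.register_buffer("centers", torch.linspace(-2.0, 2.0, n_basis))
        self.width = 0.5
        self.mix = nn.Linear(dim * n_basis, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim) -> Gaussian RBF features per input dimension.
        phi = torch.exp(-(((x.unsqueeze(-1) - self.centers) / self.width) ** 2))
        return self.mix(phi.flatten(-2))


class GatedResidual(nn.Module):
    """Gated residual wrapper (illustrative assumption):
    out = x + sigmoid(W x) * f(norm(x)).
    The learned gate lets the block fall back to a plain residual path when the
    wrapped sub-layer is not helpful."""

    def __init__(self, dim: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.sublayer(self.norm(x))
        g = torch.sigmoid(self.gate(x))
        return x + g * h


if __name__ == "__main__":
    # Used as a drop-in replacement for a Transformer feed-forward sub-layer.
    d_model = 64
    block = GatedResidual(d_model, SimpleKANLayer(d_model))
    tokens = torch.randn(2, 16, d_model)   # (batch, seq_len, d_model)
    print(block(tokens).shape)             # torch.Size([2, 16, 64])
```

Wrapping the sub-layer in a learned gate lets the model interpolate between a plain residual path and the new layer, which can help stabilize training when an unproven component is swapped into the architecture.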
Goal
The project aims to explore and develop new architectures and strategies to improve the performance of Large Language Models (LLMs) based on the Transformer architecture, increasing their effectiveness, efficiency, and robustness against hallucinations.