Foundational Transformer Architecture
August 02, 2024 – October 17, 2024
- Theoretical and critical analysis of the Transformer architecture;
- Development of new Transformer architectures, including new layers (Gated Residual Kolmogorov-Arnold Networks, Feature Pyramid Network), positional encoders (Attention-Leveraged Rotary Positional Embeddings), and attention mechanisms (multi-scale attention, attention with uncertainty estimation); a minimal sketch of one such layer follows this list;
- Implementation of the new architectures in an LLM and performance evaluation through experimental testing;
- Statistical analysis of the obtained performance and selection of the best architectures;
- Development of a framework for optimal selection of architectures and parameters.
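For illustration only, the sketch below shows one way a gated residual Kolmogorov-Arnold-style layer could slot into a Transformer block as a feed-forward replacement. It is a minimal PyTorch sketch under assumptions not taken from this summary: the sigmoid gating scheme, the fixed Gaussian basis functions standing in for learnable splines, and all names and hyperparameters (`GatedResidual`, `SimpleKANLayer`, `n_basis`, `width`) are hypothetical, not the project's actual design.

```python
import torch
import torch.nn as nn


class SimpleKANLayer(nn.Module):
    """Simplified KAN-style layer (illustrative assumption): each output is a
    learned linear mix of fixed Gaussian basis functions applied to every input
    feature, standing in for the learnable splines of a full KAN."""

    def __init__(self, dim: int, n_basis: int = 8):
        super().__init__()
        # Fixed grid of basis-function centers over a normalized input range.
        self.register_buffer("centers", torch.linspace(-2.0, 2.0, n_basis))
        self.width = 0.5
        self.mix = nn.Linear(dim * n_basis, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim) -> Gaussian RBF features per input dimension.
        phi = torch.exp(-(((x.unsqueeze(-1) - self.centers) / self.width) ** 2))
        return self.mix(phi.flatten(-2))


class GatedResidual(nn.Module):
    """Gated residual wrapper (illustrative assumption):
    out = x + sigmoid(W x) * f(norm(x)).
    The learned gate lets the block fall back to a plain residual path when the
    wrapped sub-layer is not helpful."""

    def __init__(self, dim: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.sublayer(self.norm(x))
        g = torch.sigmoid(self.gate(x))
        return x + g * h


if __name__ == "__main__":
    # Used as a drop-in replacement for a Transformer feed-forward sub-layer.
    d_model = 64
    block = GatedResidual(d_model, SimpleKANLayer(d_model))
    tokens = torch.randn(2, 16, d_model)   # (batch, seq_len, d_model)
    print(block(tokens).shape)             # torch.Size([2, 16, 64])
```

Wrapping the sub-layer in a learned gate lets the model interpolate between a plain residual path and the new layer, which can help stabilize training when an unproven component is swapped into the architecture.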
Goal
The project aims to explore and develop new architectures and strategies to improve the performance of Large Language Models (LLMs) based on the Transformer architecture, increasing their effectiveness, efficiency, and robustness against hallucinations.