Skip to yearly menu bar Skip to main content

Workshop

Next Generation of Sequence Modeling Architectures

Caglar Gulcehre · Razvan Pascanu · Antonio Orvieto · Carmen Amo Alonso · Maciej Wołczyk

Project Page

Abstract

This workshop aims to bring together various researchers to chart the course for the next generation of sequence models. The focus will be on better understanding the limitations of existing models like transformer architectures, recurrent neural networks, and state space models (e.g., S4, Mamba), as well as describing existing open problems. We will touch on topics such as memory, long-range context and in-context learning, optimization stability of these architectures, and their ability to represent different classes of problems. We will also cover interpretability and pragmatic aspects of getting these models to be efficient and perform well: how they should be scaled up, and the trade-offs and limitations imposed by current hardware. We will place additional emphasis on the discussion regarding how we should evaluate and benchmark sequential models at scale, for example, in the context of language or other domains like vision, audio, or biological signals.

Video

Chat is not available.

Schedule

Timezone: America/Los_Angeles

12:00 AM

Opening Remarks

Caglar Gulcehre · Razvan Pascanu · Antonio Orvieto · Carmen Amo Alonso · Maciej Wołczyk

Video

12:10 AM

Sepp Hochreiter

Sepp Hochreiter

Video

12:40 AM

Poster Spotlights

Video

1:00 AM

Albert Gu

Video

1:30 AM

Soham De

Video

2:00 AM

Angela Fan

Video

2:30 AM

Joao Sacramento

Joao Sacramento

Video

3:00 AM

Poster Session

4:30 AM

Lunch Break

5:00 AM

Stephanie Chan

Video

5:30 AM

Hava Siegelman

Video

6:00 AM

Poster Spotlights

Video

6:30 AM

Coffee Break

7:00 AM

Panel Discussion

Video

Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling

Raunaq Bhirangi · Chenyu Wang · Venkatesh Pattabiraman · Carmel Majidi · Abhinav Gupta · Tess Hellebrekers · Lerrel Pinto

Fleet of Agents: Coordinated Problem Solving with Large Language Models using Genetic Particle Filtering

Akhil Arora · Lars Klein · Nearchos Potamitis · Roland Aydin · Caglar Gulcehre · Robert West

Delay Embedding Theory of Neural Sequence Models

Mitchell Ostrow · Adam Eisen · Ila R. Fiete

Pretrained Hybrids with MAD Skills

Nicholas Roberts · Samuel Guo · Zhiqi Gao · Satya Sai Srinath Namburi GNVV · Sonia Cromp · Chengjun Wu · Chengyu Duan · Frederic Sala

When can transformers compositionally generalize in-context?

Seijin Kobayashi · Simon Schug · Yassir Akram · Florian Redhardt · Johannes Von Oswald · Razvan Pascanu · Guillaume Lajoie · Joao Sacramento

Multi-Task Instruction Training of Text Diffusion Models

Changyou Chen · Gargi Balasubramaniam · Rui Meng · Han Zhao · Bunyamin Sisman · qingjun cui

HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context

Federico Arangath Joseph · Noah Liniger · Kilian Haefeli

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Maciej Pióro · Kamil Ciebiera · Krystian Król · Jan Ludziejewski · Michał Krutul · Jakub Krajewski · Szymon Antoniak · Piotr Milos · Marek Cygan · Sebastian Jaszczur

Needle in the Haystack for Memory Based Large Language Models

Elliot Nelson · Soham Dan · Georgios Kollias · Payel Das · Subhajit Chaudhury

QSMixer: Connecting SSMs with Mixer Models via Quasi-Separable Matrices

Ali Behrouz · Michele Santacatterina · Ramin Zabih

Parallelizing Autoregressive Generation with Variational State-Space Models

Gaspard Lambrechts · Yann Claes · Pierre Geurts · Damien Ernst

ECG Signal Denoising Using Multi-scale Patch Embedding and Transformers

Ding Zhu · Vishnu Chhabra · Mohammad Mahdi Khalili

The Role of State Matrix Initialization in SSMs: A Perspective on the Approximation-Estimation Tradeoff

Fusheng Liu · Qianxiao Li

State soup: in-context skill learning, retrieval and mixing

Maciej Pióro · Maciej Wołczyk · Razvan Pascanu · Johannes Von Oswald · Joao Sacramento

RotRNN: Modelling Long Sequences with Rotations

Rares Dolga · Kai Biegun · Jake Cunningham · David Barber

Towards a theory of learning dynamics in deep state space models

Jakub Smekal · Jimmy Smith · Michael Kleinman · Dan Biderman · Scott Linderman

Recurrent VAE with Gaussian Process Decoders for De novo Molecular Generation

Vidhi Lalchand · David Lines · Neil Lawrence

DynaGraph: Dynamic Contrastive Graph for Interpretable Multi-label Prediction using Time-Series EHR Data

Munib Mesinovic · Soheila Molaei · Peter Watkinson · Tingting Zhu

Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe

Albert Jiang · Alicja Ziarko · Bartosz Piotrowski · Wenda Li · Mateja Jamnik · Piotr Milos

Enhancing Transformer RNNs with Multiple Temporal Perspectives

Razvan Dumitru · Darius Peteleaza · Mihai Surdeanu

State Space Models for Brain Computer Interfaces?

Pablo Soëtard · Miran Özdogan · Oiwi Parker Jones

Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling

Jake Cunningham · Giorgio Giannone · Mingtian Zhang · Marc Deisenroth

On the Power of Convolution-Augmented Transformer

Mingchen Li · Xuechen Zhang · Yixiao HUANG · Samet Oymak

Vision-LSTM: xLSTM as Generic Vision Backbone

Benedikt Alkin · Maximilian Beck · Korbinian Pöppel · Sepp Hochreiter · Johannes Brandstetter

EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation

Yuqiao Wen · Behzad Shayegh · Chenyang Huang · Yanshuai Cao · Lili Mou

Towards Dynamic Feature Acquisition on Medical Time Series by Maximizing Conditional Mutual Information

Fedor Sergeev · Paola Malsot · Gunnar Ratsch · Vincent Fortuin

Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective

Zhen Qin · Xuyang Shen · Dong Li · Weigao Sun · Stan Birchfield · Richard I Hartley · Yiran Zhong

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Zihao Wang · Shaofei Cai · Zhancun Mu · Haowei Lin · Ceyao Zhang · Xuejie Liu · Qing Li · Anji Liu · Xiaojian Ma · Yitao Liang

Reservoir Structured State Space Models

Giuseppe Lombardi · Claudio Gallicchio · Andrea Ceni

Recurrent Action Transformer with Memory

Egor Cherepanov · Aleksei Staroverov · Dmitry Yudin · Alexey Kovalev · Aleksandr Panov

Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures

Fernando Moreno-Pino · Alvaro Arroyo · Harrison Waldon · Xiaowen Dong · Alvaro Cartea

Viewing Attention as a Recurrent Neural Network

Leo Feng · Frederick Tung · Hossein Hajimirsadeghi · Mohamed Osama Ahmed · Yoshua Bengio · Greg Mori

Latte: Latent Attention for Linear Time Transformers

Rares Dolga · Marius Cobzarenco · Ahmed Shahin · David Barber

An All-MLP Sequence Modeling Architecture That Excels at Copying

Chenwei Cui · Zehao Yan · Gedeon Muhawenayo · Hannah Kerner

Randomized Signatures for processing long-range Sequences on Graphs

Lukas Gruber · Bernhard Schäfl · Johannes Brandstetter · Sepp Hochreiter

Selective Attention: Enhancing Transformer through Principled Context Control

Xuechen Zhang · Xiangyu Chang · Mingchen Li · Amit Roy-Chowdhury · Jiasi Chen · Samet Oymak

FutureTST: When Transformers Meet Future Exogenous Drivers

Kshitij Tayal · Arvind Renganathan · Vipin Kumar · Dan Lu

On Feature Learning in Structured State Space Models

Leena Chennuru Vankadara · Jin Xu · Moritz Haas · Volkan Cevher

xLSTM: Extended Long Short-Term Memory

Maximilian Beck · Korbinian Pöppel · Markus Spanring · Andreas Auer · Oleksandra Prudnikova · Michael Kopp · Günter Klambauer · Johannes Brandstetter · Sepp Hochreiter

Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis

Xiuying Wei · Skander Moalla · Razvan Pascanu · Caglar Gulcehre

MSAMamba: Adapting Subquadratic Models To Long-Context DNA MSA Analysis

Vishrut Thoutam · Dina Ellsworth

BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

Qizhen Zhang · Nikolas Gritsch · Dwaraknath Gnaneshwar · Simon Guo · David Cairuz · Bharat Venkitesh · Jakob Foerster · Phil Blunsom · Sebastian Ruder · Ahmet Üstün · Acyr Locatelli

Reservoir Memory Networks: Long-range temporal dependencies with untrained RNNs

Claudio Gallicchio · Andrea Ceni

On the Bottleneck of State Space Models: Locality and Oversmoothing

Pragya Srivastava · Peihao Wang · Ruisi Cai · Jiajun Zhu · Pan Li · Zhangyang “Atlas” Wang

LongSSM: On the Length Extension of State-space Models in Language Modelling

Shida Wang

KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty

Philipp Becker · Niklas Freymuth · Gerhard Neumann

Associative Recurrent Memory Transformer

Ivan Rodkin · Yuri Kuratov · Aidar Bulatov · Mikhail Burtsev

Orthogonal residual connections for long-term memory retention in recurrent neural networks

Andrea Ceni · Claudio Gallicchio

Enhancing Sequence Modeling with Multi-Resolution State Space Models

Mahdi Karami · Ali Behrouz

SeRpEnt: Selective Resampling for Expressive State Space Models

Stefano Rando · Luca Romani · Matteo Migliarini · Denis Gudovskiy · Luca Franco · Luca Rigazio · Fabio Galasso

Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models

Ali Behrouz · Michele Santacatterina · Ramin Zabih

Q-S5 Towards Quantized State Space Models

Steven Abreu · Jens Egholm Pedersen · Kade Heckel · Alessandro Pierro

Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models

Mohammad Shahab Sepehri · Zalan Fabian · Mahdi Soltanolkotabi

Probing the Decision Boundaries of In-context Learning in Large Language Models

Siyan Zhao · Tung Nguyen · Aditya Grover

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Alexander Hägele · Elie Bakouch · Atli Kosson · Loubna Ben allal · Leandro Von Werra · Martin Jaggi

Length independent generalization bounds for deep SSM architectures

Dániel Rácz · Mihaly Petreczky · Balint Daroczy