dblp: Martin Jaggi

Beyond spectral gap: The role of the topology in decentralized learning.

December 31, 2021, 3:00 pm

Thijs Vogels, Hadrien Hendrikx, Martin Jaggi: Beyond spectral gap: The role of the topology in decentralized learning. CoRR abs/2206.03093 (2022)

On Avoiding Local Minima Using Gradient Descent With Large Learning Rates.

December 31, 2021, 3:00 pm

Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich: On Avoiding Local Minima Using Gradient Descent With Large Learning Rates. CoRR abs/2205.15142 (2022)

SKILL: Structured Knowledge Infusion for Large Language Models.

December 31, 2021, 3:00 pm

Fedor Moiseev, Zhe Dong, Enrique Alfonseca, Martin Jaggi: SKILL: Structured Knowledge Infusion for Large Language Models. CoRR abs/2205.08184 (2022)

Data-heterogeneity-aware Mixing for Decentralized Learning.

December 31, 2021, 3:00 pm

Yatin Dandi, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich: Data-heterogeneity-aware Mixing for Decentralized Learning. CoRR abs/2204.06477 (2022)

Improving Generalization via Uncertainty Driven Perturbations.

December 31, 2021, 3:00 pm

Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Michael I. Jordan, Tatjana Chavdarova: Improving Generalization via Uncertainty Driven Perturbations. CoRR abs/2202.05737 (2022)

Agree to Disagree: Diversity through Disagreement for Better Transferability.

December 31, 2021, 3:00 pm

Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy: Agree to Disagree: Diversity through Disagreement for Better Transferability. CoRR abs/2202.04414 (2022)

Characterizing & Finding Good Data Orderings for Fast Convergence of...

December 31, 2021, 3:00 pm

Amirkeivan Mohtashami, Sebastian U. Stich, Martin Jaggi: Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods. CoRR abs/2202.01838 (2022)

Byzantine-Robust Decentralized Learning via Self-Centered Clipping.

December 31, 2021, 3:00 pm

Lie He, Sai Praneeth Karimireddy, Martin Jaggi: Byzantine-Robust Decentralized Learning via Self-Centered Clipping. CoRR abs/2202.01545 (2022)

Beyond spectral gap: the role of the topology in decentralized learning.

December 31, 2021, 3:00 pm

Thijs Vogels, Hadrien Hendrikx, Martin Jaggi: Beyond spectral gap: the role of the topology in decentralized learning. NeurIPS 2022

FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in...

December 31, 2021, 3:00 pm

Jean Ogier du Terrail, Samy-Safwan Ayed, Edwige Cyffers, Felix Grimberg, Chaoyang He, Regis Loeb, Paul Mangold, Tanguy Marchand, Othmane Marfoq, Erum Mushtaq, Boris Muzellec, Constantin Philippenko,...

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and...

December 31, 2021, 3:00 pm

Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi: Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning. NeurIPS 2022

SKILL: Structured Knowledge Infusion for Large Language Models.

December 31, 2021, 3:00 pm

Fedor Moiseev, Zhe Dong, Enrique Alfonseca, Martin Jaggi: SKILL: Structured Knowledge Infusion for Large Language Models. NAACL-HLT 2022: 1581-1588

Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing.

December 31, 2021, 3:00 pm

Sai Praneeth Karimireddy, Lie He, Martin Jaggi: Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing. ICLR 2022

Masked Training of Neural Networks with Partial Gradients.

December 31, 2021, 3:00 pm

Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich: Masked Training of Neural Networks with Partial Gradients. AISTATS 2022: 5876-5890

Implicit Gradient Alignment in Distributed and Federated Learning.

December 31, 2021, 3:00 pm

Yatin Dandi, Luis Barba, Martin Jaggi: Implicit Gradient Alignment in Distributed and Federated Learning. AAAI 2022: 6454-6462

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models.

December 31, 2022, 3:00 pm

Zeming Chen, Alejandro Hernández-Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza...

Controllable Topic-Focused Abstractive Summarization.

December 31, 2022, 3:00 pm

Seyed Ali Bahrainian, Martin Jaggi, Carsten Eickhoff: Controllable Topic-Focused Abstractive Summarization. CoRR abs/2311.06724 (2023)

DoGE: Domain Reweighting with Generalization Estimation.

December 31, 2022, 3:00 pm

Simin Fan, Matteo Pagliardini, Martin Jaggi: DoGE: Domain Reweighting with Generalization Estimation. CoRR abs/2310.15393 (2023)

Irreducible Curriculum for Language Model Pretraining.

December 31, 2022, 3:00 pm

Simin Fan, Martin Jaggi: Irreducible Curriculum for Language Model Pretraining. CoRR abs/2310.15389 (2023)

LASER: Linear Compression in Wireless Distributed Optimization.

December 31, 2022, 3:00 pm

Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael C. Gastpar: LASER: Linear Compression in Wireless Distributed Optimization. CoRR abs/2310.13033 (2023)

CoTFormer: More Tokens With Attention Make Up For Less Depth.

December 31, 2022, 3:00 pm

Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi: CoTFormer: More Tokens With Attention Make Up For Less Depth. CoRR abs/2310.10845 (2023)

MultiModN- Multimodal, Multi-Task, Interpretable Modular Networks.

December 31, 2022, 3:00 pm

Vinitra Swamy, Malika Satayeva, Jibril Frej, Thierry Bossy, Thijs Vogels, Martin Jaggi, Tanja Käser, Mary-Anne Hartley: MultiModN- Multimodal, Multi-Task, Interpretable Modular Networks. CoRR...

Layerwise Linear Mode Connectivity.

December 31, 2022, 3:00 pm

Linara Adilova, Asja Fischer, Martin Jaggi: Layerwise Linear Mode Connectivity. CoRR abs/2307.06966 (2023)

Provably Personalized and Robust Federated Learning.

December 31, 2022, 3:00 pm

Mariel A. Werner, Lie He, Sai Praneeth Karimireddy, Michael I. Jordan, Martin Jaggi: Provably Personalized and Robust Federated Learning. CoRR abs/2306.08393 (2023)

Faster Causal Attention Over Large Sequences Through Sparse Flash Attention.

December 31, 2022, 3:00 pm

Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret: Faster Causal Attention Over Large Sequences Through Sparse Flash Attention. CoRR abs/2306.01160 (2023)

Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with...

December 31, 2022, 3:00 pm

Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi: Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders. CoRR abs/2305.19259 (2023)

Collaborative Learning via Prediction Consensus.

December 31, 2022, 3:00 pm

Dongyang Fan, Celestine Mendler-Dünner, Martin Jaggi: Collaborative Learning via Prediction Consensus. CoRR abs/2305.18497 (2023)

Rotational Optimizers: Simple & Robust DNN Training.

December 31, 2022, 3:00 pm

Atli Kosson, Bettina Messmer, Martin Jaggi: Rotational Optimizers: Simple & Robust DNN Training. CoRR abs/2305.17212 (2023)

Ghost Noise for Regularizing Deep Neural Networks.

December 31, 2022, 3:00 pm

Atli Kosson, Dongyang Fan, Martin Jaggi: Ghost Noise for Regularizing Deep Neural Networks. CoRR abs/2305.17205 (2023)

Hardware-Efficient Transformer Training via Piecewise Affine Operations.

December 31, 2022, 3:00 pm

Atli Kosson, Martin Jaggi: Hardware-Efficient Transformer Training via Piecewise Affine Operations. CoRR abs/2305.17190 (2023)

Landmark Attention: Random-Access Infinite Context Length for Transformers.

December 31, 2022, 3:00 pm

Amirkeivan Mohtashami, Martin Jaggi: Landmark Attention: Random-Access Infinite Context Length for Transformers. CoRR abs/2305.16300 (2023)

Linearization Algorithms for Fully Composite Optimization.

December 31, 2022, 3:00 pm

Maria-Luiza Vladarean, Nikita Doikov, Martin Jaggi, Nicolas Flammarion: Linearization Algorithms for Fully Composite Optimization. CoRR abs/2302.12808 (2023)

Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton...

December 31, 2022, 3:00 pm

El Mahdi Chayti, Nikita Doikov, Martin Jaggi: Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods. CoRR abs/2302.11962 (2023)

Beyond spectral gap (extended): The role of the topology in decentralized...

December 31, 2022, 3:00 pm

Thijs Vogels, Hadrien Hendrikx, Martin Jaggi: Beyond spectral gap (extended): The role of the topology in decentralized learning. CoRR abs/2301.02151 (2023)

Special Properties of Gradient Descent with Large Learning Rates.

December 31, 2022, 3:00 pm

Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich: Special Properties of Gradient Descent with Large Learning Rates. ICML 2023: 25082-25104

Second-Order Optimization with Lazy Hessians.

December 31, 2022, 3:00 pm

Nikita Doikov, El Mahdi Chayti, Martin Jaggi: Second-Order Optimization with Lazy Hessians. ICML 2023: 8138-8161

Agree to Disagree: Diversity through Disagreement for Better Transferability.

December 31, 2022, 3:00 pm

Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy: Agree to Disagree: Diversity through Disagreement for Better Transferability. ICLR 2023

Linearization Algorithms for Fully Composite Optimization.

December 31, 2022, 3:00 pm

Maria-Luiza Vladarean, Nikita Doikov, Martin Jaggi, Nicolas Flammarion: Linearization Algorithms for Fully Composite Optimization. COLT 2023: 3669-3695

SIMSUM: Document-level Text Simplification via Simultaneous Summarization.

December 31, 2022, 3:00 pm

Sofia Blinova, Xinyu Zhou, Martin Jaggi, Carsten Eickhoff, Seyed Ali Bahrainian: SIMSUM: Document-level Text Simplification via Simultaneous Summarization. ACL (1) 2023: 9927-9944

DeepBreath - automated detection of respiratory pathology from lung...

December 31, 2022, 3:00 pm

Julien Heitmann, Alban Glangetas, Jonathan Doenz, Juliane Dervaux, Deeksha M. Shama, Daniel Hinjos Garcia, Mohamed Rida Benissa, Aymeric Cantais, Alexandre Perez, Daniel Müller, Tatjana Chavdarova,...

Attention with Markov: A Framework for Principled Analysis of Transformers...

December 31, 2023, 3:00 pm

Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael Gastpar: Attention with Markov: A Framework for Principled Analysis of Transformers via Markov...

InterpretCC: Conditional Computation for Inherently Interpretable Neural...

December 31, 2023, 3:00 pm

Vinitra Swamy, Julian Blackwell, Jibril Frej, Martin Jaggi, Tanja Käser: InterpretCC: Conditional Computation for Inherently Interpretable Neural Networks. CoRR abs/2402.02933 (2024)

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted...

December 31, 2023, 3:00 pm

Matteo Pagliardini, Amirkeivan Mohtashami, François Fleuret, Martin Jaggi: DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging. CoRR abs/2402.02622 (2024)

MultiMoDN - Multimodal, Multi-Task, Interpretable Modular Networks.

December 31, 2022, 3:00 pm

Vinitra Swamy, Malika Satayeva, Jibril Frej, Thierry Bossy, Thijs Vogels, Martin Jaggi, Tanja Käser, Mary-Anne Hartley: MultiMoDN - Multimodal, Multi-Task, Interpretable Modular Networks. NeurIPS 2023

Random-Access Infinite Context Length for Transformers.

December 31, 2022, 3:00 pm

Amirkeivan Mohtashami, Martin Jaggi: Random-Access Infinite Context Length for Transformers. NeurIPS 2023

Multiplication-Free Transformer Training via Piecewise Affine Operations.

December 31, 2022, 3:00 pm

Atli Kosson, Martin Jaggi: Multiplication-Free Transformer Training via Piecewise Affine Operations. NeurIPS 2023

Collaborative Learning via Prediction Consensus.

December 31, 2022, 3:00 pm

Dongyang Fan, Celestine Mendler-Dünner, Martin Jaggi: Collaborative Learning via Prediction Consensus. NeurIPS 2023

Fast Attention Over Long Sequences With Dynamic Sparse Flash Attention.

December 31, 2022, 3:00 pm

Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret: Fast Attention Over Long Sequences With Dynamic Sparse Flash Attention. NeurIPS 2023
