Beyond spectral gap: The role of the topology in decentralized learning.
Thijs Vogels, Hadrien Hendrikx, Martin Jaggi: Beyond spectral gap: The role of the topology in decentralized learning. CoRR abs/2206.03093 (2022)
On Avoiding Local Minima Using Gradient Descent With Large Learning Rates.
Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich: On Avoiding Local Minima Using Gradient Descent With Large Learning Rates. CoRR abs/2205.15142 (2022)
SKILL: Structured Knowledge Infusion for Large Language Models.
Fedor Moiseev, Zhe Dong, Enrique Alfonseca, Martin Jaggi: SKILL: Structured Knowledge Infusion for Large Language Models. CoRR abs/2205.08184 (2022)
Data-heterogeneity-aware Mixing for Decentralized Learning.
Yatin Dandi, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich: Data-heterogeneity-aware Mixing for Decentralized Learning. CoRR abs/2204.06477 (2022)
Improving Generalization via Uncertainty Driven Perturbations.
Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Michael I. Jordan, Tatjana Chavdarova: Improving Generalization via Uncertainty Driven Perturbations. CoRR abs/2202.05737 (2022)
Agree to Disagree: Diversity through Disagreement for Better Transferability.
Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy: Agree to Disagree: Diversity through Disagreement for Better Transferability. CoRR abs/2202.04414 (2022)
Characterizing & Finding Good Data Orderings for Fast Convergence of...
Amirkeivan Mohtashami, Sebastian U. Stich, Martin Jaggi: Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods. CoRR abs/2202.01838 (2022)
Byzantine-Robust Decentralized Learning via Self-Centered Clipping.
Lie He, Sai Praneeth Karimireddy, Martin Jaggi: Byzantine-Robust Decentralized Learning via Self-Centered Clipping. CoRR abs/2202.01545 (2022)
Beyond spectral gap: the role of the topology in decentralized learning.
Thijs Vogels, Hadrien Hendrikx, Martin Jaggi: Beyond spectral gap: the role of the topology in decentralized learning. NeurIPS 2022
FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in...
Jean Ogier du Terrail, Samy-Safwan Ayed, Edwige Cyffers, Felix Grimberg, Chaoyang He, Regis Loeb, Paul Mangold, Tanguy Marchand, Othmane Marfoq, Erum Mushtaq, Boris Muzellec, Constantin Philippenko,...
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and...
Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi: Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning. NeurIPS 2022
SKILL: Structured Knowledge Infusion for Large Language Models.
Fedor Moiseev, Zhe Dong, Enrique Alfonseca, Martin Jaggi: SKILL: Structured Knowledge Infusion for Large Language Models. NAACL-HLT 2022: 1581-1588
Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing.
Sai Praneeth Karimireddy, Lie He, Martin Jaggi: Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing. ICLR 2022
Masked Training of Neural Networks with Partial Gradients.
Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich: Masked Training of Neural Networks with Partial Gradients. AISTATS 2022: 5876-5890
Implicit Gradient Alignment in Distributed and Federated Learning.
Yatin Dandi, Luis Barba, Martin Jaggi: Implicit Gradient Alignment in Distributed and Federated Learning. AAAI 2022: 6454-6462
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models.
Zeming Chen, Alejandro Hernández-Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza...
Controllable Topic-Focused Abstractive Summarization.
Seyed Ali Bahrainian, Martin Jaggi, Carsten Eickhoff: Controllable Topic-Focused Abstractive Summarization. CoRR abs/2311.06724 (2023)
DoGE: Domain Reweighting with Generalization Estimation.
Simin Fan, Matteo Pagliardini, Martin Jaggi: DoGE: Domain Reweighting with Generalization Estimation. CoRR abs/2310.15393 (2023)
Irreducible Curriculum for Language Model Pretraining.
Simin Fan, Martin Jaggi: Irreducible Curriculum for Language Model Pretraining. CoRR abs/2310.15389 (2023)
LASER: Linear Compression in Wireless Distributed Optimization.
Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael C. Gastpar: LASER: Linear Compression in Wireless Distributed Optimization. CoRR abs/2310.13033 (2023)
CoTFormer: More Tokens With Attention Make Up For Less Depth.
Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi: CoTFormer: More Tokens With Attention Make Up For Less Depth. CoRR abs/2310.10845 (2023)
MultiModN- Multimodal, Multi-Task, Interpretable Modular Networks.
Vinitra Swamy, Malika Satayeva, Jibril Frej, Thierry Bossy, Thijs Vogels, Martin Jaggi, Tanja Käser, Mary-Anne Hartley: MultiModN- Multimodal, Multi-Task, Interpretable Modular Networks. CoRR...
Layerwise Linear Mode Connectivity.
Linara Adilova, Asja Fischer, Martin Jaggi: Layerwise Linear Mode Connectivity. CoRR abs/2307.06966 (2023)
Provably Personalized and Robust Federated Learning.
Mariel A. Werner, Lie He, Sai Praneeth Karimireddy, Michael I. Jordan, Martin Jaggi: Provably Personalized and Robust Federated Learning. CoRR abs/2306.08393 (2023)
Faster Causal Attention Over Large Sequences Through Sparse Flash Attention.
Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret: Faster Causal Attention Over Large Sequences Through Sparse Flash Attention. CoRR abs/2306.01160 (2023)
Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with...
Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi: Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders. CoRR abs/2305.19259 (2023)
Collaborative Learning via Prediction Consensus.
Dongyang Fan, Celestine Mendler-Dünner, Martin Jaggi: Collaborative Learning via Prediction Consensus. CoRR abs/2305.18497 (2023)
Rotational Optimizers: Simple & Robust DNN Training.
Atli Kosson, Bettina Messmer, Martin Jaggi: Rotational Optimizers: Simple & Robust DNN Training. CoRR abs/2305.17212 (2023)
Ghost Noise for Regularizing Deep Neural Networks.
Atli Kosson, Dongyang Fan, Martin Jaggi: Ghost Noise for Regularizing Deep Neural Networks. CoRR abs/2305.17205 (2023)
Hardware-Efficient Transformer Training via Piecewise Affine Operations.
Atli Kosson, Martin Jaggi: Hardware-Efficient Transformer Training via Piecewise Affine Operations. CoRR abs/2305.17190 (2023)
Landmark Attention: Random-Access Infinite Context Length for Transformers.
Amirkeivan Mohtashami, Martin Jaggi: Landmark Attention: Random-Access Infinite Context Length for Transformers. CoRR abs/2305.16300 (2023)
Linearization Algorithms for Fully Composite Optimization.
Maria-Luiza Vladarean, Nikita Doikov, Martin Jaggi, Nicolas Flammarion: Linearization Algorithms for Fully Composite Optimization. CoRR abs/2302.12808 (2023)
Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton...
El Mahdi Chayti, Nikita Doikov, Martin Jaggi: Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods. CoRR abs/2302.11962 (2023)
Beyond spectral gap (extended): The role of the topology in decentralized...
Thijs Vogels, Hadrien Hendrikx, Martin Jaggi: Beyond spectral gap (extended): The role of the topology in decentralized learning. CoRR abs/2301.02151 (2023)
Special Properties of Gradient Descent with Large Learning Rates.
Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich: Special Properties of Gradient Descent with Large Learning Rates. ICML 2023: 25082-25104
Second-Order Optimization with Lazy Hessians.
Nikita Doikov, El Mahdi Chayti, Martin Jaggi: Second-Order Optimization with Lazy Hessians. ICML 2023: 8138-8161
Agree to Disagree: Diversity through Disagreement for Better Transferability.
Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy: Agree to Disagree: Diversity through Disagreement for Better Transferability. ICLR 2023
Linearization Algorithms for Fully Composite Optimization.
Maria-Luiza Vladarean, Nikita Doikov, Martin Jaggi, Nicolas Flammarion: Linearization Algorithms for Fully Composite Optimization. COLT 2023: 3669-3695
SIMSUM: Document-level Text Simplification via Simultaneous Summarization.
Sofia Blinova, Xinyu Zhou, Martin Jaggi, Carsten Eickhoff, Seyed Ali Bahrainian: SIMSUM: Document-level Text Simplification via Simultaneous Summarization. ACL (1) 2023: 9927-9944
DeepBreath - automated detection of respiratory pathology from lung...
Julien Heitmann, Alban Glangetas, Jonathan Doenz, Juliane Dervaux, Deeksha M. Shama, Daniel Hinjos Garcia, Mohamed Rida Benissa, Aymeric Cantais, Alexandre Perez, Daniel Müller, Tatjana Chavdarova,...
Attention with Markov: A Framework for Principled Analysis of Transformers...
Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael Gastpar: Attention with Markov: A Framework for Principled Analysis of Transformers via Markov...
InterpretCC: Conditional Computation for Inherently Interpretable Neural...
Vinitra Swamy, Julian Blackwell, Jibril Frej, Martin Jaggi, Tanja Käser: InterpretCC: Conditional Computation for Inherently Interpretable Neural Networks. CoRR abs/2402.02933 (2024)
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted...
Matteo Pagliardini, Amirkeivan Mohtashami, François Fleuret, Martin Jaggi: DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging. CoRR abs/2402.02622 (2024)
MultiMoDN - Multimodal, Multi-Task, Interpretable Modular Networks.
Vinitra Swamy, Malika Satayeva, Jibril Frej, Thierry Bossy, Thijs Vogels, Martin Jaggi, Tanja Käser, Mary-Anne Hartley: MultiMoDN - Multimodal, Multi-Task, Interpretable Modular Networks. NeurIPS 2023
Random-Access Infinite Context Length for Transformers.
Amirkeivan Mohtashami, Martin Jaggi: Random-Access Infinite Context Length for Transformers. NeurIPS 2023
Multiplication-Free Transformer Training via Piecewise Affine Operations.
Atli Kosson, Martin Jaggi: Multiplication-Free Transformer Training via Piecewise Affine Operations. NeurIPS 2023
Collaborative Learning via Prediction Consensus.
Dongyang Fan, Celestine Mendler-Dünner, Martin Jaggi: Collaborative Learning via Prediction Consensus. NeurIPS 2023
Fast Attention Over Long Sequences With Dynamic Sparse Flash Attention.
Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret: Fast Attention Over Long Sequences With Dynamic Sparse Flash Attention. NeurIPS 2023
Towards an empirical understanding of MoE design choices.
Dongyang Fan, Bettina Messmer, Martin Jaggi: Towards an empirical understanding of MoE design choices. CoRR abs/2402.13089 (2024)
Ghost Noise for Regularizing Deep Neural Networks.
Atli Kosson, Dongyang Fan, Martin Jaggi: Ghost Noise for Regularizing Deep Neural Networks. AAAI 2024: 13274-13282