Channel: dblp: Martin Jaggi
Browsing latest articles
Browse All 230 View Live

Beyond spectral gap: The role of the topology in decentralized learning.

Thijs Vogels, Hadrien Hendrikx, Martin Jaggi: Beyond spectral gap: The role of the topology in decentralized learning. CoRR abs/2206.03093 (2022)

On Avoiding Local Minima Using Gradient Descent With Large Learning Rates.

Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich: On Avoiding Local Minima Using Gradient Descent With Large Learning Rates. CoRR abs/2205.15142 (2022)

SKILL: Structured Knowledge Infusion for Large Language Models.

Fedor Moiseev, Zhe Dong, Enrique Alfonseca, Martin Jaggi: SKILL: Structured Knowledge Infusion for Large Language Models. CoRR abs/2205.08184 (2022)

Data-heterogeneity-aware Mixing for Decentralized Learning.

Yatin Dandi, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich: Data-heterogeneity-aware Mixing for Decentralized Learning. CoRR abs/2204.06477 (2022)

Improving Generalization via Uncertainty Driven Perturbations.

Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Michael I. Jordan, Tatjana Chavdarova: Improving Generalization via Uncertainty Driven Perturbations. CoRR abs/2202.05737 (2022)

Agree to Disagree: Diversity through Disagreement for Better Transferability.

Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy: Agree to Disagree: Diversity through Disagreement for Better Transferability. CoRR abs/2202.04414 (2022)

Characterizing & Finding Good Data Orderings for Fast Convergence of...

Amirkeivan Mohtashami, Sebastian U. Stich, Martin Jaggi: Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods. CoRR abs/2202.01838 (2022)

Byzantine-Robust Decentralized Learning via Self-Centered Clipping.

Lie He, Sai Praneeth Karimireddy, Martin Jaggi: Byzantine-Robust Decentralized Learning via Self-Centered Clipping. CoRR abs/2202.01545 (2022)

Beyond spectral gap: the role of the topology in decentralized learning.

Thijs Vogels, Hadrien Hendrikx, Martin Jaggi: Beyond spectral gap: the role of the topology in decentralized learning. NeurIPS 2022

FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in...

Jean Ogier du Terrail, Samy-Safwan Ayed, Edwige Cyffers, Felix Grimberg, Chaoyang He, Regis Loeb, Paul Mangold, Tanguy Marchand, Othmane Marfoq, Erum Mushtaq, Boris Muzellec, Constantin Philippenko,...

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and...

Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi: Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning. NeurIPS 2022

SKILL: Structured Knowledge Infusion for Large Language Models.

Fedor Moiseev, Zhe Dong, Enrique Alfonseca, Martin Jaggi: SKILL: Structured Knowledge Infusion for Large Language Models. NAACL-HLT 2022: 1581-1588

Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing.

Sai Praneeth Karimireddy, Lie He, Martin Jaggi: Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing. ICLR 2022

Masked Training of Neural Networks with Partial Gradients.

Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich: Masked Training of Neural Networks with Partial Gradients. AISTATS 2022: 5876-5890

Implicit Gradient Alignment in Distributed and Federated Learning.

Yatin Dandi, Luis Barba, Martin Jaggi: Implicit Gradient Alignment in Distributed and Federated Learning. AAAI 2022: 6454-6462

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models.

Zeming Chen, Alejandro Hernández-Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza...

Controllable Topic-Focused Abstractive Summarization.

Seyed Ali Bahrainian, Martin Jaggi, Carsten Eickhoff: Controllable Topic-Focused Abstractive Summarization. CoRR abs/2311.06724 (2023)

DoGE: Domain Reweighting with Generalization Estimation.

Simin Fan, Matteo Pagliardini, Martin Jaggi: DoGE: Domain Reweighting with Generalization Estimation. CoRR abs/2310.15393 (2023)

Irreducible Curriculum for Language Model Pretraining.

Simin Fan, Martin Jaggi: Irreducible Curriculum for Language Model Pretraining. CoRR abs/2310.15389 (2023)

LASER: Linear Compression in Wireless Distributed Optimization.

Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael C. Gastpar: LASER: Linear Compression in Wireless Distributed Optimization. CoRR abs/2310.13033 (2023)

CoTFormer: More Tokens With Attention Make Up For Less Depth.

Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi: CoTFormer: More Tokens With Attention Make Up For Less Depth. CoRR abs/2310.10845 (2023)

MultiModN- Multimodal, Multi-Task, Interpretable Modular Networks.

Vinitra Swamy, Malika Satayeva, Jibril Frej, Thierry Bossy, Thijs Vogels, Martin Jaggi, Tanja Käser, Mary-Anne Hartley: MultiModN- Multimodal, Multi-Task, Interpretable Modular Networks. CoRR...

Layerwise Linear Mode Connectivity.

Linara Adilova, Asja Fischer, Martin Jaggi: Layerwise Linear Mode Connectivity. CoRR abs/2307.06966 (2023)

Provably Personalized and Robust Federated Learning.

Mariel A. Werner, Lie He, Sai Praneeth Karimireddy, Michael I. Jordan, Martin Jaggi: Provably Personalized and Robust Federated Learning. CoRR abs/2306.08393 (2023)

Faster Causal Attention Over Large Sequences Through Sparse Flash Attention.

Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret: Faster Causal Attention Over Large Sequences Through Sparse Flash Attention. CoRR abs/2306.01160 (2023)

Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with...

Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi: Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders. CoRR abs/2305.19259 (2023)

Collaborative Learning via Prediction Consensus.

Dongyang Fan, Celestine Mendler-Dünner, Martin Jaggi: Collaborative Learning via Prediction Consensus. CoRR abs/2305.18497 (2023)

Rotational Optimizers: Simple & Robust DNN Training.

Atli Kosson, Bettina Messmer, Martin Jaggi: Rotational Optimizers: Simple & Robust DNN Training. CoRR abs/2305.17212 (2023)

Ghost Noise for Regularizing Deep Neural Networks.

Atli Kosson, Dongyang Fan, Martin Jaggi: Ghost Noise for Regularizing Deep Neural Networks. CoRR abs/2305.17205 (2023)

Hardware-Efficient Transformer Training via Piecewise Affine Operations.

Atli Kosson, Martin Jaggi: Hardware-Efficient Transformer Training via Piecewise Affine Operations. CoRR abs/2305.17190 (2023)

Landmark Attention: Random-Access Infinite Context Length for Transformers.

Amirkeivan Mohtashami, Martin Jaggi: Landmark Attention: Random-Access Infinite Context Length for Transformers. CoRR abs/2305.16300 (2023)

Linearization Algorithms for Fully Composite Optimization.

Maria-Luiza Vladarean, Nikita Doikov, Martin Jaggi, Nicolas Flammarion: Linearization Algorithms for Fully Composite Optimization. CoRR abs/2302.12808 (2023)

Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton...

El Mahdi Chayti, Nikita Doikov, Martin Jaggi: Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods. CoRR abs/2302.11962 (2023)

Beyond spectral gap (extended): The role of the topology in decentralized...

Thijs Vogels, Hadrien Hendrikx, Martin Jaggi: Beyond spectral gap (extended): The role of the topology in decentralized learning. CoRR abs/2301.02151 (2023)

Special Properties of Gradient Descent with Large Learning Rates.

Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich: Special Properties of Gradient Descent with Large Learning Rates. ICML 2023: 25082-25104

Second-Order Optimization with Lazy Hessians.

Nikita Doikov, El Mahdi Chayti, Martin Jaggi: Second-Order Optimization with Lazy Hessians. ICML 2023: 8138-8161

Agree to Disagree: Diversity through Disagreement for Better Transferability.

Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy: Agree to Disagree: Diversity through Disagreement for Better Transferability. ICLR 2023

Linearization Algorithms for Fully Composite Optimization.

Maria-Luiza Vladarean, Nikita Doikov, Martin Jaggi, Nicolas Flammarion: Linearization Algorithms for Fully Composite Optimization. COLT 2023: 3669-3695

SIMSUM: Document-level Text Simplification via Simultaneous Summarization.

Sofia Blinova, Xinyu Zhou, Martin Jaggi, Carsten Eickhoff, Seyed Ali Bahrainian: SIMSUM: Document-level Text Simplification via Simultaneous Summarization. ACL (1) 2023: 9927-9944

DeepBreath - automated detection of respiratory pathology from lung...

Julien Heitmann, Alban Glangetas, Jonathan Doenz, Juliane Dervaux, Deeksha M. Shama, Daniel Hinjos Garcia, Mohamed Rida Benissa, Aymeric Cantais, Alexandre Perez, Daniel Müller, Tatjana Chavdarova,...

Attention with Markov: A Framework for Principled Analysis of Transformers...

Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael Gastpar: Attention with Markov: A Framework for Principled Analysis of Transformers via Markov...

InterpretCC: Conditional Computation for Inherently Interpretable Neural...

Vinitra Swamy, Julian Blackwell, Jibril Frej, Martin Jaggi, Tanja Käser: InterpretCC: Conditional Computation for Inherently Interpretable Neural Networks. CoRR abs/2402.02933 (2024)

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted...

Matteo Pagliardini, Amirkeivan Mohtashami, François Fleuret, Martin Jaggi: DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging. CoRR abs/2402.02622 (2024)

MultiMoDN - Multimodal, Multi-Task, Interpretable Modular Networks.

Vinitra Swamy, Malika Satayeva, Jibril Frej, Thierry Bossy, Thijs Vogels, Martin Jaggi, Tanja Käser, Mary-Anne Hartley: MultiMoDN - Multimodal, Multi-Task, Interpretable Modular Networks. NeurIPS 2023

Random-Access Infinite Context Length for Transformers.

Amirkeivan Mohtashami, Martin Jaggi: Random-Access Infinite Context Length for Transformers. NeurIPS 2023

Multiplication-Free Transformer Training via Piecewise Affine Operations.

Atli Kosson, Martin Jaggi: Multiplication-Free Transformer Training via Piecewise Affine Operations. NeurIPS 2023

Collaborative Learning via Prediction Consensus.

Dongyang Fan, Celestine Mendler-Dünner, Martin Jaggi: Collaborative Learning via Prediction Consensus. NeurIPS 2023

Fast Attention Over Long Sequences With Dynamic Sparse Flash Attention.

Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret: Fast Attention Over Long Sequences With Dynamic Sparse Flash Attention. NeurIPS 2023

Towards an empirical understanding of MoE design choices.

Dongyang Fan, Bettina Messmer, Martin Jaggi: Towards an empirical understanding of MoE design choices. CoRR abs/2402.13089 (2024)

Ghost Noise for Regularizing Deep Neural Networks.

Atli Kosson, Dongyang Fan, Martin Jaggi: Ghost Noise for Regularizing Deep Neural Networks. AAAI 2024: 13274-13282
