I am an incoming PhD student at DLAB at EPFL, where I will be supervised by Prof. Robert West and co-advised by Prof. Ryan Cotterell (ETH Zurich).
I am passionate about understanding and improving artificial intelligence systems. My work focuses on making models more transparent and trustworthy through interpretability research, with the goal of enhancing robustness, reducing bias, and making these systems more reliable.
I completed my master's degree in computer science at ETH Zurich in 2024, following earlier studies in computer science and neuroinformatics at the University of Zurich.
My recent research explores how pretraining distributions shape language model internals and influence fine-tuning performance. I wrote my master's thesis at EPFL under Robert West and Chris Wendler, investigating the mechanistic effects of fine-tuning language models.
Currently, I am a research scholar in the MATS 7.0 program, working with Clement Dumas under the mentorship of Neel Nanda to study the differences between base and instruct models.
Please feel free to reach out anytime!
Robustly identifying concepts introduced during chat fine-tuning using crosscoders
Julian Minder*, Clement Dumas*, Caden Juang, Bilal Chughtai, Neel Nanda
ICLR 2025 Workshop on Sparsity in LLMs (SLLM)
Julian Minder*, Kevin Du*, Niklas Stoehr, Giovanni Monea, Chris Wendler, Robert West, Ryan Cotterell
The Thirteenth International Conference on Learning Representations (ICLR 2025)
Julian Minder, Florian Grötschla, Joël Mathys, Roger Wattenhofer
(Extended Abstract) Second Learning on Graphs Conference (LoG 2023)
Thomas Gschwind, Christoph Miksovic, Julian Minder, Katsiaryna Mirylenka, Paolo Scotton
2019 IEEE International Conference on Big Data (Big Data), 623-630