Iván Arcuschin Moreno

Lead Research Scientist

I’m Iván, Lead Research Scientist at Poseidon Research working on AI Safety.

I earned my Computer Science PhD from the University of Buenos Aires, Argentina. Since 2024, I’ve focused on AI Safety research, completing two MATS terms (under Adrià Garriga-Alonso at FAR AI and Arthur Conmy at Google DeepMind) and publishing at NeurIPS, ICML, and ICLR workshops.

My research focuses on chain-of-thought interpretability of frontier LLMs, at the intersection of black-box methods (what models say) and white-box methods (what they compute). On the black-box side, I’ve shown that frontier LLMs produce rare, subtle instances of unfaithful chain-of-thought even without nudging prompts or response editing (ICML 2026, ~180 citations), and I’ve built automated pipelines that uncover the unverbalised biases LLMs omit from their stated reasoning (ICML 2026). On the white-box side, I’ve shown that thinking models like DeepSeek R1 mostly repurpose reasoning mechanisms already present in their base counterparts (ICML 2026 Spotlight), and I’ve collaborated on benchmarks for evaluating mechanistic interpretability techniques, published at NeurIPS 2024 and ICML 2025.

Publications

Biases in the Blind Spot: Detecting What LLMs Fail to Mention

Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible, but may hide internal biases. …

Iván Arcuschin*, David Chanin*, Adrià Garriga-Alonso, Oana-Maria Camburu

43rd International Conference on Machine Learning (ICML 2026)

arXiv

Base Models Know How to Reason, Thinking Models Learn When ⭐

Why do thinking language models like DeepSeek R1 outperform their base counterparts? Despite consistent performance gains, it remains …

Constantin Venhoff*, Iván Arcuschin*, Philip Torr, Arthur Conmy, Neel Nanda

43rd International Conference on Machine Learning (ICML 2026 Spotlight)

arXiv

Chain-of-Thought Is Not Explainability

Chains-of-thought (CoT) allow language models to verbalise multi-step rationales before producing their final answer. While this …

Fazl Barez, Tung-Yu Wu, Iván Arcuschin, Michael Lan, Vincent Wang, Noah Siegel, Nicolas Collignon, Clement Neo, Isabelle Lee, Alasdair Paren, Adel Bibi, Robert Trager, Damiano Fornasiere, John Yan, Yanai Elazar, Yoshua Bengio

Oxford AI Governance Initiative (AIGI)

Preprint

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

Chain-of-Thought (CoT) reasoning has significantly advanced state-of-the-art AI capabilities. However, recent studies have shown that …

Iván Arcuschin, Jett Janiak, Robert Krzyzanowski, Senthooran Rajamanoharan, Neel Nanda, Arthur Conmy

43rd International Conference on Machine Learning (ICML 2026)

arXiv
