<aside> 📌

These are the resources I'd give someone starting out in AI safety and mechanistic interpretability. I only recommend what I think is interesting and worth spending time on, and every item here has helped me to some extent.

</aside>

🔴 = Essential

🟡 = Important

🟢 = Nice to know

Foundations

🔴 Essence of Linear Algebra — 3Blue1Brown

🔴 What is a Transformer? (Transformer Walkthrough Part 1/2)

🔴 BlueDot Impact | Free Courses

🔴 Chapter 0 - Fundamentals | ARENA

🔴 Chapter 1 - Transformer Interpretability | ARENA

🟡 Neural Networks: Zero to Hero

Mechanistic Interpretability

🔴 How To Become A Mechanistic Interpretability Researcher

🟡 Interpretability Will Not Reliably Find Deceptive AI

🟢 Chris Olah on what the hell is going on inside neural networks

Mindset

🔴 What’s Stopping You — Neel Nanda

🟡 Some advice on independent research

Research Taste/Ideas

🔴 Tips for Empirical Alignment Research — AI Alignment Forum

🔴 Neel Nanda’s Research Process