<aside> 📌
These are the resources I'd give someone starting out in AI safety and mechanistic interpretability. I only recommend what I find interesting and worth the time, and each of these has helped me to some extent.
</aside>
🔴 = Essential
🟡 = Important
🟢 = Nice to know
🔴 Essence of Linear Algebra — 3Blue1Brown
🔴 What is a Transformer? (Transformer Walkthrough Part 1/2)
🔴 BlueDot Impact | Free Courses
🔴 Chapter 0 - Fundamentals | ARENA
🔴 Chapter 1 - Transformer Interpretability | ARENA
🟡 Neural Networks: Zero to Hero
🔴 How To Become A Mechanistic Interpretability Researcher
🟡 Interpretability Will Not Reliably Find Deceptive AI
🟢 Chris Olah on what the hell is going on inside neural networks
🔴 What’s Stopping You — Neel Nanda
🟡 Some advice on independent research
🔴 Tips for Empirical Alignment Research — AI Alignment Forum