# Old Mental Model (Karpathy)
# 'B T C -> ...'
# New Mental Model (ARENA)
# 'b s d_model -> ...'
Core Syntax
output = rearrange(tensor, 'input_pattern -> output_pattern', **constants)
Move 1: The Swap (Permute)
Reorders dimensions (replaces .transpose or .permute). A typical use is moving the heads dimension next to the batch dimension so you can parallelize attention.
# Move 'c' (channels) to the front
y = rearrange(x, 'b s c -> c b s')
Move 2: The Split (Decomposition)
Split d_model (C) into n_heads and d_head.