Sparse Latents

A cartographic exploration of interpretable latents within large language models. Twelve Gemma Scope sparse autoencoder latents ("features") from the Gemma 2 2B model are investigated by clustering and rendering thousands of maximally activating prompts for each. Select a latent below to explore a map of the concepts that activate it.

2412.02412 #sparse-latents
Blood
references to blood and violent imagery
Hands
actions involving raising or moving hands and interacting with other subjects
Indebted
expressions related to legal or financial obligations
Ingredients
ingredients and dishes related to food preparation and recipes
Marine
terms related to marine and naval concepts
Mechanical
terms and concepts related to mechanical systems and engineering
Muscle
references to muscle-related subjects and terminology
Other
references to alternatives or additional options
Parameters
references to numerical values and parameters in a technical context
Psychological
concepts related to psychological and emotional processes
Seating
references to comfortable seating arrangements
Short
the word "short" and related terms or phrases indicating brevity
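
For readers curious how a map like these might be assembled, here is a minimal sketch of the "cluster the maximally activating prompts" idea mentioned in the introduction. It is an illustration under stated assumptions, not the pipeline behind this page: the page does not say which clustering algorithm or prompt embedding was used, so plain k-means is swapped in here, and the records, activation values, and 2-D embeddings are invented toy data. The helper names `top_activating` and `kmeans` are hypothetical.

```python
from math import dist


def top_activating(records, k):
    """Return the k records with the highest latent activation."""
    return sorted(records, key=lambda r: r["activation"], reverse=True)[:k]


def kmeans(points, n_clusters, n_iters=20):
    """Plain Lloyd's k-means (an assumption, not the page's stated method).

    Centroids start at the first n_clusters points, so the result is deterministic.
    """
    centroids = [tuple(p) for p in points[:n_clusters]]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(n_clusters), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Toy data: prompts, activations, and embeddings are all made up for illustration.
records = [
    {"prompt": "blood dripped from the blade", "activation": 9.1, "embedding": (0.9, 0.1)},
    {"prompt": "a blood test at the clinic", "activation": 8.4, "embedding": (0.1, 0.9)},
    {"prompt": "the sword was stained with blood", "activation": 8.0, "embedding": (0.8, 0.2)},
    {"prompt": "donate blood at the hospital", "activation": 7.5, "embedding": (0.2, 0.8)},
    {"prompt": "the weather is nice today", "activation": 0.3, "embedding": (0.5, 0.5)},
]

top = top_activating(records, k=4)
labels, centroids = kmeans([r["embedding"] for r in top], n_clusters=2)

for record, label in zip(top, labels):
    print(label, record["prompt"])
```

In this toy run the violent-imagery prompts and the medical prompts fall into separate clusters, which is the kind of sub-concept structure a map of a single latent is meant to surface. A real pipeline would replace the hand-written embeddings with vectors from an embedding model and would likely use a library clustering implementation.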