Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vectors, now integrated within the oobabooga text generation webui!
llm_steer, the underlying codebase utilized for this extension, was created by https://github.com/Mihaiii
Note: This extension only works for models loaded using the "transformers" backend.
- pip3 install llm_steer
  (Make sure pip3 corresponds to the pip used by oobabooga; for me it's the pip3 located at /home/(user)/text-generation-webui/installer_files/env/bin/pip3. Otherwise oobabooga won't pick up the installed package.)
- Run oobabooga and navigate to the Session page. Copy and paste the GitHub URL (https://github.com/Hellisotherpeople/llm_steer-oobabooga) into the install box and press Enter.
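If you're unsure whether the package landed in the right environment, one option is to run a short check with the Python interpreter that lives next to that pip3. This is only a sketch: `is_installed` is a hypothetical helper name, not part of llm_steer or the webui.

```python
import importlib.util
import sys


def is_installed(package_name: str) -> bool:
    """Return True if this interpreter can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # sys.executable is the interpreter running this script; it should sit in
    # the same env directory as the pip3 used for the install.
    print("interpreter:", sys.executable)
    print("llm_steer importable:", is_installed("llm_steer"))
```

If the second line prints False, the package was most likely installed into a different Python environment than the one oobabooga runs in.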
There are three values:
- Layer Index (int): Which layer should the steering vector be inserted into?
  This is not well understood, but in general, the earlier layers are supposedly more "general" and potentially more "impactful". Results will vary.
  Mistral models usually have at least 24 layers.
- Coefficient (float): The intensity of the vector. Gives fully granular control over the impact of the vector. Can be negative.
- Steering Text (string): The prompt used for creating the vector.
Set these values and click "Add Steering Vector". Any combination of steering vectors can be used at the same time.
To reset and delete all Steering Vectors, click "Reset Steering Vectors".
To view the currently applied Steering Vectors, click "Get Steering Vectors".
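To make the three values and the three buttons more concrete, here is a toy sketch of activation addition in plain Python. It is not the llm_steer implementation: `ToySteer`, `toy_embed`, `add_vector`, `get_vectors` and `reset_vectors` are made-up names, the "layers" are identity functions, and the steering text is turned into a vector by a stand-in function rather than by running a real model.

```python
from dataclasses import dataclass
from typing import List

HIDDEN_SIZE = 4  # toy hidden dimension (assumption for illustration)
NUM_LAYERS = 3   # toy layer count (assumption for illustration)


def toy_embed(text: str) -> List[float]:
    """Stand-in for deriving a vector from the steering text."""
    vec = [0.0] * HIDDEN_SIZE
    for i, ch in enumerate(text):
        vec[i % HIDDEN_SIZE] += ord(ch) / 1000.0
    return vec


@dataclass
class SteeringVector:
    layer_idx: int  # which layer's output the vector is added to
    coeff: float    # intensity; may be negative
    text: str       # steering text the vector is derived from


class ToySteer:
    def __init__(self) -> None:
        self.vectors: List[SteeringVector] = []

    def add_vector(self, layer_idx: int, coeff: float, text: str) -> None:
        if not 0 <= layer_idx < NUM_LAYERS:
            raise ValueError(f"layer_idx must be in [0, {NUM_LAYERS})")
        self.vectors.append(SteeringVector(layer_idx, coeff, text))

    def get_vectors(self) -> List[SteeringVector]:
        return list(self.vectors)

    def reset_vectors(self) -> None:
        self.vectors.clear()

    def forward(self, hidden: List[float]) -> List[float]:
        for layer in range(NUM_LAYERS):
            # Toy layer: identity. A real layer would transform `hidden` here.
            for sv in self.vectors:
                if sv.layer_idx == layer:
                    direction = toy_embed(sv.text)
                    # Activation addition: hidden + coeff * steering vector.
                    hidden = [h + sv.coeff * d for h, d in zip(hidden, direction)]
        return hidden


if __name__ == "__main__":
    steer = ToySteer()
    steer.add_vector(layer_idx=1, coeff=2.0, text="ab")
    print(steer.get_vectors())
    print(steer.forward([0.0] * HIDDEN_SIZE))
    steer.reset_vectors()
    print(steer.forward([0.0] * HIDDEN_SIZE))
```

Several vectors can be registered at once, on the same or different layers, which mirrors how steering vectors can be combined in the UI.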
Several reasons!
- You don't consume any tokens this way, leaving the remaining system prompt tokens to have a stronger impact
- You can dial the particular intensity/attention of a token up or down, and apply it to any layer or combination of layers that you'd like
- Supports negative coefficient values, which effectively give faster "negative prompting" behavior than the existing classifier-free guidance built into oobabooga.
- Makes it pretty easy to implement personalization, or alignment/unalignment.
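As a rough illustration of the negative-coefficient point above, the toy snippet below adds the same direction with opposite signs and measures how far the hidden state ends up along that direction. The vectors and the helper names `steer` and `dot` are invented for the example. A real steering vector comes from the model's own activations.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Dot product: how strongly `a` points along `b`."""
    return sum(x * y for x, y in zip(a, b))


def steer(hidden: List[float], direction: List[float], coeff: float) -> List[float]:
    """Add coeff * direction to the hidden state."""
    return [h + coeff * d for h, d in zip(hidden, direction)]


hidden = [1.0, 0.0, 0.5]     # made-up hidden state
direction = [0.0, 1.0, 0.0]  # made-up "topic" direction

toward = steer(hidden, direction, 2.0)   # positive coefficient: push toward
away = steer(hidden, direction, -2.0)    # negative coefficient: push away

print(dot(hidden, direction), dot(toward, direction), dot(away, direction))
```

The positive coefficient raises the projection onto the direction and the negative one lowers it by the same amount, which is the sense in which a negative coefficient acts like a negative prompt.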
Further Background on Steering Vectors:
- Activation Addition: Steering Language Models Without Optimization
- Steering GPT-2-XL by adding an activation vector
Related ideas/inspiration:
(No vector)
(Add Sad Vector)
(Add Tax Preparation Vector)