The Heretic CLI provides a fully automatic way to remove censorship (“safety alignment”) from language models. The tool requires minimal configuration and handles the entire decensoring process from start to finish.
Basic Command Structure
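The simplest way to use Heretic is to provide a model identifier as a positional argument:
heretic MODEL_NAME
Alternatively, pass the model with the --model flag explicitly:
heretic --model MODEL_NAME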
Common Workflows
Standard Decensoring Workflow
- Run Heretic on your target model
- Wait for optimization - Heretic will automatically run trials to find optimal parameters
- Select a trial - Choose from Pareto-optimal results based on refusals vs KL divergence
- Export the model - Save locally, upload to HuggingFace, or test with interactive chat
Evaluation Workflow
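Heretic can also evaluate an already-decensored model against its base model. See the CLI Options reference for the option that enables this.
Resume Workflow
Heretic automatically checkpoints progress. If a run is interrupted, simply re-run the same command. Heretic then offers the following choices:
- Continue the previous run
- Show results from a completed run
- Restart from scratch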
Configuration Methods
Heretic supports three configuration methods (in order of precedence):
- Command-line flags: heretic --quantization bnb_4bit --n-trials 100 MODEL_NAME
- Environment variables: HERETIC_QUANTIZATION=bnb_4bit heretic MODEL_NAME
- Configuration file: Create config.toml in the working directory
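The precedence order above can be illustrated with a minimal Python sketch. This is not Heretic's actual implementation: the function resolve_setting, the dictionaries standing in for parsed flags and config.toml contents, and the file values shown are all hypothetical, and the sketch assumes the first method listed (command-line flags) has the highest precedence.

```python
def resolve_setting(name, cli_args, environ, file_config, default=None):
    """Return the value for `name` from the highest-precedence source that defines it."""
    # 1. Command-line flags (assumed to have the highest precedence).
    if cli_args.get(name) is not None:
        return cli_args[name]
    # 2. Environment variables, using the HERETIC_ prefix from the example above.
    env_key = "HERETIC_" + name.upper()
    if env_key in environ:
        return environ[env_key]
    # 3. Values loaded from config.toml (represented here as a plain dict).
    if name in file_config:
        return file_config[name]
    return default


# Hypothetical inputs: the file values are placeholders, not Heretic defaults.
file_config = {"quantization": "none", "n_trials": 200}
environ = {"HERETIC_QUANTIZATION": "bnb_4bit"}
cli_args = {"n_trials": 100}

quantization = resolve_setting("quantization", cli_args, environ, file_config)
n_trials = resolve_setting("n_trials", cli_args, environ, file_config)
print(quantization, n_trials)
```

In this sketch the environment variable overrides the file's quantization value, and the command-line flag overrides the file's trial count, which is the behavior one would expect if each source only fills in settings that higher-precedence sources leave unset.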
The Optimization Process
Heretic uses a multi-stage process:
- Hardware Detection - Identifies GPUs and available VRAM
- Model Loading - Loads the base model with optimal dtype
- Batch Size Optimization - Benchmarks to find optimal throughput
- Refusal Direction Calculation - Analyzes model internals
- Parameter Optimization - Runs trials to minimize refusals and KL divergence
- Model Export - Saves or uploads the best result
The entire process is fully automatic. On an RTX 3090, decensoring Llama-3.1-8B-Instruct takes approximately 45 minutes with default settings.
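To make the KL divergence objective from the parameter optimization stage more concrete, here is a minimal sketch of the metric itself. It assumes the comparison is between two discrete probability distributions (for example, next-token probabilities from the original and the modified model); how Heretic actually gathers those distributions is not shown here, and the function name and sample numbers are illustrative only.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions over the same outcomes."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        # Terms with p_i == 0 contribute nothing; eps guards against log of zero.
        if p_i > 0.0:
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


# Made-up distributions: the original model is treated as the reference P.
original = [0.70, 0.20, 0.10]
slightly_changed = [0.68, 0.21, 0.11]
heavily_changed = [0.10, 0.20, 0.70]

small = kl_divergence(original, slightly_changed)
large = kl_divergence(original, heavily_changed)
print(small < large)
```

A value of zero means the two distributions are identical, so a low KL divergence indicates that the modified model still behaves much like the original.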
Output and Post-Processing
After optimization completes, Heretic presents Pareto-optimal trials, each reported with two metrics:
- Refusals: Number of refused prompts out of 100 test cases
- KL Divergence: How much the model’s behavior changed (lower is better)
For the selected trial, the following actions are available:
- Save to local folder - Export merged model or LoRA adapter
- Upload to HuggingFace - Push directly to your HF account
- Chat with model - Interactive testing to evaluate quality
- Return to menu - Try a different trial
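The idea of Pareto-optimal trials can be sketched in a few lines of Python. A trial is Pareto-optimal when no other trial is at least as good on both objectives (refusals and KL divergence) and strictly better on one. The Trial type, the helper functions, and the trial values below are hypothetical and invented for illustration; they are not Heretic output.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    index: int
    refusals: int          # lower is better
    kl_divergence: float   # lower is better


def dominates(a, b):
    """True if `a` is no worse than `b` on both objectives and better on at least one."""
    no_worse = a.refusals <= b.refusals and a.kl_divergence <= b.kl_divergence
    better = a.refusals < b.refusals or a.kl_divergence < b.kl_divergence
    return no_worse and better


def pareto_front(trials):
    """Keep only trials that no other trial dominates, sorted by refusal count."""
    front = [t for t in trials if not any(dominates(o, t) for o in trials)]
    return sorted(front, key=lambda t: t.refusals)


# Invented example data, not real results.
trials = [
    Trial(1, 3, 0.45),
    Trial(2, 10, 0.12),
    Trial(3, 12, 0.30),   # dominated by trial 2
    Trial(4, 40, 0.02),
    Trial(5, 5, 0.50),    # dominated by trial 1
]

front = pareto_front(trials)
for t in front:
    print(t.index, t.refusals, t.kl_divergence)
```

Each remaining trial represents a different trade-off: fewer refusals at the cost of a larger behavioral change, or the reverse. Choosing among them is the judgment call left to the user.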
Next Steps
- Basic Usage: Learn common usage patterns and examples
- CLI Options: Complete reference of all command-line options
