CLI Overview - Heretic

The Heretic CLI provides a fully automatic way to remove censorship (“safety alignment”) from language models. The tool requires minimal configuration and handles the entire decensoring process from start to finish.

Basic Command Structure

The simplest way to use Heretic is to provide a model identifier:

heretic Qwen/Qwen3-4B-Instruct-2507

You can also use the --model flag explicitly:

heretic --model Qwen/Qwen3-4B-Instruct-2507

Both HuggingFace model IDs and local paths are supported:

heretic /path/to/local/model

Common Workflows

Standard Decensoring Workflow

1. Run Heretic on your target model.
2. Wait for optimization - Heretic automatically runs trials to find optimal parameters.
3. Select a trial - Choose from the Pareto-optimal results, which trade off refusals against KL divergence.
4. Export the model - Save locally, upload to HuggingFace, or test with interactive chat.
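To make "Pareto-optimal" concrete: a trial is kept when no other trial is at least as good on both metrics and strictly better on one. The following is a minimal sketch of that filter, not Heretic's own code. The `Trial` record, the function names, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    """Hypothetical record of one optimization trial (not Heretic's own type)."""
    index: int
    refusals: int          # refused prompts, lower is better
    kl_divergence: float   # drift from the base model, lower is better


def dominates(a: Trial, b: Trial) -> bool:
    """True if `a` is no worse than `b` on both metrics and strictly better on one."""
    no_worse = a.refusals <= b.refusals and a.kl_divergence <= b.kl_divergence
    strictly_better = a.refusals < b.refusals or a.kl_divergence < b.kl_divergence
    return no_worse and strictly_better


def pareto_front(trials: list[Trial]) -> list[Trial]:
    """Keep only trials that no other trial dominates, sorted by refusals."""
    front = [t for t in trials if not any(dominates(o, t) for o in trials)]
    return sorted(front, key=lambda t: (t.refusals, t.kl_divergence))


# Made-up numbers for illustration only.
trials = [
    Trial(0, refusals=40, kl_divergence=0.05),
    Trial(1, refusals=10, kl_divergence=0.30),
    Trial(2, refusals=12, kl_divergence=0.45),  # dominated by trial 1
    Trial(3, refusals=3, kl_divergence=1.20),
]
front = pareto_front(trials)
for t in front:
    print(f"trial {t.index}: refusals={t.refusals}, KL={t.kl_divergence:.2f}")
```

In this example trial 2 is dropped because trial 1 has both fewer refusals and a lower KL divergence. The three remaining trials are genuine trade-offs, which is why the final choice is left to you.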

Evaluation Workflow

To evaluate an already-decensored model against its base:

heretic --model google/gemma-3-12b-it --evaluate-model p-e-w/gemma-3-12b-it-heretic

This compares the decensored model to the base model with the same evaluation metrics that are applied during optimization.

Resume Workflow

Heretic automatically checkpoints progress. If a run is interrupted, re-run the same command:

heretic Qwen/Qwen3-4B-Instruct-2507

You’ll be prompted to:

Continue the previous run
Show results from a completed run
Restart from scratch

Configuration Methods

Heretic supports three configuration methods, listed from highest to lowest precedence:

1. Command-line flags: heretic --quantization bnb_4bit --n-trials 100 MODEL_NAME
2. Environment variables: HERETIC_QUANTIZATION=bnb_4bit heretic MODEL_NAME
3. Configuration file: Create config.toml in the working directory

For one-off runs, use command-line flags. For repeated experiments with the same settings, use a configuration file.
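The precedence order can be pictured as a layered merge in which a higher-precedence source overwrites a lower one. The sketch below is an illustration under assumptions, not Heretic's implementation: `resolve_settings` is a hypothetical helper, and the setting names `quantization` and `n_trials` are inferred from the flags and the `HERETIC_` environment variable shown above. The values are placeholders.

```python
def resolve_settings(cli_flags, environ, file_config, prefix="HERETIC_"):
    """Merge three sources; apply the lowest precedence first so later ones win."""
    # Lowest precedence: values from the configuration file.
    settings = dict(file_config)

    # Middle precedence: prefixed environment variables, e.g.
    # HERETIC_QUANTIZATION -> "quantization" (naming is an assumption).
    for name, value in environ.items():
        if name.startswith(prefix):
            settings[name[len(prefix):].lower()] = value

    # Highest precedence: flags the user actually passed (None means "not given").
    settings.update({k: v for k, v in cli_flags.items() if v is not None})
    return settings


# Placeholder values for illustration only.
file_config = {"quantization": "file_value", "n_trials": 200}
environ = {"HERETIC_QUANTIZATION": "bnb_4bit", "PATH": "/usr/bin"}
cli_flags = {"n_trials": 100, "quantization": None}

resolved = resolve_settings(cli_flags, environ, file_config)
print(resolved)
```

Here the environment variable overrides the file's `quantization`, and the command-line flag overrides the file's `n_trials`. Unrelated variables such as `PATH` are ignored.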

The Optimization Process

Heretic uses a multi-stage process:

1. Hardware Detection - Identifies GPUs and available VRAM
2. Model Loading - Loads the base model with optimal dtype
3. Batch Size Optimization - Benchmarks batch sizes to find the one with the highest throughput
4. Refusal Direction Calculation - Analyzes model internals
5. Parameter Optimization - Runs trials to minimize refusals and KL divergence
6. Model Export - Saves or uploads the best result

The entire process is fully automatic. On an RTX 3090, decensoring Llama-3.1-8B-Instruct takes approximately 45 minutes with default settings.

Output and Post-Processing

After optimization completes, Heretic presents Pareto-optimal trials:

Refusals: Number of refused prompts out of 100 test cases
KL Divergence: How much the model’s behavior changed (lower is better)

KL divergence values above 1.0 typically indicate significant damage to the model’s original capabilities.
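For intuition about the scale of these values, the sketch below computes the textbook KL divergence between two small discrete distributions. It is a generic illustration, not Heretic's measurement code. This page does not state which prompts or tokens Heretic compares or how it averages the result, so the distributions here are made up.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions over the same outcomes.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Made-up next-token probabilities over three candidate tokens.
base = [0.7, 0.2, 0.1]      # original model
close = [0.6, 0.3, 0.1]     # mildly changed model
shifted = [0.1, 0.2, 0.7]   # heavily changed model

print(f"identical: {kl_divergence(base, base):.3f}")
print(f"close:     {kl_divergence(base, close):.3f}")
print(f"shifted:   {kl_divergence(base, shifted):.3f}")
```

Identical distributions give exactly zero, a mild change gives a small value, and a distribution that moves most of its probability to a different token exceeds 1.0.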

For each selected trial, you can:

Save to local folder - Export merged model or LoRA adapter
Upload to HuggingFace - Push directly to your HF account
Chat with model - Interactive testing to evaluate quality
Return to menu - Try a different trial

Next Steps

Basic Usage

Learn common usage patterns and examples

CLI Options

Complete reference of all command-line options

