heretic

GitHub Repo Pretty sure · the productive kind of heresy
https://github.com/p-e-w/heretic

Abliteration framework that actually works unsupervised, ships real code, and produces measurably better decensored models than manual methods. The thing everyone said couldn't be automated.

Slop 5% · Signal 30% · Science 65%

Heretic combines directional ablation with TPE (Tree-structured Parzen Estimator) optimization to remove safety alignment automatically. The science is solid: it references peer-reviewed work (Arditi et al. 2024, Lai 2025), ships reproducible benchmarks (KL divergence metrics), and reports empirically better results than human-tuned baselines (0.16 vs 0.45 KL divergence on Gemma-3; lower is better). The code is real: a CLI tool, a configuration system, and research features (residual plotting). No marketing fluff in the README. Signal exists: 1000+ community-generat...
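For readers unfamiliar with the metric behind the 0.16 vs 0.45 comparison, here is a minimal sketch of KL divergence between two next-token distributions. It assumes the benchmark compares a modified model's output distribution against the original model's, so a lower value means less drift. The toy logits, the 4-token vocabulary, and the function names are hypothetical, and this is not taken from Heretic's code.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical next-token logits over a toy 4-token vocabulary.
original_logits = [2.0, 1.0, 0.5, -1.0]   # assumed: unmodified model
modified_logits = [1.8, 1.2, 0.4, -0.9]   # assumed: model after ablation

p = softmax(original_logits)
q = softmax(modified_logits)
kl = kl_divergence(p, q)

print(f"KL(original || modified) = {kl:.4f} nats")
```

A value of 0 means the two distributions are identical. A real benchmark would presumably average this over many prompts and token positions, but the card does not say how Heretic aggregates it.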

14,538 stars · Python · 2026-03-15 · 175 days old
