Mira Ai Real on Huyaq

⚡️ Heretic: automatic uncensoring for modern LLMs

A new tool called Heretic takes last year’s research on “refusal directions” in language models and turns it into a full, automated uncensoring pipeline.

How it works:
• Researchers found that refusal behavior in LLMs is mediated by a single direction in the model's activation space
• Heretic computes that direction by comparing harmful vs harmless prompts
• It then orthogonalizes the attention and MLP projection weights against that direction, scrubbing the refusal vector out of the model
• An optimizer tunes the ablation parameters to minimize refusals while keeping the KL divergence from the original model low
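The core math behind these steps can be sketched in NumPy on synthetic data. This is a minimal illustration of a difference-of-means refusal direction, weight orthogonalization, and a KL-divergence check. It is not Heretic's actual code: the function names, the array shapes, the use of a single layer and token position, and the random stand-in matrices are all hypothetical, and the parameter search itself is not shown.

```python
import numpy as np


def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Unit vector along the difference of mean activations (harmful minus harmless).

    Inputs have shape (n_prompts, d_model): one residual-stream activation per
    prompt, taken at a single layer and token position (an assumption here).
    """
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def orthogonalize_output(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Return W' = W - r r^T W for a unit vector r.

    `weight` has shape (d_model, d_in) and writes weight @ x into the residual
    stream, so after this projection r^T (W' x) = 0 for every input x.
    """
    r = direction.reshape(-1, 1)  # column vector, shape (d_model, 1)
    return weight - r @ (r.T @ weight)


def log_softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max()  # subtract the max for numerical stability
    return shifted - np.log(np.exp(shifted).sum())


def kl_divergence(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
    """KL(P || Q) between the distributions given by two logit vectors."""
    log_p, log_q = log_softmax(p_logits), log_softmax(q_logits)
    return float(np.sum(np.exp(log_p) * (log_p - log_q)))


# Synthetic demo: "harmful" activations are "harmless" ones shifted along a planted direction.
rng = np.random.default_rng(0)
d_model, d_in, vocab = 16, 32, 50
planted = rng.normal(size=d_model)
planted /= np.linalg.norm(planted)
harmless = rng.normal(size=(200, d_model))
harmful = rng.normal(size=(200, d_model)) + 3.0 * planted

r = refusal_direction(harmful, harmless)
W = rng.normal(size=(d_model, d_in))          # hypothetical stand-in for an attention/MLP output projection
W_ablated = orthogonalize_output(W, r)
unembed = rng.normal(size=(vocab, d_model))   # hypothetical stand-in for the rest of the model

x = rng.normal(size=d_in)
print("cosine(r, planted):", float(r @ planted))
print("component along r after ablation:", float(r @ (W_ablated @ x)))
print("KL(original || ablated):", kl_divergence(unembed @ (W @ x), unembed @ (W_ablated @ x)))
```

Projecting the direction out of the weights, rather than out of the activations at inference time, makes the edit permanent: the modified matrix can no longer write anything along `r`. The KL term measures how far the edited model's output distribution drifts from the original's, which is what the optimizer tries to keep small.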

You run it once, wait ~45 minutes…
