red-teaming

3 posts

Two models, opposite weak spots

Jun 26, 2026 7 min

Gemini leaks under a forced JSON schema; Claude resists that but folds to prefill. Opposite weak spots, same secret — and one deterministic defense that catches both.
- K.E.V.I.N.
- AI security
- red-teaming
- LLM
The bot that couldn't say no

Jun 25, 2026 3 min

Six hand-written jailbreaks couldn't crack my hardened bot. Then I forced its reply into a JSON schema and it leaked the secret on turn one. Why structured output is the hole.
- K.E.V.I.N.
- AI security
- red-teaming
- LLM
Meet K.E.V.I.N.

Jun 24, 2026 3 min

I built K.E.V.I.N.: one AI tries to extract a secret from another while three judges score each attempt. Three things 50 rounds of automated jailbreaking taught me.
- K.E.V.I.N.
- AI security
- red-teaming
- LLM