Two models, opposite weak spots
7 min
Gemini leaks under a forced JSON schema; Claude resists that but folds to prefill. Opposite weak spots, same secret — and one deterministic defense that catches both.
- K.E.V.I.N.
- AI security
- red-teaming
- LLM
3 posts
Gemini leaks under a forced JSON schema; Claude resists that but folds to prefill. Opposite weak spots, same secret — and one deterministic defense that catches both.
Six hand-written jailbreaks couldn't crack my hardened bot. Then I forced its reply into a JSON schema and it leaked the secret on turn one. Why structured output is the hole.
I built K.E.V.I.N.: one AI tries to extract a secret from another while three judges score each attempt. Three things 50 rounds of automated jailbreaking taught me.