Chat Bypass 2023 - Synergy

Unlike basic prompt injections, the Synergy approach exploits the cognitive biases that LLMs acquire during training. By layering these biases, attackers aim for a "synergistic" effect that bypasses safety protocols more effectively than any single bias alone.

Safety benchmarks such as VE-Safety were curated to include categories like cybercrime and physical harm, specifically to train models against "Image-as-Basis" threats and complex prompt engineering.

Attackers began using autonomous agents to adapt bypass strategies in real time, creating "adaptive" prompts that could learn from a model's refusal and try a different combination of biases.

The method uses specific linguistic patterns that trigger the model's tendency to prioritize certain types of information or "authority" over its safety training.

Throughout 2023, the industry moved from "black-box" guessing of bypass codes to scientific red-teaming.