How to Hack AI Agents and Applications

Hey everyone,

For the past few weeks, I’ve been working on my largest (and best) post ever! 😊 

It’s finally finished.

The tl;dr for this email is:
- Click here to read my Megapost on Hacking AI
- or Click here to sign up for a 3 -our Masterclass on it
- Join Gray Swan AI’s Discord to win $100k in prizes for Jailbreaks
- Read about Haize Labs’ Verdict release, an Open Source LLM Judge Framework
- grok3 is the new best at ai security and appsec in general (imo)
- sonnet3.7 just came out and it’s AWESOME at coding

——————————

I just released a huge zero-to-hero style guide about how to hack AI applications. I’d love a retweet or an upvote on Hackernews. It’s hacking focused but also includes the most comprehensive mitigation section that I’ve ever seen. It is really helpful for both testers and developers.

You may have noticed my website has been overhauled to look much cleaner and now includes a Tools page where you can find two invisible prompt injection tools.

I’m also hosting a 3 hour masterclass covering all the same content, but with live Q&A and some hands-on hacking. You can sign up here!

Gray Swan (sponsored)

Gray Swan AI, a leading AI security firm known for its crowd‐sourced AI “jailbreaking arenas,” is teaming up with the UK Artificial Intelligence Security Institute (AISI) to launch its largest and most ambitious competition yet: the Agent Red-Teaming Challenge, beginning on March 8, 2025. With a $100,000 prize pool this event invites a global community of researchers, cybersecurity experts, and hobbyists to stress‐test cutting‐edge agent‐based AI models for vulnerabilities.

Event Details

  • Dates: March 8, 2025 – April 6, 2025 (with new challenges introduced weekly)

  • Prize Pool: $100,000

How to Participate: They will announce sign-ups in the community Discord discord.gg/grayswanai

Haize Labs releases Verdict (sponsored)

Haize Lab’s, my Q1 sponsor, released an awesome open-source project. It’s called Verdict and it’s the best-in-class project for LLM-as-judge use cases. You should check it out (and use them for AI Safety Testing). Verdict has huge implications for RL inside LLMs.

Grok3

I used all the top models to review and edit my AI hacking post mentioned a bunch above. AI application security is such a new thing that there’s basically no data in these models training sets so they tend to over fixate on AI safety or owasp top then.

That’s what all the models did… except grok3. It was pretty insightful and gave me 2-3 attack scenarios that I was able to add. So yeah, you should check it out for deeply technical and breaking edge appsec stuff.'

Sonnet 3.7 Release

Sonnet 3.5 has been the best coding model (and what I use in Cursor) for a long time. Well just as other companies were improving, anthropic just dropped sonnet 3.7. It’s even better, by a large margin. I have used it a little bit, and it seems awesome.

If you aren’t using AI to code yet, JUMP ON IT. It’s a massive time save, it’s fun, and it’s the future. Try cursor, windsurf, or cline.


Thanks for being on the email list! 😊 If you like this content, I’d love if you invited someone to join it or to follow me. Also, reply to this email if you’d like to sponsor a post/email.

Joseph Thacker (rez0)
josephthacker.com