Tim Backhaus

AI Model Ranking Compliance

We regularly test AI models against each other to identify which will give our users the best outcome. Today we are sharing the results of our latest round of testing. Let us know in the comments if this is interesting to you!

Google’s gemini-2.5-pro is the best AI model for compliance (and 3.5x better than gpt-4)!


Most importantly, Google’s gemini-2.5-pro leads the list, closely followed by OpenAI’s gpt-4.1. Our European player, mistral-large, is not too far behind. The Anthropic models are not particularly good at compliance.

🔎 gemini-2.5-pro is very good at recall (few false negatives) and at writing short, precise risk narratives.

🎯 gpt-4.1 is best at precision (few false positives) but not as good at writing risk narratives as gemini-2.5-pro.

👍 mistral-large is a well-balanced model but not as good as its competitors.

🙁 claude-sonnet is our favorite model for programming and day-to-day AI tasks, but it is not good at compliance.

📈 Comparing our testing scores over time, it is also crazy that we now score 3.5 times better than we did one year ago with gpt-4.
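
For readers less familiar with the recall and precision terms used in the model notes above, here is a minimal Python sketch of how the two metrics are commonly computed. The function name and the sample data are made up for illustration; this is not our actual test harness or scoring method.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length lists of booleans.

    True means "flagged as a risk". Illustrative helper only.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)        # correctly flagged risks
    fp = sum(p and not a for p, a in pairs)    # flagged, but not a real risk
    fn = sum(a and not p for p, a in pairs)    # real risk that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical screening results for six cases (not real data).
actual = [True, True, True, False, False, False]
predicted = [True, True, True, True, False, False]

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example the model catches every real risk (no false negatives, so recall is perfect) but raises one false alarm, which lowers its precision. A high-recall model misses few risks; a high-precision model raises few false alarms.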

‼️ Important - there are many other factors that you should consider when picking a model:

  • Availability: we only work with models that we can host in the EU
  • Performance post fine-tuning: the ranking is based on vanilla models; fine-tuning can unlock additional performance
  • Specific tasks: cherry-picking multiple models for different tasks can improve user experience further
