AIM Intelligence's red team breached Anthropic's Claude Opus 4.6 in just 30 minutes, exposing major security gaps as ...
AI reasoning models that can ‘think’ are more vulnerable to jailbreak attacks, new research suggests
A new study suggests that the advanced reasoning powering today’s AI models can weaken their safety systems.
Cisco tested eight major open-weight artificial intelligence models and found multi-turn jailbreak attacks succeeded nearly 93% of the time. Enterprise artificial intelligence ...
Welcome to the age of AI hacking, in which the right prompts make amateurs into master hackers.
Today, I have a new favorite phrase: "Adversarial poetry." It's not, as my colleague Josh Wolens surmised, a new way to refer to rap battling. Instead, it's a method used in a recent study from a team ...
Threat actors are operationalizing AI to scale and sustain malicious activity, accelerating tradecraft and increasing risk for defenders, as illustrated by recent activity from North Korean groups ...
Today's newest AI models might be capable of helping would-be terrorists create bioweapons or ...
New protections inspect documents, metadata, prompts, and responses before AI models can be manipulated. Indirect prompt ...
As new large language models, or LLMs, are rapidly developed and deployed, existing methods for evaluating their safety and discovering potential vulnerabilities quickly become outdated. To identify ...
Founded in 2024, Promptfoo began as an open-source framework for evaluating AI prompts and model behavior. It later expanded into a commercial platform used by developers and enterprise security teams ...
Zapier reports that AI security is crucial as AI usage grows, presenting risks like data breaches and adversarial attacks ...