Automatic LLM and AI Agent Vulnerabilty Scans: What probes and detectors are used for LLM security vulnerabilities like prompt injection?

Alert AI “Secure AI Anywhere” Zero-Trust AI Security Gatateway and Platform services offer GenAI and Agentic AI Applications Automatic Vulnerability Scan feature using Alert AI integrations to work seamlessly with Full Development, Deployment Life-cycle and collaborative results across Security, Operations, AI teams using several leading LLM Vulnerability Scanners.

One of such Alert AI managed integration is with one popular open-source LLM Vulnerability Scanner NVIDIA Garak, probes are designed to simulate various prompt injection attacks, and detectors analyze the LLM’s responses to identify instances where the attack succeeded.

Here are some of the probes and detectors relevant to prompt injection in Garak:

Probes for Prompt Injection

atkgen: Automated Attack Generation. This probe employs a red-teaming LLM to interact with the target model and generate toxic output.
dan: Various DAN (Do Anything Now) and DAN-like attacks are implemented to attempt to bypass safety instructions.
encoding: Explores prompt injection through text encoding techniques.
goodside: Implements known attacks inspired by Riley Goodside’s work on prompt injection.
gcg: Disrupts system prompts by adding adversarial suffixes.
promptinject: Uses the PromptInject framework, a known technique for adversarial prompt attacks.
latentinjection: Tests if models react to injections hidden within the prompt’s context, including indirect prompt injection and latent jailbreak scenarios.

Detectors for Prompt Injection

Garak’s detectors analyze the responses generated by the LLM to identify if the attack was successful. They can employ various methods depending on the type of attack:

Keyword-based detectors: Look for specific phrases like “DAN”, “Developer Mode”, or “successfully jailbroken” to indicate a successful prompt injection or jailbreak.
Machine learning classifiers: Trained models are used to detect specific types of outputs, like toxic or misleading content, potentially triggered by prompt injection.
LLM as a judge: In some cases, another LLM can be used to evaluate the responses of the target LLM and determine if the desired (or undesired) behavior was elicited.

Importance of Probes and Detectors

Probes and detectors are crucial for identifying and mitigating prompt injection vulnerabilities in LLMs by:

Simulating attacks: Probes mimic real-world attacks, allowing developers to test the robustness of their LLMs against known prompt injection techniques.
Automating vulnerability scanning: Detectors help automate the process of finding weaknesses in LLM applications, according to an article by Databricks.
Generating test cases: The results from these probes and detectors can be used to generate realistic test cases to improve the resilience of LLMs and their guardrails.
Understanding vulnerabilities: Analyzing successful prompt injection attempts helps in understanding the underlying mechanisms of these attacks and developing better countermeasures.

It’s important to remember that prompt injection is a constantly evolving area of AI security. Regularly updating and adapting probes and detectors is crucial to keeping pace with new attack vectors and ensuring the ongoing security of LLM-powered applications.

What other tools besides Garak help with LLM security?

Alert AI “Secure AI Anywhere” Zero-Trust AI Security Gateway and Platform features. Automatic Vulnerability Scans using integration with Several tools.

While Garak is a powerful open-source scanner for vulnerabilities, a comprehensive approach often requires utilizing a suite of tools that address different aspects of LLM security.

Here are some other notable tools that contribute to securing LLMs:

LLM vulnerability scanners and red teaming tools

PyRIT (Python Risk Identification Tool): Developed by the Microsoft AI Red Team, this open-source tool helps security professionals identify risks and vulnerabilities in generative AI systems through automated red teaming tasks.

LLMFuzzer: This open-source framework focuses on fuzz testing LLMs, particularly those integrated into applications via APIs. It helps uncover vulnerabilities by providing a wide range of fuzzing strategies.

BurpGPT: An extension for Burp Suite, BurpGPT integrates LLMs like OpenAI’s to enhance web application security testing. It offers advanced vulnerability scanning and traffic analysis.

LLM defense and guardrail tools

NeMo Guardrails (by NVIDIA): This Python toolkit enables the addition of “programmable guardrails” to LLM-based conversational applications, promoting responsible and ethical use of LLMs.

Guardrails AI: A Python package for specifying structure and type, and validating and correcting LLM outputs. It has a collection of pre-built measures for detecting various risks.

Rebuff: Focused primarily on preventing prompt injection attacks, Rebuff employs a multi-layered defense mechanism including LLM-based detection, vector database integration, and canary tokens.

LLM security monitoring platform

Alert AI “Secure AI Anywhere” Zero-Trust AI Security Gateway Platform services features end-to-end LLM security monitoring and red-teaming, including advanced data masking, role-based access controls, and compliance with regulations like GDPR.

Offers comprehensive LLM security monitoring and testing, including protection against data leakage, malicious prompts, and misinformation.

Presents an LLM security framework that includes security assessments, threat modeling, and specialized training programs.

Offers enterprise LLM security solutions with data loss prevention, full auditability, and malicious code detection features.

Helps protect against prompt injection, data loss, and insecure output handling. It offers real-time alerts and integrates with existing applications.

Alert AI Integrations with Several Tools for specific security areas

Picklescan: Used by Hugging Face, this security scanner detects suspicious actions in Python Pickle files, which are often used with LLMs.

Modelscan: Helps users gain confidence in LLM models by scanning them for vulnerabilities, similar to how Docker images are scanned.

CodeGate: A security proxy that filters input and output between the LLM and the user’s IDE, preventing API key leakage and checking for insecure dependencies or code.

Adversarial Robustness Toolbox (ART): A Python library for machine learning defense against adversarial threats, including LLMs.

As the field of LLM security is rapidly evolving. When choosing tools, it’s essential to consider your specific needs, the type of LLM you’re using, and the potential threats you’re trying to mitigate. A combination of tools, encompassing vulnerability scanning, red teaming, defense mechanisms, and monitoring, often provides the strongest AI Security posture.