Alert AI: LLM Applications and AI Agent security alerts

Alert AI: LLM Applications and AI Agents Security Alerts

Alert AI “Secure AI anywhere” AI security gateway uses a comprehensive, multi-layered defense in depth and monitoring strategy for generating 500+ unique alerts for LLM and AI agent applications.

Alert AI :  Detections

Lists of Security Alerts

Data and model security Alerts

  1. Unauthorized access attempt to AI model API.
  2. High volume of requests to model API from a single, unknown IP.
  3. Unusual model access patterns detected (e.g., from a new geographic location).
  4. Data poisoning attempt detected in model training pipeline.
  5. Unexpected changes in model weights or architecture detected.
  6. Unauthorized deployment of new model version.
  7. Attempted model inversion attack identified.
  8. Suspicious download of model artifacts from internal network.
  9. Source code for proprietary models accessed by unauthorized user.
  10. Model theft attempt using reverse engineering identified.
  11. Model inference endpoint shows signs of enumeration attack.
  12. Adversarial inputs designed to cause model misclassification detected.
  13. Unusual increase in model retraining jobs.
  14. Anomaly detected in model validation metrics during training.
  15. Sensitive data exposure in model logs or outputs.

Inference, generation Alerts

  1. Unusually high latency for inference requests from specific user.
  2. Sudden spike in inference error rates.
  3. Output generation rate drops below expected threshold.
  4. High resource consumption (GPU/TPU) for a specific inference endpoint.
  5. Significant deviation in inference output patterns.
  6. High rate of API key rotation for inference services.
  7. Suspicious inference requests originating from outside expected network ranges.
  8. Inference request payload contains obfuscated or malicious code.
  9. Model produces an unusually high volume of rejected or filtered responses.
  10. Unexpected changes to model configuration file in production.
  11. Abnormal inference traffic originating from a known bad actor IP.
  12. Inference service crashes or fails to respond.
  13. Configuration of a production model changed without proper change control.
  14. Inference endpoint logs show failed authentication attempts.
  15. Output of a deployed model does not match expected baseline.

AI development lifecycle Alerts

  1. PII or sensitive data detected in a new training dataset.
  2. Changes to production AI pipeline (CI/CD) by an unauthorized user.
  3. Mismatch between model lineage metadata and deployed artifact.
  4. Compromised data source used for retraining a production model.
  5. Access to data labeling service from an unapproved IP.
  6. Untrusted third-party component introduced into the AI supply chain.
  7. Security vulnerability scanner identifies weakness in AI application.
  8. Unauthorized changes to model training scripts.
  9. Data used for model evaluation differs significantly from training data.
  10. AI artifact repository accessed by an unapproved user.
  11. New AI vulnerability (e.g., prompt injection method) detected by threat intelligence feed.
  12. Failure to comply with data privacy regulations (e.g., HIPAA) regarding AI data.
  13. High-privilege access granted to AI infrastructure by an unauthorized process.
  14. Model serving infrastructure scaled up or down outside of normal patterns.
  15. Automated vulnerability scanner detects an AI-specific vulnerability.

LLM AI agent application security alerts

Prompt injection and manipulation

  1. User input contains classic jailbreaking phrases.
  2. Indirect prompt injection detected via data from an integrated service.
  3. Evasion of safety filters using complex, multipart prompts.
  4. High frequency of failed prompt injection attempts.
  5. Agent’s output contains unauthorized system commands or database queries.
  6. User attempts to manipulate agent into revealing confidential information.
  7. Output from an agent includes sensitive data from a previous user’s interaction.
  8. Large-scale campaign of malicious prompts detected.
  9. Agent logs show repeated attempts to override initial instructions.
  10. Input prompt contains obfuscated, base64-encoded text.
  11. Multiple users submit identical, suspicious prompts.
  12. High entropy detected in prompt text from a single user.
  13. User attempts to trick the agent into performing social engineering.
  14. Prompt content flagged by toxicity classifier in an unexpected context.
  15. A successful prompt injection resulting in an unwanted action is detected.

Plugin and tool security

  1. Unauthorized access to a plugin API by an LLM agent.
  2. Agent attempts to call a function or tool outside its specified permissions.
  3. Plugin execution fails due to corrupted or malicious input from agent.
  4. Agent performs excessive number of calls to a specific, non-critical plugin.
  5. An agent attempts to access a plugin that it is not explicitly configured to use.
  6. Outgoing request from a plugin contains sensitive user data provided by the agent.
  7. A compromised plugin is identified in the application’s supply chain.
  8. Plugin code is modified in a way that allows for privilege escalation.
  9. Agent calls a plugin with parameters that suggest an OS command injection attempt.
  10. User exploits a vulnerable plugin via an agent to access external resources.

Operational and agent-specific threats

  1. Agent exhibits excessive agency, making decisions without approval.
  2. High number of failed tool calls by an agent.
  3. Agent logs show unexpected execution paths or chain of thought.
  4. An agent’s response indicates a misunderstanding of a security policy.
  5. Agent begins generating untrusted or fabricated content (hallucination).
  6. Agent access token used from an unusual IP address.
  7. Configuration change to an agent’s privilege scope.
  8. Agent attempts to access data outside its designated context window.
  9. Agent shows signs of behavioral drift and degradation over time.
  10. PII detected in an agent’s internal memory or logs.
  11. High frequency of agent restarts or crashes.
  12. Agent attempts to perform an action explicitly disallowed by guardrails.
  13. Log of an agent’s internal monologue reveals sensitive information.
  14. Discrepancy between agent’s final action and its stated intent.
  15. Agent’s decision-making logic is circumvented via a clever prompt.

Performance alerts

  1. Average request latency exceeds acceptable SLO.
  2. Peak request latency increases by more than 25% over baseline.
  3. Sudden drop in throughput (requests per second).
  4. High p99 latency for a specific LLM model.
  5. Increased latency for API calls to third-party services used by agents.
  6. Substantial increase in token generation time.
  7. Time-to-first-token exceeds threshold.
  8. Latency for specific user groups or geographic regions spikes.
  9. High memory usage correlates with increased latency.
  10. Latency increases with an increase in request complexity.
  11. Batch processing time for LLM jobs increases significantly.
  12. Disk I/O contention detected on LLM serving infrastructure.
  13. Unexplained decrease in throughput despite constant load.
  14. Latency spikes for requests during garbage collection.
  15. Latency for requests containing specific keywords or data types increases.

Resource utilization

  1. GPU utilization exceeds predefined threshold for sustained period.
  2. Memory consumption for an LLM process exceeds a safe limit.
  3. Sudden spike in GPU temperature.
  4. High CPU usage on LLM serving nodes.
  5. Network bandwidth utilization for model serving is abnormally high.
  6. Inefficient use of GPU memory detected (e.g., low batch size).
  7. Disk space usage on serving hosts is critically low.
  8. I/O bottlenecks detected during model loading or data retrieval.
  9. High resource consumption by a non-model process on a serving node.
  10. Resource exhaustion error logs from the LLM serving stack.
  11. Memory leak detected in LLM application container.
  12. Unused GPUs are not being scaled down.
  13. Sudden decrease in GPU utilization suggesting a stalled process.
  14. Overprovisioning of cloud resources for LLM inference.
  15. Underutilization of resources for critical LLM tasks.

AI Ops and LLM Ops monitoring alerts, Drift and degradation

  1. Data drift detected in incoming production data, causing a shift from baseline.
  2. Concept drift identified, where the relationship between inputs and outputs has changed.
  3. Output distribution drift detected in LLM responses.
  4. A/B test shows degraded performance in new model version.
  5. Accuracy of a downstream classification model fed by LLM output drops significantly.
  6. Hallucination rate of LLM increases over time.
  7. Coherence score of LLM responses drops below threshold.
  8. PII detection rate in LLM outputs changes unexpectedly.
  9. Sentiment of LLM responses shows a negative drift.
  10. A statistically significant difference is found between current and historical data distributions.
  11. Anomaly detected in the evaluation metrics of a canary deployment.
  12. High rate of human overrides in a human-in-the-loop system.
  13. User satisfaction scores for an LLM-powered feature decline.
  14. LLM’s internal representation (embeddings) drift over time.
  15. Performance benchmark tests for LLM regression.

System and infrastructure health

  1. LLM application container restart count exceeds threshold.
  2. Pods for LLM serving are not ready or are in an error state.
  3. Kubernetes cluster where LLM is deployed is running low on resources.
  4. API gateway for LLM requests reports an increase in 5xx errors.
  5. Data pipeline for LLM input data shows back pressure.
  6. Failure in data collection agent for LLM monitoring.
  7. Model registry service is unreachable.
  8. High number of failed deployments for LLM services.
  9. Logging service for LLM applications experiences high ingestion latency.
  10. Out-of-memory error on a serving node during peak load.
  11. Latency for a specific microservice in the LLM chain increases.
  12. A sudden change in cloud provider pricing for LLM services.
  13. Error rate for an LLM-related cloud function increases.
  14. Discrepancy between logs collected and expected volume.
  15. Security group for LLM infrastructure is misconfigured.

Operational efficiency and cost

  1. Token usage for LLM services exceeds the budget threshold.
  2. Cost of running a specific LLM or agent increases unexpectedly.
  3. Throughput is low relative to the provisioned resources.
  4. High number of redundant or repeated LLM calls.
  5. Prompt engineering changes increase token usage without performance gains.
  6. Usage of a more expensive LLM model exceeds normal usage patterns.
  7. Low cache hit rate for LLM responses.
  8. High volume of failed requests resulting in unnecessary billing.
  9. Inefficient prompt construction leading to excessive token consumption.
  10. Cost of third-party API calls from an agent spikes.

Prompt and input vulnerabilities

  1. Prompt Injection Detected: An input is flagged by a security guardrail as a deliberate attempt to manipulate the agent’s instructions.
  2. Privilege Escalation Attempt: A user prompt attempts to gain unauthorized access or elevate permissions.
  3. PII Detected in Input: User prompt contains Personally Identifiable Information that needs to be redacted before processing.
  4. PII Leakage in Output: Agent’s response contains sensitive PII that was not part of the original prompt.
  5. Confidential Data Exfiltration Attempt: A prompt attempts to trick the agent into revealing internal data or system information.
  6. Jailbreak Attempt: User prompts are attempting to bypass safety filters and constraints.
  7. Harmful Content Generation: Agent output contains content that violates safety policies (e.g., hate speech, dangerous advice).
  8. Denial-of-Service (DoS) via Prompt: A high volume of complex or resource-intensive prompts from a single source.
  9. Unusual User Behavior (Potential Attack): A user suddenly changes their query patterns or query frequency.
  10. Unauthorized API Call: Agent attempts to use an external tool or API without proper authorization.
  11. Malicious File Upload: File upload functionality is used to inject malicious content or commands.
  12. Encoding Obfuscation: A prompt uses complex encoding to hide malicious intent.
  13. Multi-lingual Obfuscation: The prompt switches between languages to bypass filters.
  14. Excessive Tool Use: The agent makes an unusually high number of calls to a specific tool for a single request.
  15. SQL/Code Injection Attempt: A prompt includes malicious SQL or code snippets aimed at back-end systems.
  16. Indirect Prompt Injection (Stored): Malicious instructions are embedded in external documents consumed by the agent.
  17. Model Data Extraction Attempt: A prompt tries to extract the agent’s internal instructions or system prompts.
  1. Security Vulnerability Scan Detected: Automated probes from security scanners are identified in the prompt logs.
  2. User Access Anomaly: A user with minimal privileges exhibits behavior associated with highly privileged accounts.
  3. Credential Exposure: An agent-generated response inadvertently contains API keys or other credentials.
  4. Violation of Data Privacy Policies: Agent stores or processes data in a way that violates defined privacy rules.
  5. Policy Violation Detected: Agent generates a response that violates a pre-defined policy.
  1. Unusual Tool Call Parameters: Agent uses an external tool with strange or malformed parameters.
  2. Agent Self-Modification Attempt: An advanced agent attempts to rewrite its own system prompts or instructions.
  3. Sensitive Information from Source Document (RAG): The agent retrieves sensitive data from a knowledge base that should have been filtered.

Cost and Operations (25 alerts)

  1. Daily/Weekly Cost Exceeded: The application’s total token or API cost exceeds a predefined budget for a given period.
  2. Cost Spike Detected: Sudden, anomalous spike in token usage or API costs.
  3. High Cost Per User: An individual user or user group is generating a disproportionately high amount of cost.
  4. Expensive Model Usage: Frequent use of a more expensive LLM model when a cheaper one would suffice.Inefficient Prompting: Identification of prompt patterns that use an unnecessarily high number of tokens for a task.
  5. Token Overspending (Single Request): A single request consumes an unusually large number of tokens.
  6. Billing API Failure: Failure to retrieve or process billing data from the LLM provider.
  7. Token Rate Limit Exceeded: The application hits its token or request rate limit with the LLM provider API.
  8. Provider API Latency Increase: Observed latency for the LLM provider’s API is increasing.
  9. API Authentication Failure: The application’s API key is failing to authenticate with the LLM provider.
  10. Resource Exhaustion Error: The LLM provider returns an error indicating that resource limits have been reached.
  11. Caching Miss Rate Increase: Decrease in cache hit rate for frequent or identical requests.
  12. High Rate of Duplicate Requests: Agent is receiving and processing a high volume of identical requests.
  13. Load Balancer Error: Failure in the load balancing of requests to different LLM agents or providers.
  14. High Wait Time in Queue: Requests are spending an excessive amount of time in the processing queue.
  15. High Infrastructure Cost: The underlying infrastructure (e.g., GPU servers) for self-hosted models shows unusually high costs.
  16. New Model Version Cost Increase: The latest version of the model is significantly more expensive without a proportional performance increase.
  17. Token Cost Per Conversation Spike: The average cost per user conversation thread rises above a benchmark.
  18. Cost Discrepancy: Discrepancy between the expected cost and the actual billed cost from the provider.
  19. High Percentage of Zero-Value Tokens: Model generates a high number of filler tokens that don’t add value.
  20. Tool Usage Cost Spike: A specific external tool API call cost increases unexpectedly.
  21. Data Ingestion Cost Spike (RAG): The cost of ingesting and embedding data for a RAG system increases significantly.
  22. Memory Leak Detected: Agent process’s memory usage grows steadily over time, indicating a memory leak.
  23. Configuration Drift: The agent’s configuration parameters have been changed in an unexpected way.
  24. Model Obsolescence: The LLM model in use has been deprecated or is no longer the recommended version.
  25. Prompt injection detected: A user’s prompt is flagged as an attempt to hijack the agent’s instructions, such as Ignore all previous instructions….
  26. Indirect prompt injection: The agent is manipulated through an external data source, like a document retrieved from a knowledge base.
  27. Data leakage in prompt: A user’s prompt contains sensitive information (e.g., PII, PHI) that could be improperly processed.
  28. PII detected in chat history: Sensitive information is found within the conversation history, potentially in violation of privacy rules.
  29. PII redaction failure: An attempt to redact sensitive information (PII) from a prompt or response fails.
  30. Excessive prompt length: A prompt exceeds the configured maximum length, indicating a potential resource-exhaustion attack.
  31. Unusual input language: An input is received in a language not supported by the model, which could indicate a malicious intent.
  32. Character encoding attack detected: Input contains malicious character encoding designed to bypass sanitization filters.
  33. Escalated prompt complexity: The complexity score of a prompt, measured by number of sub-tasks, increases dramatically, indicating a possible abuse attempt.
  34. Malicious file upload: The agent is used to process a file containing malware or other malicious content.
  35. Malicious link injected into prompt: A prompt contains a malicious URL, potentially used for phishing or social engineering.
  36. Jailbreak attempt detected: The user attempts to circumvent the agent’s safety or guardrail policies.
  37. Obfuscated prompt injection: Sophisticated prompt obfuscation techniques are detected, possibly using base64 or other encoding methods.
  38. High-frequency prompt injection attempts: Multiple prompt injection alerts are triggered from the same user or IP address in a short time.
  39. Suspicious agent tool usage request: A prompt attempts to force the agent to use a tool in an unauthorized way.

Tool and environment security

  1. Unauthorized tool call: The agent attempts to invoke a tool or function that it is not permitted to use.
  2. Tool call with malicious arguments: The agent is instructed to use a tool with potentially harmful or invalid input.
  3. Unexpected shell command execution: The agent executes an operating system command that is not part of its normal behavior.
  4. Unauthorized environment variable access: The agent attempts to read a sensitive environment variable.
  5. Sensitive data from tool output: A tool returns sensitive data in its output, which was not properly handled or redacted.
  6. API key or token exposure: The agent’s response or a tool’s output contains a hardcoded API key or access token.
  7. Agent makes outbound network request to blocklisted domain: The agent attempts to connect to a known malicious IP address or domain.
  8. High-frequency tool errors: An agent tool fails repeatedly, indicating a potential attempt to destabilize the system.
  9. Excessive agent actions: An agent takes a high number of steps or actions to complete a single, simple request.
  10. Agent attempts to access restricted file path: The agent’s tool tries to read, write, or execute a file in a restricted directory.
  11. Agent-initiated credential theft attempt: The agent’s generated code or action attempts to interact with a credential store.
  12. Privilege escalation detected via agent tool: The agent is used to execute a tool with higher privileges than intended.
  13. Agent attempts to modify its own configuration: The agent attempts to change its own prompt or instruction set.

Output and Data security

  1. Harmful content generation: The model generates output classified as toxic, hateful, or explicit.
  2. Disclosure of sensitive training data: The LLM produces an output that appears to be a direct quote or replication of its training data, potentially exposing confidential information.
  3. LLM generates malicious code: The model generates code that is flagged as potentially malicious, such as a command injection payload.
  4. Social engineering attempt detected: The agent’s output is flagged as a social engineering tactic, like requesting sensitive information.
  5. Misinformation or hallucination detected: The agent generates a response containing factually incorrect information.
  6. LLM generates copyrighted content: The model’s output contains content that matches known copyrighted material.
  7. Excessive PII in response: The agent’s output contains an unusually high amount of PII or other sensitive data.
  8. Data exfiltration attempt: The agent’s output is structured to transfer data to an external, unauthorized location.
  9. Model bias detected: A pattern of biased or unfair responses is detected during a user interaction.

User and Application security

  1. Unusual user behavior: A user’s interaction pattern (e.g., rapid, repetitive prompts) is flagged as suspicious.
  2. API key compromised: An LLM API key shows unusual usage patterns, such as a spike in token consumption.
  3. Credential stuffing attempt: Multiple failed login attempts are made against the LLM application.
  4. High-frequency, low-relevance queries from a single user: A user submits many queries that consistently receive low-relevance scores, which can indicate an enumeration attack.
  5. Denial-of-service attempt: A high volume of requests is detected from a single source, potentially aiming to overwhelm the service.

Performance  Alerts

Latency and throughput

  1. High end-to-end latency: The total time from user prompt to agent response exceeds a set threshold (e.g., 500ms).
  2. High LLM provider latency: The time for the LLM provider to return a response exceeds a set threshold, indicating an upstream issue.
  3. High retrieval latency: The retrieval-augmented generation (RAG) system is slow to retrieve documents from the knowledge base.
  4. High tool execution latency: An external tool or API call takes too long to execute.
  5. High agent planning latency: The agent’s internal planning or reasoning steps take an excessive amount of time.
  6. Slow token generation rate: The model’s token generation speed drops below an acceptable rate.
  7. Low overall throughput: The number of requests processed per minute falls below a minimum threshold.
  8. Throughput degradation detected: A sudden drop in request throughput is observed.

Cost and resource usage

  1. Spike in token usage: The number of tokens used per conversation or over a time period increases unexpectedly.
  2. Unexpected cost increase from LLM provider: Monitoring shows an unexpected spike in the LLM provider’s billing.
  3. High GPU utilization on inference servers: GPU usage on the LLM infrastructure is consistently at or near maximum capacity.
  4. Memory usage exceeding threshold: The memory consumption of the agent or model server exceeds a safe limit, potentially causing instability.
  5. High CPU utilization on orchestrator: The agent orchestration service shows high CPU load, indicating a bottleneck.
  6. Inefficient tool usage: The agent is using costly tools or APIs for tasks that could be handled more cheaply.

Agent and model behavior

  1. High database load from knowledge base: The vector database or knowledge base is under heavy load, increasing latency and cost.
  2. High rate of hallucinations: The percentage of hallucinated or factually incorrect responses exceeds the defined baseline.
  3. Semantic drift detected: The agent’s output is subtly shifting in tone, style, or quality over time.
  4. High rate of non-responsive agents: A significant number of agent conversations end without a valid response.
  5. High rate of agent loops: An agent gets stuck in a repetitive chain of thoughts or tool calls.
  6. Poor relevance scores: The model’s output consistently fails to match the user’s intent, leading to low relevance scores.
  7. High rate of model-side errors: The LLM provider reports an elevated number of errors during requests.
  8. Incorrect tool selection: The agent repeatedly chooses the wrong tool for a given task.
  9. Evaluation metric degradation: A quality evaluation metric (e.g., RAGAS score, model grade) drops below a defined threshold.
  10. Low user feedback scores: The aggregate user feedback (e.g., thumb-up/thumb-down) is lower than the baseline.

Monitoring and infrastructure alerts

System health and availability

  1. Agent service is down: The core agent service or a critical component is not responding.
  2. LLM API endpoint unavailable: The connection to the LLM provider fails or the provider’s API is not reachable.
  3. Vector database connection error: The agent cannot connect to its knowledge base.
  4. High error rate on LLM API: The LLM API returns an excessive number of error codes (e.g., 4xx, 5xx).
  5. Out-of-memory error: The application or a specific LLM process runs out of memory.
  6. Container crash: A container hosting the agent or a related service fails unexpectedly.
  7. Inference server overload: The LLM inference server reaches its maximum concurrent request limit.
  8. Deployment rollback: A recent deployment of the agent was automatically or manually rolled back due to issues.
  9. Certificate expiration: A TLS/SSL certificate used by the application or an API is approaching its expiration date.

Data and pipeline issues Alerts

  1. Stale knowledge base: The last successful update to the knowledge base or vector index was longer than expected.
  2. Document retrieval failure: The RAG system consistently fails to retrieve relevant documents for a given query.
  3. Data pipeline failure: The ETL process for updating the LLM’s knowledge base fails or is delayed.
  4. Missing metadata: A log or trace lacks essential metadata, hindering debugging.
  5. Data version mismatch: The agent is using a different version of the knowledge base or tool than expected.
  6. Tracing service offline: The OpenTelemetry or other tracing service is down, preventing end-to-end visibility.

Agent and workflow specific Alerts

  1. Agent fails to complete task: A user’s conversation with the agent is terminated without a successful resolution.
  2. Conversation state inconsistency: The agent’s memory or conversational state becomes corrupted.
  3. Workflow path deviation: An agent’s execution trace follows a different and more complex path than normal for a simple request.
  4. Unplanned tool usage sequence: An agent uses tools in an unexpected order during a multi-step task.

Performance and Reliability Alerts

  1. High Inference Latency: Average inference time exceeds a defined threshold (e.g., > 2 seconds).
  2. Latency Spikes: Sudden, unexplained spikes in inference latency, indicating a potential bottleneck.
  3. ncreased Error Rate: Percentage of failed requests surpasses a set threshold (e.g., > 5%).
  4. Agent Failure to Answer: Agent fails to produce a meaningful or coherent response.
  5. Agent Timeouts: Agent requests time out before a response is generated.
  6. Low Throughput: Decrease in the number of successful requests processed per minute, indicating a performance regression.
  7. High Dependency Latency: Agent’s external API or tool calls are consistently slow.
  8. Increased Retries: Spikes in internal retry attempts by the agent, signaling upstream issues.
  9. Degrading Performance over Time (Drift): Gradual deterioration of key performance metrics (latency, accuracy) over weeks or months.
  10. Significant Drop in Success Rate: The agent’s ability to complete its core task successfully falls below a benchmark.
  11. Agent Version Regression: A new agent version or configuration shows a performance decline compared to the previous one.
  12. High CPU/GPU Utilization: Excessive resource consumption by the agent, indicating inefficiency or a stuck process.
  13. High Memory Consumption: Agent process memory usage exceeds its normal operating range, potentially leading to instability.
  14. Resource Saturation: System-level metrics (e.g., CPU, GPU, memory) approach maximum capacity.
  15. Container/Pod Restarts: Frequent restarts of the LLM agent’s underlying infrastructure.
  16. Intermittent Failures: Sporadic, non-reproducible request failures that are hard to diagnose.
  17. Dependency Service Outage: Alerts when a critical external service (e.g., knowledge base, search API) is unreachable.
  18. Unexpected Agent Fallback: Agent frequently falls back to a simpler, less effective response template.
  19. Anomalous Request-to-Token Ratio: Unusual number of tokens generated per request, signaling an inefficient change in agent behavior.
  20. Frequent Prompt Retries: A user or automated system submits the same prompt multiple times without success.
  21. High Rate of Re-conversations: Users are frequently restarting conversations, indicating a failure to resolve the initial request.
  22. Abnormal Distribution of Agent Tools: The frequency of tool calls deviates from its established baseline, suggesting a change in reasoning.
  23. High Agent Handover Rate: Rate of requests being escalated from the agent to a human exceeds a threshold, identifying capability gaps.
  24. Failure to Correct Error States: The agent fails to recover gracefully from a tool call error.
  25. Significant Increase in Token Usage: Sudden spike in token consumption for a user or task, indicating a potential attack or prompt efficiency issue.

AI Integrity and Quality Alerts

  1. allucination Detected: An LLM-as-a-judge or fact-checking mechanism identifies a fabricated claim or factual inconsistency.
  2. Factual Accuracy Drop: Automated evaluation metrics show a significant decrease in factual correctness.
  3. Coherence Score Drop: The output’s logical flow or grammatical correctness deteriorates according to semantic consistency checks.
  4. Relevance Score Drop: Responses are becoming less relevant to user prompts over time.
  5. Unhelpful Response Rate: User feedback or implicit signals indicate a rise in unhelpful responses.
  6. Inconsistency Detected: The agent provides contradictory responses to similar queries.
  7. High Toxicity Score: A toxicity classifier flags an agent’s response for inappropriate, hateful, or harmful content.
  8. Fairness Metric Skew: A predefined fairness metric for specific demographic groups shows a concerning bias.
  9. Incomplete Responses: The agent’s output is consistently incomplete or cut off.
  10. Sentiment Shift: Agent responses show an unexplained shift towards a more negative or inappropriate tone.
  11. Data Quality Anomaly (RAG): Documents retrieved for Retrieval Augmented Generation (RAG) are scored as having low relevance or quality.
  12. Outdated Knowledge (RAG): The agent provides an answer based on outdated information, potentially from a stale knowledge base.
  13. Semantic Drift: The agent’s overall response style or meaning changes subtly over time.
  14. Entity Extraction Failure: The agent fails to properly identify and extract key entities from user input.
  15. New Keyword/Topic Drift: The agent starts generating responses on topics or keywords outside its intended scope.
  16. User Feedback Spike (Negative): Automated systems detect a sudden increase in negative user feedback ratings or flags.
  17. High User Redundancy: Users repeatedly rephrase their queries, indicating the agent is not understanding their intent.
  18. Significant Increase in User Edits: Users are frequently editing agent-generated drafts, indicating poor quality.
  19. High User Rejection Rate: Users consistently dismiss or ignore the agent’s responses.
  20. Model Confidence Drop: Automated systems report a decrease in the model’s confidence scores for its answers.
  21. Discrepancy with Ground Truth: The agent’s response differs significantly from a known ground truth or golden dataset.
  22. Misattribution of Sources (RAG): The agent cites incorrect or non-existent sources for information.
  23. Inaccurate Summarization: The agent provides a summary that misrepresents the source text.
  24. Incorrect Formatting: The agent consistently fails to adhere to a required output format (e.g., JSON, Markdown).
  25. Tool Use Error: The agent’s logic for selecting or using external tools is flawed, leading to poor output.

Performance and reliability alerts (120 alerts)

Latency (30 alerts)

  • Response time Alerts:
    • Critical: 99th percentile response time exceeds 5 seconds for any model endpoint. (5 variations based on endpoint/model)
    • Warning: Average latency for text-to-image-model shows a 20% increase over the baseline. (5 variations based on model type)
    • Anomaly: 95th percentile latency for customer_support_agent deviates significantly from its normal 7-day pattern. (5 variations based on AI agent)
  • System throughput Alerts:
    • Warning: Requests per minute drops below 50% of the historical average for over 15 minutes. (5 variations based on endpoint/service)
  • Dependency latency Alerts:
    • Warning: Latency for external API calls (knowledge base, database) exceeds 2 seconds for more than 10% of requests. (5 variations based on tool/API)
  • Tool call latency Alerts:
    • Critical: Latency for search_api_tool calls exceeds 10 seconds for more than 1% of calls in a 5-minute window. (5 variations based on tool)
  • RAG latency Alerts:
    • Warning: The vector database retrieval step’s average latency increases by 30% for over 30 minutes. (5 variations based on RAG component)

Agent task execution (20 alerts)

  • Task completion rate Alert:
    • Critical: Agent task success rate falls below 85% in a 15-minute window. (5 variations based on task type)
  • Tool failure rate:
    • Critical: Calls to external_api_tool fail for more than 5% of requests over 10 minutes. (5 variations based on tool)
  • Retry loops:
    • Warning: An agent enters a tool-retry loop for the same task more than 3 times within a single trace. (5 variations based on agent)
  • Trace failures:
    • Warning: The rate of failed agent traces increases by 50% compared to the previous hour. (5 variations based on agent)

System health (20 alerts)

  • Server errors:
    • Critical: LLM API endpoint returns a server-side error (HTTP 5xx) for more than 1% of requests. (5 variations based on endpoint)
  • Resource utilization:
    • Warning: CPU/GPU usage exceeds 90% for the LLM inference host for over 5 minutes. (5 variations based on host/resource)
    • Warning: Memory usage on the LLM host exceeds 85% for over 10 minutes. (5 variations based on host/resource)
  • Dependency outages:
    • Critical: A downstream dependency (e.g., API, database) fails with 100% error rate for more than 30 seconds. (5 variations based on dependency)

Service quality metrics (50 alerts)

  • Evaluation metrics:
    • Critical: Automated relevance score for agent responses drops below a threshold of 0.7 for more than 30 minutes. (10 variations based on metric like relevance, coherence)
    • Warning: Average sentiment score of user feedback shifts negatively by more than 15% in the last 24 hours. (10 variations based on sentiment shift)
  • Qualitative failure modes:
    • Critical: Agent output fails the hallucination check in Datadog for more than 5% of requests. (10 variations based on failure type like hallucination, ground truth failure)
    • Warning: Rate of ‘Did not answer’ responses from the agent increases by 20% compared to the baseline. (10 variations based on failure type)
  • Feedback correlation:
    • Warning: A specific type of user query receives a user feedback score below 3/5 more than 20 times in an hour. (10 variations based on user feedback trigger)

Safety, security, and ethical alerts (100 alerts)

Prompt injection and manipulation (30 alerts)

  • Prompt injection detection:
    • Critical: An input containing a known prompt injection signature is detected and successfully bypasses guardrails. (10 variations based on attack signature/severity)
    • Warning: A user prompt is flagged with a high confidence score for attempted jailbreak, but blocked by the safety filter. (10 variations based on attempted jailbreak type)
  • Privilege escalation attempt:
    • Critical: An agent attempts to perform a privileged action (e.g., delete_database) in response to an unverified prompt. (10 variations based on tool action)

Content safety and toxicity (20 alerts)

  • Toxic input:
    • Warning: The safety filter blocks more than 50 toxic inputs from a single user within an hour. (5 variations based on user behavior)
  • Harmful output generation:
    • Critical: Agent generates an output classified as harmful or toxic, bypassing safety checks. (5 variations based on safety classification)
  • Sensitive topic engagement:
    • Warning: Agent repeatedly engages in conversations flagged as touching on sensitive or high-risk topics. (5 variations based on topic)
  • Bias amplification:
    • Warning: Automated bias detection metrics indicate a significant increase in biased language in model outputs. (5 variations based on bias metric)

Data privacy and leakage (30 alerts)

  • PII in input:
    • Warning: The system detects personally identifiable information (PII) like email addresses in a user prompt that was not properly redacted. (10 variations based on PII type)
  • PII in output:
    • Critical: The LLM generates a response containing PII that it should not have access to or that was not present in the original prompt. (10 variations based on PII type)
  • Sensitive data query:
    • Critical: An agent attempts to access a sensitive data source (e.g., patient records) without proper authentication. (10 variations based on data source)

Hallucinations and misinformation (20 alerts)

  • High hallucination rate:
    • Critical: Hallucination detection system reports more than 10% of responses as ungrounded or contradictory to provided context. (5 variations based on hallucination type/model)
  • Fact-checking failure:
    • Warning: The agent provides information that contradicts a known, verified fact in an external knowledge base. (5 variations based on knowledge source)
  • Invented citations:
    • Warning: In a RAG application, the agent generates a plausible-sounding but completely fabricated source or citation. (5 variations based on RAG type)
  • Contextual inconsistency:
    • Warning: Semantic similarity metrics between a generated response and its source context drop below a safe threshold. (5 variations based on consistency score)

Cost and efficiency alerts (80 alerts)

Token usage (30 alerts)

  • High token consumption:
    • Warning: The average number of tokens per request for a specific model increases by 30% compared to the 7-day rolling average. (10 variations based on model/endpoint)
  • Budget threshold exceeded:
    • Critical: Total token usage for the month exceeds 100% of the allocated budget. (10 variations based on budget level)
    • Warning: Total token usage for the week exceeds 75% of the weekly budget. (10 variations based on time frame)

Cost per interaction (30 alerts)

  • Cost per query increase:
    • Warning: The average cost per successful query rises by more than 20% in the last 24 hours. (10 variations based on query type)
  • Expensive queries:
    • Warning: A specific user or type of query is responsible for more than 50% of total costs in an hour. (10 variations based on user/query type)
  • Model-specific cost spikes:
    • Critical: The cost of using a specific, expensive model (gpt-4) unexpectedly spikes over a 30-minute period. (10 variations based on model)

Optimization opportunities (20 alerts)

  • High cache miss rate:
    • Warning: Cache hit rate drops below 20%, indicating under-utilization of the cache. (10 variations based on cache configuration)
  • Inefficient prompts:
    • Warning: Prompt analysis flags more than 100 queries with unusually long and verbose prompts. (10 variations based on prompt characteristic)

Conversational and UX alerts (60 alerts)

Conversation flow (20 alerts)

  • Stuck in loop:
    • Warning: An agent-user conversation trace contains more than 5 identical consecutive turns. (5 variations based on loop type)
  • Conversation abandonment:
    • Warning: The rate of conversations ending abruptly with no resolution increases by 25%. (5 variations based on metric)
  • Escalation rate increase:
    • Warning: The number of agent conversations requiring human handoff increases by 20% in the last hour. (5 variations based on metric)
  • Sentiment shift:
    • Warning: Automated sentiment detection flags a significant shift from positive to negative sentiment within a single conversation. (5 variations based on sentiment change)

User experience (20 alerts)

  • Low user satisfaction:
    • Warning: Average user feedback score on a specific feature drops below 3/5. (5 variations based on feature)
  • Negative feedback spike:
    • Critical: A sudden spike in negative user feedback (“bad response”, “not helpful”) is detected. (5 variations based on feedback type)
  • Unusual interaction patterns:
    • Warning: Anomaly detection identifies a user interaction pattern that deviates from the norm (e.g., unusually short or long conversations). (5 variations based on interaction type)
  • High user effort:
    • Warning: A user asks for clarification or rephrasing more than 3 times in a single conversation. (5 variations based on effort metric)

Dialog management (20 alerts)

  • Out-of-scope query:
    • Warning: Agent receives more than 50 queries identified as outside its defined scope in an hour. (5 variations based on scope)
  • Goal failure:
    • Warning: The agent fails to achieve its primary task objective in more than 10% of conversations over a 15-minute period. (5 variations based on task)
  • Conversation divergence:
    • Warning: The conversation topic diverges significantly from the initial user intent. (5 variations based on topic analysis)
  • Ambiguous user intent:
    • Warning: The system frequently reports low confidence in determining user intent, potentially indicating a need for prompt refinement. (5 variations based on confidence score)

Data and model drift alerts (70 alerts)

Data drift (30 alerts)

  • Input data drift:
    • Warning: The distribution of input token lengths for a specific agent shifts by more than 20% compared to the training data. (10 variations based on data feature)
  • User behavior drift:
    • Warning: The topics of incoming user queries deviate significantly from the historical norm, as detected by an unsupervised topic model. (10 variations based on topic shift)
  • Vocabulary drift:
    • Warning: A sudden influx of new, out-of-vocabulary words is detected in user prompts. (10 variations based on vocabulary metric)

Model output drift (30 alerts)

  • Output distribution change:
    • Warning: The distribution of generated response lengths changes by more than 25% from the model’s baseline behavior. (10 variations based on output metric)
  • Output sentiment drift:
    • Warning: The average sentiment of model outputs shifts unexpectedly, potentially indicating a change in tone. (10 variations based on sentiment shift)
  • Coherence drift:
    • Warning: The automated coherence score of model outputs drops significantly, suggesting less fluent responses. (10 variations based on coherence score)

Concept drift (10 alerts)

  • Drift in effectiveness:
    • Critical: An LLM monitoring system detects a significant increase in a specific type of model error (e.g., outdated information), indicating concept drift. (5 variations based on error type)
  • Ground truth mismatch:
    • Warning: The model’s performance on a daily-updated validation set deteriorates over several days. (5 variations based on metric)

RAG and knowledge base alerts (70 alerts)

Retrieval quality (30 alerts)

  • Low retrieval relevance:
    • Warning: The semantic similarity score between retrieved documents and the user query drops below a specified threshold. (10 variations based on threshold/source)
  • Contextual noise:
    • Warning: The agent is retrieving documents that are irrelevant or introduce noise into the generated response. (10 variations based on retrieval analysis)
  • Source document changes:
    • Warning: Retrieval system detects changes or updates to a core knowledge base document and triggers a verification process. (10 variations based on document type)

Retrieval process (20 alerts)

  • Retrieval failure rate:
    • Critical: The rate of zero-document retrievals for RAG-based queries increases significantly. (10 variations based on retrieval configuration)
  • Database connection issues:
    • Critical: The vector database reports connection errors or high query latency. (10 variations based on database)

Knowledge base freshness (20 alerts)

  • Stale content detected:
    • Warning: Fact-checking identifies information in the knowledge base that is no longer current. (10 variations based on freshness check)
  • Synchronization error:
    • Critical: The knowledge base synchronization process with the source of truth fails for more than 2 consecutive runs. (10 variations based on synchronization task)

Custom and advanced alerts (70 alerts)

Anomaly detection (20 alerts)

  • User interaction anomaly:
    • Warning: Anomaly detection on user interaction patterns flags an unusual surge in requests from a single IP address. (10 variations based on user metric)
  • Model behavior anomaly:
    • Warning: An internal ML model detects a significant anomaly in the LLM’s output embedding space, suggesting a shift in response style. (10 variations based on output metric)

Agentic behavior (30 alerts)

  • Unusual tool sequencing:
    • Warning: An agent attempts an unusual sequence of tool calls that deviates from typical execution paths. (10 variations based on tool sequence)
  • Unexpected termination:
    • Critical: An agent process terminates unexpectedly during a multi-step task execution. (10 variations based on termination event)
  • Tool call failures:
    • Critical: An agent fails to parse or execute a tool call correctly, resulting in an error. (10 variations based on tool call failure)

External alerts and feedback (20 alerts)

  • Third-party tool alerts:
    • Critical: An alert is received from a third-party LLM security provider (e.g., Lakera Guard) indicating a vulnerability. (10 variations based on tool)
  • Human-in-the-loop triggers:
    • Warning: A user manually flags a response as low-quality, triggering a review by a human operator. (10 variations based on human feedback)

Other Alert strategies:

  • Specific granular thresholds: Instead of a single “High Latency” alert, create multiple based on specific endpoints, tool types, or latency bands. For example: LLM_API_Latency > 500ms, LLM_API_Latency > 1s, Retrieval_Latency_Exceeded_P99.
  • Alerts per tool: For agents that use multiple tools (APIs, databases, etc.), specific alerts for each tool’s usage, performance, and security. For instance, Tool_CalendarAPI_Failure_Rate > 5%, Tool_Database_Execution_Latency > 2s.
  • Alerts segmented by user Type or business critical function: Tailored alerts to different user groups (e.g., VIP_User_Experience_Degradation) or business journeys (e.g., Product_Search_Hallucination_Rate_Increase).
  • Add statistical anomaly detection: Create alerts for each metric (latency, token usage, error rate) where a machine learning model detects a statistically significant deviation from the normal baseline.
  • Expand on threat intelligence: Integrate your monitoring with threat intelligence feeds. Alerts could trigger on attempts to use known malicious prompts, attack patterns, or access blocklisted domains.
What is AI Security Resilience and why it is important for GenAI and Agentic AI?

WE ARE AT CUSP OF AI ERA

ALERT AI FOR LASTING AI DEFENSE

START FREE TRIAL, GET UPTO 25% OFF


We are seeking to work with exceptional people who adopt, drive change. We want to know from you to understand Generative AI in business better to secure better.
``transformation = solutions + industry minds``

Hours:

Mon-Fri: 8am – 6pm

Phone:

1+(408)-663-1269

Address:

We are at the heart of Silicon valley few blocks from I-880N and 237 E.

880 McCarthy blvd, Milpitas, CA 95035

FILL CONTACT FORM