Escalation Routing

Route incidents to the right escalation policy based on alert attributes.

Overview

Escalation Routing allows you to automatically direct incoming alerts to specific escalation policies based on their attributes. Instead of using a single default policy for all incidents, you can create rules that match alert properties like severity, host, service, or custom tags to ensure the right team gets notified.

Smart Routing — Route alerts based on severity, service, environment, or any custom field

Priority-Based — Rules are evaluated in priority order - first match wins

Flexible Conditions — Use AND/OR logic with 13 different operators for precise matching

Delayed Escalation — Add delay to allow alerts to auto-resolve before escalating

How Escalation Routing Works

Alert Received
      ↓
Build Evaluation Context
(severity, host, service, tags, source, etc.)
      ↓
Evaluate Rules (by priority, lowest number first)
      ↓
      ├─────────────────────────────┐
      ↓                             ↓
Rule Matches                    No Match
      ↓                             ↓
Use Rule's Policy               Use Default Policy
      ↓                             ↓
Apply Initial Delay (if set)    Start Escalation Immediately
      ↓
Start Escalation

Key Points:

Rules are evaluated in priority order (lower number = evaluated first)
The first matching rule determines the escalation policy
If no rules match, the tenant's default escalation policy is used
Inactive rules can still match but will suppress notifications
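
The key points above can be sketched as a small first-match router. This is a minimal illustration, not the product's implementation: the `Rule` and `Decision` classes, the `route` function, and the flat context keys are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    priority: int                    # lower number = evaluated first
    policy: str
    matches: Callable[[dict], bool]  # hypothetical stand-in for the rule's conditions
    active: bool = True


@dataclass
class Decision:
    policy: str
    rule: Optional[str]  # None when the default policy was used
    notify: bool


def route(context: dict, rules: list[Rule], default_policy: str) -> Decision:
    """Return the decision of the first matching rule, else the default policy."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(context):
            # An inactive rule still wins the match but suppresses notifications.
            return Decision(rule.policy, rule.name, notify=rule.active)
    return Decision(default_policy, None, notify=True)


rules = [
    Rule("db", 100, "DBA On-Call",
         lambda c: "database" in c.get("event.service", "")),
    Rule("critical", 50, "Critical Response",
         lambda c: c.get("event.severity") == "critical"),
    Rule("dev", 1, "Any",
         lambda c: c.get("event.environment") == "development", active=False),
]

decision = route(
    {"event.severity": "critical", "event.service": "database-primary"},
    rules,
    "Default Policy",
)
print(decision.policy)  # Critical Response
```

Both the "critical" and "db" rules match this alert, but only the one with the lower priority number is used.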

Viewing Routing Rules

The Escalation Routing page displays all your configured rules:

Column	Description
Rule	Name and description
Priority	Evaluation order (lower = first)
Conditions	Number of conditions and logic type (AND/OR)
Policy	Target escalation policy
Delay	Initial delay before escalation starts
Status	Active or Inactive

Tip: Use the search bar and status filter to quickly find specific rules.

Creating Routing Rules

Open Create Dialog — Click the Add Rule button in the toolbar
Enter Basic Info:
- Name — Descriptive rule name
- Description — Explain what this rule does and when it should match
Configure Conditions — Add conditions that must match for the rule to trigger:
- Select a field (event property or custom tag)
- Choose an operator (equals, contains, regex, etc.)
- Enter the value to match
- Add more conditions as needed
- Select AND or OR logic
Select Escalation Policy — Choose which escalation policy to use when this rule matches, or select "Use Default Policy"
Set Priority and Status:
- Priority — Lower values evaluate first (default: 100)
- Status — Active or Inactive
Configure Initial Delay (Optional) — Set a delay in seconds before escalation begins
Save Rule — Click Create Rule to save
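
For reference, the settings collected in the dialog can be pictured as a single record. The sketch below is only an illustration: `initialDelaySeconds` is the one field name this page mentions, and every other attribute name (and the `Condition` and `RoutingRule` classes themselves) is an assumption made for the example, not a documented schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Condition:
    field: str                   # e.g. "event.severity" or "tags.team"
    operator: str                # e.g. "equals", "contains", "regex"
    value: Optional[str] = None  # not needed for exists / not_exists


@dataclass
class RoutingRule:
    name: str
    conditions: list[Condition]
    logic: str = "AND"            # "AND" or "OR"
    policy: Optional[str] = None  # None stands for "Use Default Policy"
    description: str = ""
    priority: int = 100           # default priority from the dialog
    active: bool = True
    initialDelaySeconds: int = 0  # 0 means immediate escalation


rule = RoutingRule(
    name="Critical production databases",
    description="Page the DBA team for critical database alerts in production",
    conditions=[
        Condition("event.severity", "equals", "critical"),
        Condition("event.service", "contains", "database"),
        Condition("event.environment", "equals", "production"),
    ],
    policy="DBA On-Call",
    priority=50,
)

print(json.dumps(asdict(rule), indent=2))
```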

Understanding Conditions

Conditions determine when a routing rule triggers. Each condition compares an event field against a value using an operator.

Available Fields

Event Fields

Field	Description
`event.severity`	Alert severity level
`event.priority`	Alert priority (P1-P5)
`event.host`	Affected hostname
`event.hostIp`	Host IP address
`event.service`	Service or application name
`event.environment`	Environment (production, staging, etc.)
`event.region`	Geographic region
`event.title`	Alert title
`event.description`	Alert description

Source Fields

Field	Description
`source.type`	Integration type (prometheus, zabbix, datadog, etc.)
`source.id`	Integration identifier

Custom Fields

Use the Custom Field option to match on any tag or extra field:

Pattern	Example
`tags.{key}`	`tags.team`, `tags.customer`, `tags.tier`
`extra.{key}`	`extra.customField`, `extra.runbookUrl`

Note: Any field from the incoming webhook payload can be accessed using dot notation.
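
A dot-notation lookup of this kind can be sketched as a walk through nested dictionaries. The nested shape of the evaluation context and the `resolve` helper are assumptions made for illustration; the page does not describe the internal representation.

```python
from typing import Any


def resolve(context: dict, path: str, default: Any = None) -> Any:
    """Follow a dotted path such as "tags.team" through nested dictionaries."""
    current: Any = context
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default  # the field does not exist
        current = current[key]
    return current


context = {
    "event": {"severity": "critical", "host": "prod-db-01"},
    "source": {"type": "prometheus"},
    "tags": {"team": "payments", "tier": "premium"},
    "extra": {"runbookUrl": "https://example.com/runbooks/db"},
}

print(resolve(context, "tags.team"))       # payments
print(resolve(context, "tags.customer"))   # None
```

A missing key returns the default rather than raising, which is what operators such as `exists` and `not_exists` need in order to test for a field's presence.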

Condition Operators

Operator	Description	Example
`equals`	Exact match (case-insensitive)	`event.severity equals "critical"`
`not_equals`	Does not match	`event.environment not_equals "development"`
`contains`	Contains substring	`event.title contains "database"`
`not_contains`	Does not contain	`event.host not_contains "test"`
`starts_with`	Starts with prefix	`event.host starts_with "prod-"`
`ends_with`	Ends with suffix	`event.service ends_with "-api"`
`in`	Value is in a comma-separated list	`event.severity in "critical,high"`
`not_in`	Value is not in a comma-separated list	`event.environment not_in "dev,test,staging"`
`regex`	Regex pattern match	`event.host regex "^web-[0-9]+$"`
`exists`	Field exists	`tags.customer exists`
`not_exists`	Field does not exist	`tags.team not_exists`
`greater_than`	Numeric greater than	`extra.errorCount greater_than 100`
`less_than`	Numeric less than	`extra.responseTime less_than 5000`
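
The operator table can be read as a dispatch over thirteen comparisons. The sketch below is one possible interpretation, not the product's evaluator. The page states only that `equals` and regex patterns are case-insensitive. Treating every string operator as case-insensitive, trimming list items, using search (not full-match) semantics for `regex`, and the handling of missing or non-numeric values are all assumptions, and Python's `re` module stands in for whatever regex engine the product uses.

```python
import re
from typing import Any, Optional

OPERATORS = {
    "equals", "not_equals", "contains", "not_contains", "starts_with",
    "ends_with", "in", "not_in", "regex", "exists", "not_exists",
    "greater_than", "less_than",
}


def _items(value: Any) -> list[str]:
    # Assumption: list values are comma-separated and items are trimmed.
    return [item.strip().lower() for item in str(value).split(",")]


def _number(value: Any) -> Optional[float]:
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def evaluate(operator: str, actual: Any, expected: Any = None) -> bool:
    """Evaluate one condition. `actual` is None when the field is missing."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator}")
    if operator == "exists":
        return actual is not None
    if operator == "not_exists":
        return actual is None
    if actual is None:
        # Assumption: a missing field satisfies only the negated operators.
        return operator in {"not_equals", "not_contains", "not_in"}

    a, e = str(actual).lower(), str(expected).lower()
    if operator == "equals":
        return a == e
    if operator == "not_equals":
        return a != e
    if operator == "contains":
        return e in a
    if operator == "not_contains":
        return e not in a
    if operator == "starts_with":
        return a.startswith(e)
    if operator == "ends_with":
        return a.endswith(e)
    if operator == "in":
        return a in _items(expected)
    if operator == "not_in":
        return a not in _items(expected)
    if operator == "regex":
        return re.search(str(expected), str(actual), re.IGNORECASE) is not None
    # Remaining operators: greater_than / less_than.
    x, y = _number(actual), _number(expected)
    if x is None or y is None:
        return False  # assumption: non-numeric values never match
    return x > y if operator == "greater_than" else x < y


print(evaluate("equals", "Critical", "critical"))     # True
print(evaluate("regex", "web-12", "^web-[0-9]+$"))    # True
print(evaluate("greater_than", 150, "100"))           # True
```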

Condition Logic

AND Logic

All conditions must match for the rule to trigger.

Example: Route to DBA team only for critical database alerts in production

Conditions (AND):
  - event.severity equals "critical"
  - event.service contains "database"
  - event.environment equals "production"

All three conditions must be true.

OR Logic

At least one condition must match for the rule to trigger.

Example: Route high-priority alerts regardless of type

Conditions (OR):
  - event.severity equals "critical"
  - event.priority equals "P1"
  - tags.escalate equals "true"

Any single condition being true triggers the rule.

Tip: For complex routing with both AND and OR requirements, create multiple rules with different priorities.
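
As an illustration of the tip, a requirement such as "(critical AND production) OR escalate tag" can be split into two rules that share a policy. The sketch below assumes flat context keys and hypothetical helper names (`matches`, `pick_policy`); priorities 20 and 21 are arbitrary adjacent values.

```python
from typing import Callable

Condition = Callable[[dict], bool]


def matches(conditions: list[Condition], logic: str, alert: dict) -> bool:
    """AND requires every condition to hold; OR requires at least one."""
    results = (condition(alert) for condition in conditions)
    if logic == "AND":
        return all(results)
    if logic == "OR":
        return any(results)
    raise ValueError(f"unknown logic: {logic}")


def is_critical(alert: dict) -> bool:
    return alert.get("event.severity", "").lower() == "critical"


def is_production(alert: dict) -> bool:
    return alert.get("event.environment", "").lower() == "production"


def wants_escalation(alert: dict) -> bool:
    return alert.get("tags.escalate", "").lower() == "true"


# (critical AND production) OR escalate, expressed as two rules, one policy.
rules = [
    (20, "AND", [is_critical, is_production], "Critical Response Team"),
    (21, "OR", [wants_escalation], "Critical Response Team"),
]


def pick_policy(alert: dict, default: str = "Default Policy") -> str:
    for _priority, logic, conditions, policy in sorted(rules, key=lambda r: r[0]):
        if matches(conditions, logic, alert):
            return policy
    return default


print(pick_policy({"event.severity": "critical", "event.environment": "production"}))
print(pick_policy({"event.severity": "warning", "tags.escalate": "true"}))
```

Both calls select the same policy, one through the AND rule and one through the OR rule. A critical alert from staging without the escalate tag falls through to the default.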

Initial Escalation Delay

The initial delay feature allows you to pause before starting escalation, giving alerts time to auto-resolve.

How It Works

Alert matches a routing rule with initial delay configured
Incident is created but escalation is not started immediately
During the delay period:
- If the alert auto-resolves → Incident closes, no escalation occurs
- If the delay expires → Escalation begins normally

Delay Configuration

Setting	Description
0 seconds	Immediate escalation (default)
30+ seconds	Minimum delay (values 1-29 auto-correct to 30)
Variance	Actual delay may vary by ±15 seconds

Warning: The delay adds latency before responders are notified. Only use this for alerts that frequently auto-resolve within your monitoring system.
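
The delay settings can be summarized in a few lines of arithmetic. This is a sketch of the rules in the table, with hypothetical function names. Rejecting negative values and applying no variance to a zero delay are assumptions, since the page does not cover those cases.

```python
MIN_DELAY_SECONDS = 30
VARIANCE_SECONDS = 15


def normalize_delay(seconds: int) -> int:
    """Apply the documented correction: 0 stays 0, 1-29 becomes 30."""
    if seconds < 0:
        raise ValueError("delay must not be negative")  # assumption
    if seconds == 0:
        return 0
    return max(seconds, MIN_DELAY_SECONDS)


def delay_window(seconds: int) -> tuple[int, int]:
    """Earliest and latest moment escalation may start, in seconds."""
    delay = normalize_delay(seconds)
    if delay == 0:
        return (0, 0)  # assumption: immediate escalation has no variance
    return (delay - VARIANCE_SECONDS, delay + VARIANCE_SECONDS)


print(normalize_delay(10))   # 30
print(delay_window(300))     # (285, 315)
```

A configured delay of 300 seconds therefore means escalation starts somewhere between 285 and 315 seconds after the incident is created, unless the alert resolves first.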

Use Cases

Flapping alerts — Alerts that trigger and resolve rapidly
Self-healing systems — Infrastructure with auto-remediation
Transient issues — Network blips, temporary resource spikes

Rule Priority

Rules are evaluated in priority order (lower number = higher priority):

Priority	Evaluation Order	Typical Use
1-10	First	Critical routing, VIP customers
11-50	Second	High-priority specific matches
51-100	Third	Standard routing rules
101-500	Fourth	General category rules
501+	Last	Catch-all rules

Note: When multiple rules could match the same alert, only the highest-priority (lowest number) matching rule is used.

Priority Example

Rule 1 (Priority 10): Premium customers → VIP Support Policy
Rule 2 (Priority 50): Critical severity → Critical Response Policy
Rule 3 (Priority 100): Database service → DBA On-Call Policy
Rule 4 (Priority 500): Everything else → Default Policy

A critical alert for a premium customer's database matches Rules 1, 2, and 3, but it routes to VIP Support Policy because Rule 1 has the lowest priority number and is therefore evaluated first.
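
The example can be traced with a short script. The flat alert keys (`tier`, `severity`, `service`) and the `first_match` helper are simplifications invented for this sketch; the rule names, priorities, and policies come from the list above.

```python
# (priority, rule name, predicate, policy), deliberately listed out of order
rules = [
    (500, "Everything else", lambda a: True, "Default Policy"),
    (100, "Database service", lambda a: "database" in a["service"], "DBA On-Call Policy"),
    (10, "Premium customers", lambda a: a["tier"] == "premium", "VIP Support Policy"),
    (50, "Critical severity", lambda a: a["severity"] == "critical", "Critical Response Policy"),
]


def first_match(alert: dict):
    """Return (rule name, policy) for the lowest-numbered matching rule."""
    for _priority, name, predicate, policy in sorted(rules, key=lambda r: r[0]):
        if predicate(alert):
            return name, policy
    return None


premium_db = {"tier": "premium", "severity": "critical", "service": "database"}
standard_db = {"tier": "standard", "severity": "critical", "service": "database"}

print(first_match(premium_db))   # ('Premium customers', 'VIP Support Policy')
print(first_match(standard_db))  # ('Critical severity', 'Critical Response Policy')
```

The same critical database alert from a non-premium customer skips Rule 1 and lands on Rule 2, which shows how the ordering, not the number of matching rules, decides the outcome.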

Common Use Cases

Route by Customer Tier

Goal: Premium customers get faster response

Conditions:

AND:
  - tags.tier equals "premium"

Policy: Premium Support (24/7)

Priority: 10 (evaluate first)

Route by Severity

Goal: Critical alerts go to senior engineers

Conditions:

AND:
  - event.severity in "critical,high"
  - event.environment equals "production"

Policy: Critical Response Team

Priority: 20

Route by Service

Goal: Database alerts go to DBA team

Conditions:

OR:
  - event.service contains "database"
  - event.service contains "postgres"
  - event.service contains "mysql"
  - event.service contains "redis"

Policy: DBA On-Call

Priority: 50

Route by Region

Goal: Regional teams handle their own infrastructure

Rule 1 (Priority 100):

Condition: event.region equals "eu-west-1"
Policy: EU Support Team

Rule 2 (Priority 100):

Condition: event.region equals "us-east-1"
Policy: US East Support Team

Rule 3 (Priority 100):

Condition: event.region equals "ap-southeast-1"
Policy: APAC Support Team

Note: These rules can share priority 100 because an alert carries a single region value, so at most one of them can match.

Route by Integration Source

Goal: Different tools route to different teams

Conditions:

AND:
  - source.type equals "prometheus"
  - event.service starts_with "k8s-"

Policy: Platform Engineering

Priority: 75

Delayed Escalation for Flapping Alerts

Goal: Avoid noise from self-healing issues

Conditions:

AND:
  - event.service contains "autoscaling"
  - event.severity equals "warning"

Policy: Infrastructure Team

Initial Delay: 300 seconds (5 minutes)

Priority: 80

Suppress Non-Production Alerts

Goal: Create incidents but don't notify for dev/test

Conditions:

OR:
  - event.environment equals "development"
  - event.environment equals "staging"
  - event.environment equals "test"

Policy: Any (not used for notifications, because the rule is Inactive)

Status: Inactive (creates incident, suppresses notification)

Priority: 1 (evaluate first)

Testing Rules

Before activating a rule, you can test how it would evaluate against sample data:

Use the test endpoint to evaluate your rules
Send sample event data matching expected alerts
Review which rule matched and why
Adjust conditions or priorities as needed

Note: The test feature shows you exactly which rule would match and which policy would be selected, helping you validate your routing logic before it affects real incidents.
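
This page does not give the test endpoint's path or payload format, so the sketch below is a local dry run instead. It evaluates sample event data against a rule list and records which conditions passed, which is the kind of "which rule matched and why" answer the test feature is described as giving. The rule dictionaries, the flat event keys, and the `check` and `explain` helpers are all hypothetical, and only two operators are implemented.

```python
def check(condition: tuple, event: dict) -> bool:
    """Evaluate one (field, operator, expected) condition, case-insensitively."""
    field, operator, expected = condition
    actual = str(event.get(field, "")).lower()
    if operator == "equals":
        return actual == expected.lower()
    if operator == "contains":
        return expected.lower() in actual
    raise ValueError(f"operator not covered by this sketch: {operator}")


def explain(rules: list, event: dict):
    """Return (matched rule name or None, trace of every rule evaluated)."""
    trace = []
    for rule in sorted(rules, key=lambda r: r["priority"]):
        results = [check(c, event) for c in rule["conditions"]]
        matched = all(results) if rule["logic"] == "AND" else any(results)
        trace.append((rule["name"], results, matched))
        if matched:
            return rule["name"], trace
    return None, trace


rules = [
    {"name": "Critical production", "priority": 20, "logic": "AND",
     "conditions": [("event.severity", "equals", "critical"),
                    ("event.environment", "equals", "production")]},
    {"name": "Databases", "priority": 50, "logic": "OR",
     "conditions": [("event.service", "contains", "database"),
                    ("event.service", "contains", "postgres")]},
]

event = {
    "event.severity": "critical",
    "event.environment": "staging",
    "event.service": "postgres-main",
}

matched, trace = explain(rules, event)
print(matched)  # Databases
for name, results, ok in trace:
    print(name, results, ok)
```

The trace shows that "Critical production" failed on its second condition (the environment is staging), so the alert fell through to "Databases".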

Managing Rules

Editing Rules

Click the three-dot menu (⋮) on a rule
Select Edit
Modify conditions, policy, or settings
Save changes

Duplicating Rules

Create a copy of an existing rule:

Click the three-dot menu
Select Duplicate
Modify the copy as needed
Save as a new rule

Tip: Duplicating is useful when you need similar rules for different teams or environments.

Activating/Deactivating Rules

Toggle rule status without deleting:

Active — Rule is evaluated and routes to its policy
Inactive — Rule still matches but suppresses notifications (useful for testing or maintenance)

Deleting Rules

Click the three-dot menu
Select Delete
Confirm deletion

Warning: Deleted rules cannot be recovered. Alerts that would have matched will fall through to other rules or the default policy.

Best Practices

Start with Specific Rules

Create specific high-priority rules first (VIP customers, critical services), then add broader rules with lower priority.

Use Meaningful Priorities

Leave gaps between priorities (10, 20, 50, 100) so you can insert new rules later without renumbering.

Document Your Rules

Use the description field to explain why each rule exists and what scenario it handles.

Test Before Activating

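Create rules as inactive first, then test with sample alerts before activating in production. Keep in mind that an inactive rule still matches and suppresses notifications for the alerts it catches, so keep its conditions narrow while testing.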

Use Tags for Flexibility

Configure your integrations to send meaningful tags (team, tier, environment) that you can use in routing conditions.

Review Rules Periodically

Audit routing rules when teams change, services are deprecated, or escalation policies are updated.

Avoid Overlapping Rules

When rules have similar conditions, ensure priorities are set correctly so the intended rule matches first.

Troubleshooting

Rule not matching expected alerts

Verify the rule is Active (an Inactive rule still matches, but suppresses notifications)
Check condition field names match your alert data exactly
Test with sample data to see actual field values
Check if a higher-priority rule is matching first
Verify condition logic (AND vs OR)

Wrong policy being used

Check rule priorities — lower numbers evaluate first
Look for other rules that might match the same alerts
Verify the correct policy is selected in the rule
Test to see which rule actually matches

Alerts going to default policy

Verify that at least one rule is expected to match the alert
Check that all conditions in the expected rule are satisfied
Ensure the rule is Active
Review actual alert field values in webhook samples

Notifications not being sent

Check if matched rule is Inactive (suppresses notifications)
Verify the escalation policy has active targets
Check if initial delay is still pending
Review escalation policy configuration

Delay not working as expected

Verify `initialDelaySeconds` is set correctly
Remember that values 1-29 auto-correct to 30
Actual delay may vary by ±15 seconds
Check if alert resolved before delay expired

Regex conditions not matching

Test regex pattern separately
Patterns are case-insensitive
Escape special characters properly
Check for leading/trailing whitespace in values
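
The checks above can be tried out locally before editing a rule. The sketch uses Python's `re` module as a stand-in, since the page does not name the regex engine the product uses. It also assumes search semantics, so anchors such as `^` and `$` are needed for a whole-value match. Only the case-insensitivity comes from this page.

```python
import re


def regex_matches(pattern: str, value: str) -> bool:
    """Case-insensitive search, mirroring the documented case-insensitivity."""
    return re.search(pattern, value, re.IGNORECASE) is not None


# Case does not matter.
print(regex_matches(r"^web-[0-9]+$", "WEB-12"))    # True

# A trailing space in the alert value defeats the $ anchor.
print(regex_matches(r"^web-[0-9]+$", "web-12 "))   # False

# An unescaped dot matches any character...
print(regex_matches("api.example.com", "apiXexampleYcom"))             # True

# ...so escape literal text before using it as a pattern.
print(regex_matches(re.escape("api.example.com"), "apiXexampleYcom"))  # False
print(regex_matches(re.escape("api.example.com"), "api.example.com"))  # True
```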

Quick Reference

Condition Operators

Operator	Type	Description
`equals`	String	Exact match
`not_equals`	String	Not equal
`contains`	String	Substring match
`not_contains`	String	No substring
`starts_with`	String	Prefix match
`ends_with`	String	Suffix match
`in`	List	Value in list
`not_in`	List	Value not in list
`regex`	Regex	Pattern match
`exists`	Boolean	Field exists
`not_exists`	Boolean	Field missing
`greater_than`	Numeric	Greater than
`less_than`	Numeric	Less than

Event Fields

Field	Description
`event.severity`	Alert severity
`event.priority`	Alert priority
`event.host`	Hostname
`event.hostIp`	IP address
`event.service`	Service name
`event.environment`	Environment
`event.region`	Region
`event.title`	Alert title
`event.description`	Alert description
`source.type`	Integration type
`source.id`	Integration ID
`tags.*`	Any tag value
`extra.*`	Any extra field value

Priority Guidelines

Range	Use Case
1-10	VIP/Premium routing
11-50	Critical overrides
51-100	Standard routing
101-500	General categories
501+	Catch-all rules