Train your team with realistic incident response scenarios
Verify personal readiness and basic incident hygiene.
Before the "incident," every user must verify their Notification Rules and confirm that a "High Urgency" rule sends a notification within 1 minute.
A "surprise" personal conflict is introduced (e.g., "Your internet is down at home"). The user must find a teammate and perform a Schedule Override for the next 2 hours.
Once the incident triggers, the responder must Acknowledge and then Add a Note.
The responder must manually Add a Responder to the live incident to demonstrate cross-team collaboration.
Once the "fix" is found, the user must Resolve the incident.
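The readiness check and the 2-hour override window from the steps above can be sanity-checked with a short script. This is only a minimal sketch: it assumes each notification rule is available as a dictionary with `urgency` and `start_delay_in_minutes` keys (modeled on the PagerDuty REST API's notification rule objects; verify the field names against your own account), and the function names and sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def has_fast_high_urgency_rule(rules, max_delay_minutes=1):
    """Return True if any high-urgency rule notifies within max_delay_minutes."""
    return any(
        rule.get("urgency") == "high"
        and rule.get("start_delay_in_minutes", float("inf")) <= max_delay_minutes
        for rule in rules
    )


def override_window(start, hours=2):
    """Return ISO 8601 start and end timestamps for an override of the given length."""
    end = start + timedelta(hours=hours)
    return start.isoformat(), end.isoformat()


# Hypothetical sample data: the only high-urgency rule waits 5 minutes,
# so this user would fail the readiness check.
sample_rules = [
    {"urgency": "low", "start_delay_in_minutes": 0},
    {"urgency": "high", "start_delay_in_minutes": 5},
]

if __name__ == "__main__":
    print("Ready:", has_fast_high_urgency_rule(sample_rules))
    print("Override window:", override_window(datetime.now(timezone.utc)))
```

The check only reads data; the override itself is still created by the user in PagerDuty as part of the exercise.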
Fix configuration issues in the response workflows.
During the roleplay, the primary on-call person "doesn't answer." The manager must realize the Escalation Policy is misconfigured (e.g., the timeout is too long). They must edit the policy to reduce the "Escalate after" time to 5 minutes.
The manager notices that the incident triggered as Low Urgency because the monitoring tool doesn't send a severity field that matches the Service/Urgency Settings. They must go to Service Orchestration / Event Orchestration and create an orchestration to correct this. Example: create a rule such as: if "event.group" matches "xyz", set Severity to "Critical".
A manager decides it would be beneficial to create a predefined response pattern for some incidents. The manager must go to Incident Workflows and create a workflow. Example: a workflow that adds another set of responders, posts a status update or posts to the status page, and creates a Slack or Microsoft Teams channel for the incident.
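The two configuration fixes above (the "Escalate after" timeout and the severity rule) can be rehearsed offline before touching the live account. The sketch below is a local stand-in, not the PagerDuty API: it assumes an escalation policy exported as a dictionary with `escalation_rules` entries carrying `escalation_delay_in_minutes`, and it treats "matches" as an exact string comparison. Both are assumptions, and all function names are hypothetical.

```python
def slow_escalation_rules(policy, max_delay_minutes=5):
    """Return 1-based positions of escalation rules that wait longer than the target."""
    return [
        position
        for position, rule in enumerate(policy.get("escalation_rules", []), start=1)
        if rule.get("escalation_delay_in_minutes", 0) > max_delay_minutes
    ]


def apply_severity_rule(event, group_value="xyz"):
    """Simulate the rule: if event.group matches group_value, set severity to critical."""
    updated = dict(event)  # copy so the incoming event is left untouched
    if event.get("group") == group_value:
        updated["severity"] = "critical"
    return updated


# Hypothetical drill data: the first escalation level waits 30 minutes.
sample_policy = {
    "escalation_rules": [
        {"escalation_delay_in_minutes": 30},
        {"escalation_delay_in_minutes": 5},
    ]
}

if __name__ == "__main__":
    print("Rules to shorten:", slow_escalation_rules(sample_policy))
    print(apply_severity_rule({"group": "xyz", "summary": "Disk full"}))
```

Running the simulation with a few sample events helps confirm which events the rule would promote before it is created in Event Orchestration.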
Trigger a real-time PagerDuty Event using the Events API v2
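One way to fire the drill event is a small script against the Events API v2 enqueue endpoint. The following is a minimal sketch using only the Python standard library; the routing key placeholder, the summary, the source name, and the helper function names are assumptions to replace with values from your own service integration.

```python
import json
import urllib.request

EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"
ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key, summary, source, severity="critical",
                        group=None, dedup_key=None):
    """Build an Events API v2 trigger event as a plain dictionary."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(ALLOWED_SEVERITIES)}")
    payload = {"summary": summary, "source": source, "severity": severity}
    if group is not None:
        payload["group"] = group
    event = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": payload,
    }
    if dedup_key is not None:
        event["dedup_key"] = dedup_key
    return event


def send_event(event, url=EVENTS_API_URL):
    """POST the event as JSON and return the decoded response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder values: substitute the integration key of the drill service.
    drill_event = build_trigger_event(
        routing_key="YOUR_INTEGRATION_KEY",
        summary="[DRILL] Checkout latency above threshold",
        source="training-scenario",
        group="xyz",
    )
    print(json.dumps(drill_event, indent=2))
    # Uncomment to actually page the on-call responder:
    # print(send_event(drill_event))
```

Keeping the send call commented out lets facilitators review the payload first; marking the summary with "[DRILL]" helps responders tell the exercise apart from a production incident.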