Disaster Dissected
EPISODE + ARTIFACT DROP

Cert Expiry Timebomb

Not a hack. Not a DDoS. Just a date. This page ships the exact operator artifacts from the episode: a one-page battle card and a PowerShell starter pack to inventory endpoints and flag risky expiry windows.

What broke

A certificate hit its NotAfter timestamp. That sounds small, but it can cascade: VPN drops, APIs return 503s, and browsers show “Not Secure” because trust collapses at the handshake.
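An expiry is nothing more than that NotAfter timestamp going past. The episode's starter pack is PowerShell, but as a language-neutral illustration, here is a stdlib-Python sketch that turns the `notAfter` string a TLS peer certificate carries into a concrete UTC deadline:

```python
import ssl
from datetime import datetime, timezone

def not_after_utc(not_after: str) -> datetime:
    # Parses the "May 20 12:00:00 2025 GMT" form that
    # ssl.SSLSocket.getpeercert() returns for 'notAfter'.
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)

print(not_after_utc("May 20 12:00:00 2025 GMT").isoformat())
# 2025-05-20T12:00:00+00:00
```

Past that instant, every client that validates the chain starts refusing the handshake at once, which is why the failure looks like a sudden cliff rather than a gradual degradation.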

The failure mode

  • Leaf cert expires → single endpoint breaks (usually recoverable fast).
  • Intermediate CA expires → everything it signed becomes untrusted (this is the “global outage” version).
  • Caching and staggered refresh make symptoms intermittent and hard to correlate.
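The intermediate case is worth internalizing: a chain is only trusted while every link in it is valid, so an endpoint's effective expiry is the earliest NotAfter anywhere in its chain, not the leaf's. A toy Python illustration (all dates hypothetical):

```python
from datetime import datetime, timezone

def effective_expiry(chain):
    """A chain is trusted only until its EARLIEST NotAfter:
    an expired intermediate takes down every leaf it signed."""
    return min(chain.values())

chain = {  # hypothetical NotAfter dates for one endpoint's chain
    "leaf": datetime(2026, 1, 15, tzinfo=timezone.utc),
    "intermediate": datetime(2025, 6, 1, tzinfo=timezone.utc),
    "root": datetime(2035, 1, 1, tzinfo=timezone.utc),
}
print(effective_expiry(chain).date())  # 2025-06-01: the intermediate, not the leaf
```

A monitor that only reads the leaf's expiry would report seven safe months here while the whole chain dies in June.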

Why it pages people

  • Dashboards may show “Up” while clients refuse the connection.
  • Monitoring agents may stop reporting, so you go blind during your own incident.
  • If you don’t have an owner mapped to endpoints, fixes stall on access and tribal knowledge.

What you get

These are the same artifacts referenced in the episode — designed to be copied into a real environment.

  • Cert Expiry Battle Card (PDF, 1 page; tags: outage, pki, certs): detection signals, fastest checks, chain verification, and prevention controls, formatted for incident-time use.
  • PowerShell Starter Pack (ZIP, scripts; tags: outage, powershell, pki): inventory endpoints, perform a fast TLS handshake, pull remote X.509 data, and flag expiring certs with clear thresholds.
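The core loop of that kind of check is simple: handshake, read the leaf cert, compute days left. The actual pack is PowerShell; here is a hedged stdlib-Python sketch of the same idea (host names are placeholders, not real inventory):

```python
import socket
import ssl
from datetime import datetime, timezone

def days_from_not_after(not_after, now=None):
    """Whole days until a cert's notAfter string (negative = already expired)."""
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days

def check_endpoint(host, port=443, timeout=5.0):
    """TLS handshake against the endpoint, then days left on the leaf cert.
    Note: an ALREADY-expired cert raises SSLCertVerificationError during the
    handshake, which is itself the signal you are looking for."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return days_from_not_after(cert["notAfter"])

if __name__ == "__main__":
    inventory = ["example.com"]  # placeholder: your endpoint inventory goes here
    for host in inventory:
        try:
            print(host, check_endpoint(host), "days left")
        except (OSError, ssl.SSLError) as exc:
            print(host, "FAILED:", exc)
```

Splitting the date math out of the network call (as above) also makes the thresholds unit-testable without touching production endpoints.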

Emergency playbook

If you’re in the middle of it right now, don’t freestyle. Do the boring, reliable steps.

During the outage

  • Verify the full chain (leaf + intermediate + root) — don’t stop at the leaf.
  • Issue a replacement certificate immediately (AD CS or your CA platform).
  • Deploy to every termination point: load balancers, ingress, gateways, and DR.
  • Validate outside-in from a public vantage point.

After the outage

  • Map owners to every endpoint (no owner = no production).
  • Set alerting thresholds in days to expiry: <30 warn, <14 critical, <7 incident.
  • Automate renewals where possible — but keep a manual audit tool as a backstop.
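Those thresholds translate directly into an alert-level function your monitoring can call. A minimal Python sketch (function name and labels are illustrative, matching the thresholds above):

```python
def expiry_severity(days_left: int) -> str:
    """Map days-to-expiry onto the alerting thresholds:
    <7 incident, <14 critical, <30 warn, otherwise ok."""
    if days_left < 7:
        return "incident"
    if days_left < 14:
        return "critical"
    if days_left < 30:
        return "warn"
    return "ok"

print(expiry_severity(21))  # warn
```

Checking the tightest threshold first keeps the bands non-overlapping, so a cert with 5 days left pages as an incident rather than a warning.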

Sources & further reading

Primary sources used in this episode. Confirmed vs inferred is kept explicit throughout — that’s the Disaster Dissected standard.

Method

Credibility > vibes. Rule: no speculation presented as fact. Confirmed vs likely vs unknown stays explicit.

Research standards

  • Primary sources first (status pages, incident reports, postmortems).
  • Clear separation: confirmed vs hypothesis vs unknown.
  • If it’s not sourced, it’s labeled (or removed).

Corrections

  • Wrong info gets corrected fast.
  • Corrections noted in write-up + pinned comment (when relevant).
  • Email: contact@disasterdissected.com

About

Disaster Dissected is video-first: concise breakdowns of modern IT failures, with receipts and practical lessons.

Why this exists

Incidents are inevitable. Repeating the same ones is optional. The goal is to make postmortem thinking accessible — without dumbing it down.

Topics

  • Outages (cloud, SaaS, network)
  • Operational failures (deploys, configs, capacity)
  • Breaches (public + confirmed facts only)

Contact

Tips, corrections, sponsorship inquiries, or “please dissect this incident” — send it over.

Email

contact@disasterdissected.com

If you have links/sources, include them. “Receipts” speed everything up.

Submit an incident

Open submission form