Disaster Recovery for Houston Petrochemical: The Named-Storm Playbook

Every spring, the disaster-recovery conversation starts over in Houston. Atlantic hurricane season begins June 1. Coastal operators along the Ship Channel, the Port of Houston, and the petrochemical corridor revisit their continuity plans. And every year, the businesses that waited until the forecast showed a cone hit the same wall: workable DR is built in February, not late August.

This is a working architecture for named-storm-ready disaster recovery in the Houston petrochemical and maritime corridors.

The real risk is geographic, not technological

Most IT-side DR conversations drift quickly into technology. Which replication tool. Which cloud region. Which RPO number. All of these matter — but the single most important decision for a Houston Ship Channel-adjacent business is where the secondary site physically sits.

Hurricane Harvey in 2017 demonstrated what many Houston IT leaders had suspected: "geographically redundant" data centers that both sit inside the surge-vulnerable zone of the Houston metro are not, in practice, geographically redundant. Pairs of facilities five miles apart fail together because they share the same evacuation zone, the same grid dependencies, and the same flood risk.

The Westland Bunker sits inland of the Ship Channel surge zone on separate power infrastructure. That's not a marketing claim — it's the reason a coastal-Houston DR architecture pointed there is fundamentally different from one pointed at a second waterfront colo.

Three named-storm-ready DR architectures

Each fits a different tolerance for downtime and data loss.

Warm standby. Production runs at the primary site. A replica of the environment is instantiated at the Tier III facility but kept in a reduced state — lower compute tier, fewer active services, data replicated on a fixed schedule. When a named-storm event is predicted, the standby is spun up to full capacity hours before the first coastal evacuation order. Typical RTO: 1-4 hours. Typical RPO: 15 minutes to 1 hour.

Best for: back-office systems (ERP, accounting, document management) where a short recovery window is acceptable.
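
A quick way to sanity-check a warm-standby design is to compare the replication schedule and spin-up time against the RPO and RTO being promised. The sketch below makes that arithmetic explicit; every number in it is illustrative, not a recommendation.

```python
from datetime import timedelta

# Rough sanity check for a warm-standby design: does the replication
# schedule and the spin-up time actually meet the RPO/RTO being promised?
# All numbers are illustrative placeholders.

replication_interval = timedelta(minutes=30)   # data shipped on a fixed schedule
spin_up_time = timedelta(hours=2)              # reduced tier -> full production capacity

target_rpo = timedelta(hours=1)
target_rto = timedelta(hours=4)

# Worst-case data loss is one full replication interval: the primary
# fails just before the next scheduled sync.
worst_case_rpo = replication_interval
assert worst_case_rpo <= target_rpo, "replication schedule too coarse for the target RPO"

# Recovery time is dominated by bringing the standby to full capacity,
# which is why the playbook starts that at T-72 hours rather than at landfall.
worst_case_rto = spin_up_time
assert worst_case_rto <= target_rto, "spin-up too slow for the target RTO"

print(f"worst-case RPO: {worst_case_rpo}, worst-case RTO: {worst_case_rto}")
```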

Active standby. Production runs at the primary site. A continuously synchronized replica runs at the Tier III facility, ready to assume traffic within minutes of a declared failover. Storage replication is near-real-time. Compute is pre-provisioned. Typical RTO: 15-30 minutes. Typical RPO: under 15 minutes.

Best for: customer-facing systems where a multi-hour outage directly costs revenue or triggers SLA penalties.
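
For an active standby, the interesting part is the failover trigger: the replica is already provisioned and synchronized, so "failover" reduces to a declaration plus a traffic cutover. A minimal sketch of that trigger follows; the health endpoint, thresholds, and cutover steps are hypothetical placeholders, and in practice the declaration itself is a human decision supported by automation.

```python
import time
import urllib.request

# Minimal sketch of a failover trigger for an active standby. Everything
# named here (endpoint, thresholds, cutover steps) is a placeholder.

PRIMARY_HEALTH_URL = "https://primary.example.internal/health"
FAILURE_THRESHOLD = 3          # consecutive failed checks before declaring
CHECK_INTERVAL_SECONDS = 30

def primary_healthy() -> bool:
    try:
        with urllib.request.urlopen(PRIMARY_HEALTH_URL, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

def declare_failover() -> None:
    # In practice this is a human decision supported by automation:
    # update DNS (TTLs already shortened pre-event), repoint site-to-site
    # links, notify the operations channel.
    print("FAILOVER DECLARED: routing traffic to the standby site")

def watch_primary() -> None:
    failures = 0
    while True:
        if primary_healthy():
            failures = 0
        else:
            failures += 1
            if failures >= FAILURE_THRESHOLD:
                declare_failover()
                return
        time.sleep(CHECK_INTERVAL_SECONDS)
```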

Active-active. Both sites serve live traffic simultaneously. Storage replication is synchronous or asynchronous depending on distance and latency tolerance. Load balancing steers traffic toward whichever site is healthy. Typical RTO: near-zero. Typical RPO: near-zero.

Best for: systems where even a 30-minute outage is operationally catastrophic. The most expensive architecture and the one that requires the most architectural discipline — it's not the default answer.
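
The operational signature of active-active is that there is no failover event to declare: the load-balancing layer simply stops sending traffic to an unhealthy site. The toy sketch below shows that steering logic in isolation; real deployments do this in the load balancer or DNS layer, and the site names and weights here are placeholders.

```python
import random

# Toy sketch of active-active traffic steering: both sites serve live
# traffic, and requests only go to sites that are currently healthy.

SITES = {
    "coastal-primary": {"weight": 50, "healthy": True},
    "inland-bunker":   {"weight": 50, "healthy": True},
}

def pick_site() -> str:
    """Weighted choice among healthy sites. If one site drops out, the
    other absorbs all traffic with no failover event to declare."""
    healthy = {name: s for name, s in SITES.items() if s["healthy"]}
    if not healthy:
        raise RuntimeError("no healthy site available")
    names = list(healthy)
    weights = [healthy[n]["weight"] for n in names]
    return random.choices(names, weights=weights, k=1)[0]

# Simulate the coastal site dropping out during a storm:
SITES["coastal-primary"]["healthy"] = False
print(pick_site())   # always "inland-bunker" now
```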

Pre-event: what needs to exist in February, not August

A named-storm DR plan that actually works requires three things already in place before the forecast:

  1. A documented runbook specific to your environment. Not a generic template — the actual sequence of steps your team will execute, with named owners, predicted durations, and decision points. Tested at least annually. (One way to encode this is sketched after this list.)

  2. Tested failover. A rehearsal where your environment was actually failed over to the DR site and back. Measured RTO and RPO. Documented what broke and what got fixed. "We have replication running" is not a plan; "we failed over on [date] and it took [hours] and we found [issues] and corrected them" is.

  3. Pre-positioned communications. When the event hits, decisions get made in minutes. Before then: who calls the shots, who authorizes the failover, who notifies your clients, who handles press if your operation is visible enough to matter. Locked down and documented.
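
One way to keep the runbook from item 1 executable rather than aspirational is to treat it as data: every step carries a named owner, a predicted duration, and a flag for go/no-go decision points, and the planned total is compared against what the last drill actually measured. The sketch below is a hypothetical shape for that, not a complete runbook; all steps, owners, and durations are placeholders.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List, Optional

@dataclass
class RunbookStep:
    description: str
    owner: str                      # a named person, not a department
    expected: timedelta
    decision_point: bool = False    # requires an explicit go/no-go

@dataclass
class Runbook:
    steps: List[RunbookStep] = field(default_factory=list)
    last_drill_rto: Optional[timedelta] = None   # measured, not estimated

    def planned_rto(self) -> timedelta:
        return sum((s.expected for s in self.steps), timedelta())

runbook = Runbook(
    steps=[
        RunbookStep("Declare failover", "J. Doe (ops lead)", timedelta(minutes=10), decision_point=True),
        RunbookStep("Promote DR storage replicas", "A. Smith (storage)", timedelta(minutes=30)),
        RunbookStep("Cut DNS over to the DR site", "A. Smith (network)", timedelta(minutes=15)),
        RunbookStep("Smoke-test critical applications", "B. Lee (apps)", timedelta(minutes=45)),
    ],
    last_drill_rto=timedelta(hours=2, minutes=5),
)

print("planned RTO:", runbook.planned_rto())
print("measured RTO at last drill:", runbook.last_drill_rto)
```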

During the event: the checklist

When a storm is forecast to make landfall in the Houston area, our on-call team works through a pre-built, pre-event checklist with clients whose operations sit in the surge zone. The short version:

  • T-72 hours: Replica environment brought to full capacity. DNS TTLs shortened so a later cutover propagates quickly (a concrete example follows this checklist). Runbook re-verified against current environment state.
  • T-48 hours: Critical data state validated at the DR site. Client's operations team briefed on failover triggers. Communication channel for the duration of the event confirmed.
  • T-24 hours: Site-specific personnel evacuation status confirmed. On-call coverage at the Bunker increased.
  • T-0: If the primary site is affected (power, flooding, structural damage), failover to DR is declared. Traffic cut over. Client operations shift to running from inland infrastructure.
  • Post-event: Damage assessment at primary site. Failback only after primary is verified stable — often days or weeks after landfall.
  • Post-event +30 days: Incident review. What worked, what didn't, what changes for next year.
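
The "shorten DNS TTLs" step at T-72 is worth making concrete, since it is what lets a later cutover propagate in minutes rather than hours. The sketch below assumes the zone happens to be hosted in Route 53; any DNS provider with an API works the same way, and the zone ID, record name, and address shown are placeholders.

```python
import boto3

# Pre-event TTL reduction, assuming the zone lives in Route 53.
# Zone ID, record name, and address are placeholders.

route53 = boto3.client("route53")

def shorten_ttl(zone_id: str, record_name: str, current_value: str, ttl: int = 60) -> None:
    """Drop the record's TTL so a later cutover propagates in minutes,
    not hours. The record still points at the primary site for now."""
    route53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={
            "Comment": "Pre-event: shorten TTL ahead of a possible failover",
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": current_value}],
                },
            }],
        },
    )

# shorten_ttl("Z123EXAMPLE", "erp.example.com.", "203.0.113.10")
```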

This isn't theoretical. We do this.

Cyber-DR vs. weather-DR — two different architectures

One nuance worth calling out: weather-DR and cyber-DR are not the same thing, and shouldn't share the same infrastructure uncritically.

Weather-DR assumes your primary environment is physically unavailable (offline, inaccessible, possibly damaged) but the data you replicated from it is still trustworthy. You're running from the secondary until the primary is back.

Cyber-DR (specifically ransomware) assumes your primary environment may be compromised end-to-end. If your secondary site is online-replicated from the primary, the ransomware replicated there too. The recovery is from immutable backups on isolated infrastructure that hasn't been touched by the attack.

For Houston petrochemical and maritime clients, we architect these as two parallel systems. Weather events use the warm-or-active replica. Ransomware events use the immutable backup chain, restored to clean infrastructure. The playbooks are different, the rehearsal is different, and the mental model is different.
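
On the cyber-DR side, "immutable" has a concrete meaning: the backup store refuses deletes and overwrites until a retention date passes, even from an administrator account. The sketch below uses S3 Object Lock as one example of such a store; the bucket and key names are placeholders, and the same property is available from several backup platforms.

```python
from datetime import datetime, timedelta, timezone
import boto3

# Write-once backup objects via S3 Object Lock. In compliance mode the
# object cannot be deleted or overwritten until the retention date
# passes, even by an administrator. Names are placeholders.

s3 = boto3.client("s3")

def write_immutable_backup(bucket: str, key: str, body: bytes, retain_days: int = 35) -> None:
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=retain_days),
    )

# The bucket itself must have been created with Object Lock enabled, e.g.:
# s3.create_bucket(Bucket="dr-backups-example", ObjectLockEnabledForBucket=True)
```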

Where this fits

Named-storm season isn't a hypothetical planning exercise in Houston. It's a recurring constraint that shapes how serious operators architect their infrastructure. The businesses that don't plan for it find out why in August.

Talk through your situation.

The articles cover the general shape. Your specific situation deserves a real conversation.
