When the Cable Gets Kicked — How Two Units Stay Coordinated After Disconnection
Two dedicated devices, one cable, zero cloud servers. When that cable gets disconnected, both machines keep working independently. When it is reconnected, they synchronize automatically.
A Pattern You Already Know
You may not have heard the term "Hub-Spoke topology," but you use it every day.
Open any airline's route map. A few large nodes — Denver, Los Angeles, Chicago — radiate dozens of connections to smaller cities. The large nodes are Hubs. The small cities are Spokes. Instead of flying direct from every city to every other city (which would require 435 routes for 30 cities), all traffic flows through a handful of hubs. Fewer routes, more coordination, vastly more efficient.
Point-to-point (top) vs Hub-and-spoke (bottom): routing through a central hub reduces connections dramatically. Source: Wikipedia (public domain)
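The route arithmetic can be checked in a few lines. This is a minimal sketch that assumes a single hub for simplicity (real airline networks use several); the function names are illustrative only.

```python
def point_to_point_routes(cities: int) -> int:
    """Direct routes needed to link every pair of cities: n * (n - 1) / 2."""
    return cities * (cities - 1) // 2


def hub_and_spoke_routes(cities: int) -> int:
    """Routes needed when one city is the hub for all the others: n - 1."""
    return cities - 1


print(point_to_point_routes(30))  # 435
print(hub_and_spoke_routes(30))   # 29
```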
Logistics works the same way. FedEx built its express network around a central sorting hub in Memphis. A package traveling between two nearby cities may detour through a distant hub first. That seems absurd, but centralized sorting is often more efficient than point-to-point relay.
The pattern is everywhere in information systems too: a central node coordinates multiple edge nodes.
Traditional Hub-Spoke rests on a fatal assumption: the Hub is always online. Flights can wait for the hub airport to reopen. Packages can wait for the sorting center. In a disaster zone, if the Hub goes down, patients cannot wait.
xGrid's Hub-Spoke makes one critical modification: every Spoke is a complete system, not just a terminal. Hub down? Spoke keeps working. Cable disconnected? Both sides operate independently. Cable reconnected? Automatic synchronization.
The Fundamental Trade-Off
In any system that spans multiple locations, you face an unavoidable choice: prioritize consistency (every location always sees the same data) or availability (every location always keeps working).
Most hospital information systems prioritize consistency. They assume the network is reliable and treat disconnection as an exception that degrades service.
xGrid prioritizes availability. Not because consistency does not matter, but because in disaster medicine, disconnection is not an exception — it is the baseline. The cable will get kicked. The wireless signal will drop. The generator will run out of fuel.
When that happens, every node must keep working. Consistency can be restored later. Availability cannot be deferred.
The Physical Setup
xGrid's field deployment consists of two dedicated edge computing devices:
Hub: Broadcasts a local wireless network, runs the full clinical and resource management systems, and has internet access through an upstream connection when one is available.
Spoke: Connected to the Hub via a single cable. Runs its own complete clinical and resource management instances. No internet access whatsoever.
Clinicians' phones connect to the Hub's wireless network. The Spoke serves a second physical location — a satellite triage tent, a supply depot, an overflow ward.
The cable between them is the only network link. Unplug it, and you have two independent medical systems.
Why Disconnection Is Not an Emergency
In a traditional system, losing the network link triggers alerts, degrades service, and initiates recovery procedures.
In xGrid, losing the network link triggers nothing visible to clinical staff. Both systems continue operating with full functionality — their own databases, their own clinical interfaces, their own nursing stations.
This works because of a fundamental design principle: every Spoke is a complete system. The Hub provides coordination, not capability. When coordination is lost, the only thing that changes is synchronization timing.
Three-Phase Synchronization
When the cable is connected, the Hub and Spoke synchronize using a three-phase process:
Phase 1 (Verify): The Hub checks the Spoke's health. If the Spoke does not respond within 30 seconds, synchronization is skipped. A clock-alignment check then confirms that both devices agree on the current time. If they do not, timestamps are unreliable and synchronization is refused.
Phase 2 (Push, Hub to Spoke): Clinical events flow from the clinical system to the resource system: patient records, registrations, prescriptions, vital signs, and handoff records. The clinical system is the authority for patient data.
Phase 3 (Pull, Spoke to Hub): Resource events flow from the resource system to the clinical system: inventory changes, blood bank operations, surgery records, and dispensing logs. The resource system is the authority for supply data.
Synchronization is incremental by default — only changes since the last sync. Full snapshot synchronization is available as a recovery fallback.
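The three phases and the incremental cursor can be sketched roughly as follows. This is an illustration under stated assumptions, not xGrid's implementation: the `Node` and `SyncState` classes, the sequence-number cursors, and the 5-second clock tolerance are all hypothetical, and the `reachable` flag stands in for a real health check with a 30-second timeout.

```python
from dataclasses import dataclass, field

MAX_CLOCK_SKEW_S = 5.0  # assumption: the article does not state a tolerance


@dataclass
class Node:
    name: str
    clock: float                                   # this node's idea of "now", in seconds
    reachable: bool = True                         # stand-in for the 30-second health check
    log: list = field(default_factory=list)        # (sequence_number, event) authored here
    received: list = field(default_factory=list)   # events accepted from the peer

    def events_since(self, seq: int) -> list:
        """Return only the events newer than the given cursor (incremental sync)."""
        return [(s, e) for s, e in self.log if s > seq]


@dataclass
class SyncState:
    pushed_up_to: int = 0  # last Hub sequence number delivered to the Spoke
    pulled_up_to: int = 0  # last Spoke sequence number delivered to the Hub


def synchronize(hub: Node, spoke: Node, state: SyncState) -> str:
    # Phase 1: verify health and clock alignment before touching any data.
    if not spoke.reachable:
        return "skipped: spoke unreachable"
    if abs(hub.clock - spoke.clock) > MAX_CLOCK_SKEW_S:
        return "refused: clock skew"
    # Phase 2: push clinical events from the Hub to the Spoke.
    for seq, event in hub.events_since(state.pushed_up_to):
        spoke.received.append(event)
        state.pushed_up_to = seq
    # Phase 3: pull resource events from the Spoke to the Hub.
    for seq, event in spoke.events_since(state.pulled_up_to):
        hub.received.append(event)
        state.pulled_up_to = seq
    return "ok"
```

Because the cursors advance only after an event is delivered, a second call to `synchronize` transfers nothing unless new events were logged in the meantime. Resetting both cursors to zero would approximate the full-snapshot fallback.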
Six Conflict Resolution Strategies
Two devices modify the same record during a disconnection. When they reconnect, which version wins?
The answer depends on what the data is:
| Strategy | Data Types | Logic |
|---|---|---|
| Append both | Vital signs, handoffs, dispensing records | Immutable events — keep both versions |
| Newest wins | Patient demographics | Compare timestamps, most recent update prevails |
| Hub wins | Registrations, prescriptions, surgery records | The clinical system (Hub) is authoritative |
| Sum both | Inventory quantities | Add both sides' consumption together |
| Always block | Blood products, controlled substances | Never auto-resolve — require human verification |
| On-site wins | Equipment status | The operator physically present takes precedence |
The most notable strategies:
Sum both for inventory: Suppose the Hub consumed 5 bandages and the Spoke consumed 3 while disconnected. The correct answer is not "whoever updated last," because that would erase one side's consumption. The correct answer is 5 + 3 = 8 consumed in total.
Always block for blood products: A blood unit marked as "issued" on both stations simultaneously cannot be resolved by any automated rule. Someone needs to physically verify where that blood unit actually is. The system flags the conflict and waits for a human decision.
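One way to picture the table is as a lookup from data type to strategy, followed by a small dispatcher. The sketch below is hypothetical: the data-type keys, the record shape (`value`, `updated_at`, `on_site`), and the tie-break for equal timestamps are assumptions made for illustration, not details taken from xGrid.

```python
from enum import Enum


class Strategy(Enum):
    APPEND_BOTH = "append both"
    NEWEST_WINS = "newest wins"
    HUB_WINS = "hub wins"
    SUM_BOTH = "sum both"
    ALWAYS_BLOCK = "always block"
    ON_SITE_WINS = "on-site wins"


# Mapping follows the table above; the key names are hypothetical.
STRATEGY_BY_TYPE = {
    "vital_signs": Strategy.APPEND_BOTH,
    "handoff": Strategy.APPEND_BOTH,
    "dispensing": Strategy.APPEND_BOTH,
    "demographics": Strategy.NEWEST_WINS,
    "registration": Strategy.HUB_WINS,
    "prescription": Strategy.HUB_WINS,
    "surgery": Strategy.HUB_WINS,
    "inventory": Strategy.SUM_BOTH,
    "blood_product": Strategy.ALWAYS_BLOCK,
    "controlled_substance": Strategy.ALWAYS_BLOCK,
    "equipment_status": Strategy.ON_SITE_WINS,
}


class NeedsHumanReview(Exception):
    """Raised when a conflict must never be resolved automatically."""


def resolve(data_type: str, hub: dict, spoke: dict):
    """Resolve one conflict; for inventory, 'value' is consumption since the last sync."""
    strategy = STRATEGY_BY_TYPE[data_type]
    if strategy is Strategy.APPEND_BOTH:
        return [hub["value"], spoke["value"]]
    if strategy is Strategy.NEWEST_WINS:
        # Equal timestamps fall to the Hub here; the article gives no tie-break.
        return max(hub, spoke, key=lambda rec: rec["updated_at"])["value"]
    if strategy is Strategy.HUB_WINS:
        return hub["value"]
    if strategy is Strategy.SUM_BOTH:
        return hub["value"] + spoke["value"]
    if strategy is Strategy.ALWAYS_BLOCK:
        raise NeedsHumanReview(data_type)
    # ON_SITE_WINS: prefer whichever side is flagged as physically present.
    return spoke["value"] if spoke.get("on_site") else hub["value"]


hub_side = {"value": 5, "updated_at": 10}
spoke_side = {"value": 3, "updated_at": 12}
print(resolve("inventory", hub_side, spoke_side))  # 8
```

Raising an exception for blood products and controlled substances keeps the "always block" rule from being silently bypassed: the caller has to handle the conflict explicitly and route it to a person.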
Failover: Three Strikes
The Hub checks the Spoke's health every 30 seconds. After three consecutive failures (roughly 90 seconds without a response), the Hub reclassifies the Spoke as disconnected.
If a backup Hub has been configured, the Spoke automatically redirects to it. But returning to the primary Hub after recovery is deliberately manual. Automatic failback sounds convenient, but in a disaster, unexpected transitions are dangerous. An operator must confirm the primary Hub is genuinely stable before switching back.
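The three-strikes rule and the manual failback can be modeled with two small state holders. This is a sketch under assumptions: the class and method names are invented for illustration, and treating a single successful check as "connected again" is a guess, since the article does not describe how a Spoke is reclassified after it recovers.

```python
from dataclasses import dataclass
from typing import Optional

CHECK_INTERVAL_S = 30  # from the text: one health check every 30 seconds
MAX_FAILURES = 3       # from the text: three consecutive failures
SECONDS_TO_DISCONNECT = CHECK_INTERVAL_S * MAX_FAILURES  # about 90 seconds


@dataclass
class SpokeMonitor:
    consecutive_failures: int = 0
    connected: bool = True

    def record_check(self, ok: bool) -> bool:
        """Record one health-check result and return the current classification."""
        if ok:
            self.consecutive_failures = 0
            self.connected = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= MAX_FAILURES:
                self.connected = False
        return self.connected


@dataclass
class HubSelection:
    primary: str
    backup: Optional[str] = None
    active: Optional[str] = None

    def __post_init__(self):
        if self.active is None:
            self.active = self.primary

    def on_primary_lost(self) -> None:
        # Failover is automatic, but only if a backup Hub was configured.
        if self.backup is not None:
            self.active = self.backup

    def on_primary_recovered(self) -> None:
        # Deliberately a no-op: recovery alone never switches back.
        pass

    def confirm_failback(self, operator: str) -> str:
        # Only an explicit operator decision returns traffic to the primary.
        self.active = self.primary
        return f"{operator} confirmed failback to {self.primary}"
```

The asymmetry is the point of the design: `on_primary_lost` acts on its own, while `on_primary_recovered` does nothing until someone calls `confirm_failback`.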
Station Consolidation: When a Site Evacuates
When a station must evacuate, its data needs to merge into a surviving station. Four merge modes handle different scenarios:
- Full merge: All data flows into the target station
- Partial merge: Only selected resource categories transfer
- Backup import: Restore from a portable backup
- Emergency close: Station shutdown with complete data preservation
Every merge records exactly what moved: how many inventory items, blood products, equipment records, and surgery records transferred. This audit trail answers the post-disaster question that always arises: "When that station evacuated, where did everything go?"
Surgeries in progress during evacuation receive a special status — Interrupted — distinct from Completed or Cancelled. A surgery halted because of incoming danger requires different follow-up than one that finished normally or was electively cancelled.
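A merge audit record and the Interrupted status might look something like the following. All names here are hypothetical, including the `IN_PROGRESS` status and the `(category, payload)` record shape; the sketch only distinguishes full from partial merges and does not model backup import or emergency close beyond naming them.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class MergeMode(Enum):
    FULL = "full merge"
    PARTIAL = "partial merge"
    BACKUP_IMPORT = "backup import"
    EMERGENCY_CLOSE = "emergency close"


class SurgeryStatus(Enum):
    IN_PROGRESS = "in progress"   # assumed name for a surgery still underway
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    INTERRUPTED = "interrupted"


@dataclass(frozen=True)
class MergeAudit:
    source: str
    target: str
    mode: MergeMode
    moved: dict  # category -> number of records transferred


def interrupt_open_surgeries(surgeries: dict) -> dict:
    """Mark surgeries still in progress as INTERRUPTED; leave the rest untouched."""
    return {
        sid: SurgeryStatus.INTERRUPTED if status is SurgeryStatus.IN_PROGRESS else status
        for sid, status in surgeries.items()
    }


def merge(source, target, records, mode=MergeMode.FULL, categories=None) -> MergeAudit:
    """Count what moves, per category; records are (category, payload) pairs."""
    if mode is MergeMode.PARTIAL:
        # A partial merge transfers only the selected categories.
        records = [r for r in records if r[0] in (categories or ())]
    moved = Counter(category for category, _ in records)
    return MergeAudit(source, target, mode, dict(moved))
```

Keeping the per-category counts on the audit record itself means the question "where did everything go?" can be answered from a single row, without replaying the merge.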
Software Updates: Controlled, Never Automatic
The Hub has internet access and can receive remote update pushes. The Spoke does not — its updates are relayed through the Hub via the internal network.
The process is controlled. Updates never auto-apply — an operator confirms deployment to each station. In a disaster environment, you do not want the system to change behavior without an explicit human decision. Automatic updates are convenient, but they are also an uncontrolled variable. Stability always takes priority over new features.
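The per-station confirmation gate can be expressed as a small guard. This is a hypothetical sketch: `StagedUpdate`, `apply_update`, and the station names are invented for illustration, and the actual relay from Hub to Spoke is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class StagedUpdate:
    version: str
    confirmed_by: dict = field(default_factory=dict)  # station -> confirming operator

    def confirm(self, station: str, operator: str) -> None:
        """Record an operator's explicit decision to deploy to one station."""
        self.confirmed_by[station] = operator

    def can_apply(self, station: str) -> bool:
        return station in self.confirmed_by


def apply_update(update: StagedUpdate, station: str, installed: dict) -> bool:
    """Install the update on a station only if an operator confirmed it there."""
    if not update.can_apply(station):
        return False  # staged, but nothing changes without a human decision
    installed[station] = update.version
    return True


installed = {"hub": "1.0", "spoke": "1.0"}
update = StagedUpdate("1.1")
update.confirm("hub", "operator-1")
print(apply_update(update, "hub", installed))    # True
print(apply_update(update, "spoke", installed))  # False
```

Confirming the Hub does not confirm the Spoke: each station needs its own decision, which matches the rule that updates never auto-apply.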
Designing for Disconnection
Most distributed systems start from the premise "the network is reliable" and add exception handling for when it is not.
xGrid starts from the premise "the network is unreliable" and optimizes for when it happens to work.
This inversion produces a fundamentally different design:
- Every node is a complete system (not a thin client that depends on a server)
- Synchronization runs in periodic batches (not as real-time streaming)
- Conflict resolution is default behavior (not exception handling)
- Human intervention is the correct answer for some conflicts (not a bug to eliminate)
- Manual operations are preferred over automatic ones when reliability matters more than convenience
The cable will get kicked. That is not a failure mode. It is a design parameter.