A graph that spans two processes

Tutorial 09 closed the in-process series. This is the follow-up: two Reflow networks running as separate processes — the same shape works for two machines, two pods, two regions — federated through the reflow-discovery server. Messages sent on one peer land in an actor running on the other.

What this is for

In-process Reflow handles per-tick concurrency inside one address space — perfect for a CLI tool, a daemon, an embedded worker. Crossing the process boundary means a different set of constraints:

Concern                                    Solved by
Where is the other peer?                   Discovery server (reflow-discovery)
Auth — is this a peer I should talk to?    Shared auth_token
What if the peer dies?                     Heartbeat eviction + auto-reconnect with backoff
What if the network blips?                 Same — reconnect transparent to the actor graph
Addressing remote actors                   <actor_id>@<network_id> proxy in the local graph

The reflow-peer CLI bundles all of that. The actor authoring side is unchanged from earlier tutorials — the same Actor trait, the same lifecycle. Everything new lives in the bridge.

flowchart LR
    disc[reflow-discovery<br/>:9000]:::svc
    alpha[peer alpha<br/>:9100]:::peer
    beta[peer beta<br/>:9101]:::peer

    alpha -. POST /register .-> disc
    beta  -. POST /register .-> disc
    alpha -. GET /networks .-> disc
    beta  -. GET /networks .-> disc

    beta == WebSocket bridge ==> alpha
    classDef svc fill:#e8eef7,stroke:#5a6f96,color:#23314f
    classDef peer fill:#f3eafa,stroke:#774d9c,color:#37214a

Prerequisites

Build the binaries:

cargo build --release -p reflow_distributed
ls target/release/reflow-{discovery,peer}

Both binaries are self-contained — no shared state files, no runtime dependencies beyond a TCP port.

The discovery server

Run it on whichever host every peer can reach. For local dev that means 127.0.0.1; in production it sits on an internal network behind your usual TLS-terminating proxy.

target/release/reflow-discovery --bind 127.0.0.1:9000
# 2026-04-28T22:48:32Z INFO  reflow distributed server listening on http://127.0.0.1:9000

The server is HTTP-only. The contract is two endpoints:

POST /register   {network_id, instance_id, endpoint, capabilities}
GET  /networks   → [{network_id, instance_id, endpoint, capabilities, last_seen}, …]

Peers re-register on every refresh tick (default 15s). Entries older than --entry-ttl-secs (default 60s) get pruned.
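
To poke at the contract without a peer in the loop, a quick client sketch works; it assumes the reqwest and serde_json crates (neither ships with Reflow), and the capabilities value here is a placeholder:

use serde_json::json;

// Register a throwaway peer, then list what the server knows.
// Field names follow the contract above; the value types are assumptions.
let client = reqwest::Client::new();
client
    .post("http://127.0.0.1:9000/register")
    .json(&json!({
        "network_id": "alpha",
        "instance_id": "alpha-1",
        "endpoint": "127.0.0.1:9100",
        "capabilities": []
    }))
    .send()
    .await?
    .error_for_status()?;

let networks: serde_json::Value = client
    .get("http://127.0.0.1:9000/networks")
    .send()
    .await?
    .json()
    .await?;
println!("{networks:#}");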

Peer alpha

alpha.toml:

network_id   = "alpha"
instance_id  = "alpha-1"
bind_address = "127.0.0.1"
bind_port    = 9100
discovery_endpoints = ["http://127.0.0.1:9000"]
auth_token   = "shared-secret"

alpha doesn't need to dial anyone — beta will dial it. Run it:

target/release/reflow-peer --config alpha.toml
# INFO starting peer: network_id=alpha, instance_id=alpha-1, bind=127.0.0.1:9100
# INFO Registered with discovery endpoint: http://127.0.0.1:9000
# INFO peer ready, awaiting messages (Ctrl-C to stop)

The CLI registers a built-in recorder actor — it logs every inbound message on recorder.in. That's enough for a smoke target; swap in your own actor by writing a small Rust binary that reuses reflow_distributed::peer_config::PeerConfig to load the same TOML.

Peer beta

beta.toml:

network_id   = "beta"
instance_id  = "beta-1"
bind_address = "127.0.0.1"
bind_port    = 9101
discovery_endpoints = ["http://127.0.0.1:9000"]
auth_token   = "shared-secret"

[[connect]]
endpoint = "127.0.0.1:9100"

[[connect]] tells beta to dial alpha on startup. Discovery would eventually surface alpha's endpoint anyway, but for the first message you don't want to wait one refresh cycle.

target/release/reflow-peer --config beta.toml \
    --send 'alpha:recorder:in:hello-from-beta'
# INFO connected to peer at 127.0.0.1:9100
# INFO sent test message to alpha/recorder.in: "hello-from-beta"
# INFO peer ready, awaiting messages (Ctrl-C to stop)

Watch alpha:

INFO Established connection with network: beta
INFO 🌐 BRIDGE: Received remote message from beta to alpha::recorder
INFO ✅ ROUTER: Successfully routed message to local actor recorder
INFO recorder.in: String("hello-from-beta")

That's the full federation: beta → bridge → alpha → local recorder actor's in port. The Actor::run callback fires the same way it would for an in-process message.
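
One note on the --send flag: the argument is a single colon-separated token, network, then actor, then port, then payload, as inferred from the example above. Because the payload comes last, a splitn keeps any colons inside it intact; plain Rust, no Reflow types:

// "alpha:recorder:in:hello-from-beta" → (network, actor, port, payload).
// splitn(4, ..) leaves any further colons inside the payload.
fn parse_send(arg: &str) -> Option<(&str, &str, &str, &str)> {
    let mut it = arg.splitn(4, ':');
    Some((it.next()?, it.next()?, it.next()?, it.next()?))
}

assert_eq!(
    parse_send("alpha:recorder:in:hello-from-beta"),
    Some(("alpha", "recorder", "in", "hello-from-beta"))
);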

Crossing the bridge from your own code

The CLI is fine for ops scripts and integration tests, but a real service writes its own binary. The shape:

use std::sync::Arc;

use reflow_distributed::peer_config::PeerConfig;
use reflow_network::distributed_network::DistributedNetwork;

let cfg = PeerConfig::from_path(&args.config)?.to_distributed_config();
let mut net = DistributedNetwork::new(cfg).await?;

// Your own actors here.
net.register_local_actor("transcoder", TranscoderActor::new(), None)?;
net.start().await?;

// Register a proxy for the actor we want to talk to.
net.register_remote_actor("recorder", "alpha").await?;

// Push messages through the local network — the proxy forwards
// across the bridge.
{
    let local = net.get_local_network();
    let local_net = local.read();
    local_net.send_to_actor("recorder@alpha", "in",
        Message::String(Arc::new("from-my-binary".into())))?;
}

reflow_network::distributed_network is the public API; the reflow-peer binary is a thin wrapper around it. Mixing your own actors with peer-CLI infra is just "use the same TOML config and add register_local_actor calls."

Auth tokens

The bridge enforces token equality on the accept side. With auth_token = "shared-secret" on both peers:

beta  → handshake { auth_token: "shared-secret" }
alpha → handshake response { success: true }

With a mismatch:

beta  → handshake { auth_token: "wrong" }
alpha → handshake response { success: false, error: "auth_token mismatch" }
beta  → connect_to_network returns Err("handshake refused: auth_token mismatch")

If alpha has no auth_token set, anyone can connect — sensible for loopback dev, not for anything that talks to a public IP.
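
From your own binary the same refusal surfaces as an Err at connect time rather than at first send. A sketch, assuming connect_to_network (the call named in the transcript above) is a method on DistributedNetwork that takes the peer endpoint:

// Fail fast on an auth mismatch instead of discovering it at first send.
if let Err(e) = net.connect_to_network("127.0.0.1:9100").await {
    // e.g. "handshake refused: auth_token mismatch"
    eprintln!("could not join alpha: {e}");
    return Err(e);
}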

Surviving a peer restart

Kill alpha while beta is running:

^C alpha

Beta's heartbeat catches it within 3 × heartbeat_interval_ms (3 s by default). The connection map drops alpha, and the reconnect dispatcher starts retrying with capped exponential backoff — 200 ms → 400 ms → 800 ms → 1.6 s → 3.2 s, giving up after five attempts.
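
The schedule itself is plain doubling from a 200 ms base, capped by the attempt limit. A sketch of the same arithmetic, handy if you want a matching policy in your own retry code (an illustration, not the bridge's actual implementation):

use std::time::Duration;

// 0-based attempt: 200 ms, 400 ms, 800 ms, 1.6 s, 3.2 s; None once
// the five attempts are spent.
fn backoff(attempt: u32) -> Option<Duration> {
    const MAX_ATTEMPTS: u32 = 5;
    (attempt < MAX_ATTEMPTS).then(|| Duration::from_millis(200) * 2u32.pow(attempt))
}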

Restart alpha within that window:

target/release/reflow-peer --config alpha.toml

Beta logs:

WARN Reconnect attempt 1/5 for alpha failed: Connection refused
WARN Reconnect attempt 2/5 for alpha failed: Connection refused
INFO Reconnected to alpha (endpoint 127.0.0.1:9100) on attempt 3

No state replay, no message redelivery — anything sent during the gap was dropped. The bridge's job ends at delivery; durability is something you build on top of it (an outbox actor, a Kafka log, retry-with-idempotency-keys). That's deliberate: the bridge stays simple and the persistence policy stays in your hands.
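
If dropped-on-gap isn't acceptable, the usual move is an outbox in front of the proxy: buffer outbound messages, flush on reconnect, discard only on whatever acknowledgement you define. A minimal in-memory sketch of the buffering half; the message type and send callback are stand-ins, and real durability would persist the queue:

use std::collections::VecDeque;

// Buffers messages while the bridge is down; flush on every reconnect.
// Anything still queued when the process dies is lost, so persist the
// queue (sled, SQLite, an append-only log) if that matters.
struct Outbox<M> {
    queue: VecDeque<M>,
}

impl<M> Outbox<M> {
    fn new() -> Self {
        Self { queue: VecDeque::new() }
    }

    // Always enqueue; delivery only happens in flush().
    fn push(&mut self, msg: M) {
        self.queue.push_back(msg);
    }

    // Retry from the front and stop at the first failure, so ordering
    // is preserved and the failed message stays queued for next time.
    fn flush(&mut self, mut send: impl FnMut(&M) -> bool) {
        while let Some(msg) = self.queue.front() {
            if !send(msg) {
                break;
            }
            self.queue.pop_front();
        }
    }
}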

Discovery refresh

Every 15 s (configurable on the DiscoveryService) each peer re-registers and re-reads /networks. Your own binary can subscribe to the resulting membership events:

let mut events = net.discovery().subscribe();
while let Ok(ev) = events.recv().await {
    match ev {
        DiscoveryEvent::Added(info)   => { /* a new peer is up */ }
        DiscoveryEvent::Removed(id)   => { /* a peer is gone */ }
        DiscoveryEvent::Updated(info) => { /* same id, new endpoint */ }
    }
}

Use this if you want to auto-register remote-actor proxies the moment a peer comes online — saves a round-trip vs. the configuration approach.
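
A sketch of that wiring, building on the subscription above; it assumes Added carries the same fields as a /networks entry and that register_remote_actor accepts the discovered network_id:

// Auto-create a recorder proxy for every peer that comes online.
let mut events = net.discovery().subscribe();
while let Ok(ev) = events.recv().await {
    if let DiscoveryEvent::Added(info) = ev {
        net.register_remote_actor("recorder", &info.network_id).await?;
    }
}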

Production shape

  • Run discovery as its own service. Behind a TLS-terminating proxy if it crosses an untrusted boundary. The library (reflow_distributed::router) lets you mount the contract inside your existing axum app if you want auth in front of it (see the sketch after this list).
  • Use a --bind 0.0.0.0:<port> peer + private network. The bridge speaks plain WebSocket today; put it behind your usual network controls (VPC, mesh, mTLS termination) instead of building auth into the protocol.
  • auth_token is a shared secret, not a JWT. Rotate by re-deploying both sides with the new token. For finer-grained authorization, add it at the actor level — the recorder actor can inspect message metadata before processing.
  • One peer per failure domain. Each DistributedNetwork is one bridge process. Run multiple per host if you want isolation by trust boundary or cgroups.
  • Heartbeat tuning. Default heartbeat_interval_ms = 1000 means timeout in 3s. Lower it for chatty local-network setups, raise it if you're crossing a slow link and don't want false positives.
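
For the first bullet, mounting the contract inside an existing axum app could look like the following; discovery_router() is a hypothetical name standing in for whatever reflow_distributed::router actually exports:

use axum::Router;

// discovery_router() is hypothetical: substitute the real constructor
// exported by reflow_distributed::router.
let app = Router::new()
    .nest("/reflow", reflow_distributed::router::discovery_router());
// Your auth middleware goes in front with .layer(..) before serving.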

What stays in-process

Everything inside one peer is the same Reflow you already know. Backpressure, ctx.send, ctx.pool, streams, await rules, the catalog — all unchanged. The bridge only sits between peers, never between actors inside a peer. Two consequences:

  • No latency tax for local fan-out. A 10-actor pipeline on one peer runs at memory speed; only the cross-peer hop pays the WebSocket cost.
  • Failure granularity matches the peer boundary. A crashing peer drops only the actors it was hosting; everyone else's graphs keep running and pick up reconnects automatically.

What's next

The series stops here for now. The distributed transport works end-to-end and ships with the binaries you need to run it; the gaps that remain (message persistence, ack semantics, backpressure across the bridge) are real but out of scope for the runtime itself — they belong in the actors that sit on either side.