I'm trying to model a distributed protocol in TLA+, and finding it kind of awkward. The basic setup seems straightforward: There are a few nodes, which have some persistent state and some in-memory state. They can't become unavailable but they might restart arbitrarily.

Each node can have a bunch of processes/inflight requests running locally, which can do things like "take a snapshot of local persistent state", "send messages", "reply to message", "wait for replies", "write atomically to local persistent state".

If a node restarts, it loses in-memory state but keeps persistent state (and messages it didn't respond to will eventually be redelivered to it, unless the sending host also restarts).

This seems like it would be a pretty standard setup, but most examples I see aren't doing things like that, and it seems awkward enough in practice that I'm wondering whether I'm doing it wrong.

0

If you have a fediverse account, you can quote this note from your own instance. Search https://gts.y.la/users/shachaf/statuses/01JP8GREJVE8SZC0GB2NXQSPK0 on your instance and quote it. (Note that quoting is not supported in Mastodon.)