Distribution
Maggie includes a distributed runtime for exchanging code between VMs. Code is content-addressed -- every method and class is identified by a SHA-256 hash of its normalized AST. Two VMs can push and pull code over HTTP/2, and the receiver independently verifies every chunk before accepting it. No trust in the sender is required; the content hash is the proof.
This chapter explains how content addressing works, how the sync protocol exchanges code, and how to configure your project for distribution.
Content Addressing
Every compiled method in Maggie has a content hash -- a 32-byte SHA-256 digest computed from a normalized representation of the method's AST. The normalization process strips away surface differences that do not affect semantics:
- Variable names are replaced with de Bruijn indices. A local variable is identified by its scope depth and slot index, not its textual name. Two methods that do the same thing with differently named variables produce the same hash.
- Position information (line numbers, column offsets) is stripped.
- Global references are resolved to their fully-qualified names before
hashing, so Button and Widgets::Button hash identically when
they refer to the same class.
The hashing pipeline has three stages:
1. Normalize -- the compiler AST is converted into a parallel "hashing AST" with de Bruijn indices and no position data.
2. Serialize -- the hashing AST is written to a deterministic byte stream using frozen tag bytes. Each node type has a fixed tag, and compound nodes write their children in a defined order.
3. Hash -- the serialized bytes are fed to SHA-256, producing a
32-byte content hash stored on CompiledMethod.ContentHash.
The hash format includes a version prefix byte so that future changes to the normalization rules can invalidate old hashes without silent mismatches. Because the hash depends only on semantics, two developers who write the same method independently will get the same hash.
# Hashing pipeline (Go-level):
# hash.NormalizeMethod(ast, instVars, resolveGlobal)
# -> hash.Serialize(hMethodDef)
# -> sha256.Sum256(bytes)
# -> method.ContentHash = [32]byte{...}
Class Digests
Methods are hashed individually, but classes also need a stable identity. A class digest captures the structural metadata of a class -- its name, namespace, superclass, instance variables, class variables, docstring -- plus the sorted content hashes of all its methods.
The class hash is computed by sorting method hashes lexicographically, sorting variable names, concatenating all fields into a length-prefixed byte buffer with a tag byte, then running SHA-256. This means a class hash changes if any method changes, or if the class structure changes. But reordering methods or variables does not affect the hash.
# ClassDigest fields:
# Name -- "Button"
# Namespace -- "Widgets"
# SuperclassName -- "View"
# InstVars -- ["label", "onClick"]
# MethodHashes -- [<sha256>, <sha256>, ...]
# Hash -- <sha256 of the entire digest>
The ContentStore
The ContentStore is the VM-local index that maps content hashes to compiled methods and class digests. It is the backing store for the distribution protocol -- when a peer asks "do you have hash X?", the ContentStore answers the question.
Key operations:
- IndexMethod(method) -- adds a compiled method, keyed by its
ContentHash. Methods with a zero hash are silently ignored.
- IndexClass(digest) -- adds a class digest, keyed by its hash.
- LookupMethod(hash) / LookupClass(hash) -- returns the item
for a given hash, or nil.
- HasHash(hash) -- true if either a method or class with that
hash exists.
- AllHashes() -- returns every hash in the store.
The ContentStore is thread-safe and populated automatically after compilation. You can inspect it from the CLI:
$ mag sync status
Content store:
Methods: 347
Classes: 42
Total: 389
Chunks
A chunk is the atomic unit of code distribution. Each chunk carries:
- Hash -- the 32-byte content hash identifying this piece of code.
- Type -- one of three kinds:
- ChunkMethod (1) -- a single method with its source text.
- ChunkClass (2) -- a class digest with its name as content
and method hashes as dependencies.
- ChunkModule (3) -- a namespace grouping class hashes.
- Content -- the source text (for methods) or the name (for
classes and modules).
- Dependencies -- hashes this chunk depends on. A class chunk
lists its method hashes; a module chunk lists its class hashes.
- Capabilities -- optional list of required capabilities
(e.g., "File", "HTTP", "Network").
Chunks are serialized with CBOR (Concise Binary Object Representation) in canonical mode for deterministic encoding. The dependency structure forms a DAG: modules depend on classes, classes depend on methods. During transfer, methods are always sent before classes so the receiver can verify dependencies before accepting.
# A method chunk carries source text:
# Hash: abc123...
# Type: ChunkMethod (1)
# Content: "method: factorial [^self <= 1 ifTrue: [1] ifFalse: [self * (self - 1) factorial]]"
# Deps: (none)
# A class chunk references its methods:
# Hash: def456...
# Type: ChunkClass (2)
# Content: "MathUtils"
# Deps: [abc123..., fff789...]
Sync Protocol
The sync protocol has two directions: push (sender-initiated) and pull (requester-initiated). Both use the same Connect/gRPC service running over HTTP/2.
Push flow (Announce + Transfer):
1. The pusher sends an Announce RPC with the root hash, all
available hashes, and an optional capability manifest.
2. The receiver checks capabilities against its policy, then
computes a "want list" -- hashes it does not already have.
3. The receiver replies: ACCEPTED (with want list), ALREADY_HAVE,
or REJECTED (with reason).
4. The pusher sends wanted chunks via Transfer. Methods first,
then classes, ensuring dependency order.
5. The receiver verifies each chunk (compile and hash-check for
methods; dependency existence check for classes and modules).
6. The receiver responds with accepted/rejected counts.
Pull flow (Serve):
1. The requester sends a Serve RPC with the root hash and all
hashes it already has (the "have" set).
2. The server computes a transitive closure from the root hash,
following dependency links through the ContentStore.
3. The server filters out hashes the requester already has and
returns the remaining chunks in dependency order.
Both push and pull are idempotent -- repeating a sync is harmless because content is identified by hash, not by timestamp or sequence.
CLI: mag sync
The mag sync command has three subcommands: push, pull, and
status.
push sends your compiled project to a remote peer:
# Push to a specific peer
$ mag sync push localhost:8081
# Push to all peers configured in maggie.toml
$ mag sync push
# Verbose output
$ mag -v sync push localhost:8081
Announcing 389 hashes to http://localhost:8081
Transferring 47 chunks
Transfer complete: 47 accepted, 0 rejected
Without an explicit address, push reads [sync].peers from
maggie.toml and pushes to each peer in sequence.
pull fetches content from a remote peer by root hash:
$ mag sync pull localhost:8081 a1b2c3d4e5f6...

Pull complete: 86 accepted, 0 rejected
Content store: 433 methods, 56 classes
The root hash must be a 64-character hex string (32 bytes). If the root hash already exists locally, pull exits immediately.
status prints a summary of the local ContentStore:
$ mag sync status
Content store:
Methods: 347
Classes: 42
Total: 389
Dual-Hash Verification
Every method carries two content hashes:
- Semantic hash -- computed from the normalized AST (de Bruijn indexed, no position data). Identifies the method's behavior. Two methods with the same logic produce the same semantic hash, even if they have different type annotations or variable names.
- Typed hash -- computed from the normalized AST plus type annotations (parameter types, return type, effect annotations). Identifies the method's behavior AND its type contract.
The semantic hash is the primary identity used for negotiation between nodes. The typed hash is metadata for local verification -- when receiving code from a peer, the sync service verifies both hashes if present.
Class digests also carry both hash layers: a semantic class hash (using semantic method hashes) and a typed class hash (using typed method hashes). The ContentStore indexes by semantic hash and maintains a reverse index from typed hash to semantic hash.
# Two methods with the same logic but different type annotations
# have the same semantic hash but different typed hashes:
#
# method: add: x to: y [ ^x + y ]
# method: add: x <Integer> to: y <Integer> ^<Integer> [ ^x + y ]
#
# Semantic hash: abc123... (same)
# Typed hash: def456... vs fff789... (different)
Node Identity
Each Maggie node has a stable Ed25519 keypair stored in
.maggie/node.key. The 32-byte public key is the node ID.
- Generated automatically on first run (via LoadOrCreateIdentity)
- Displayed as a 4-segment proquint for human readability
- Used to sign outgoing messages (Ed25519 signatures)
- Used as the peer identifier in the trust system (NodeID)
All inter-node messages are signed by the sender's private key. The receiver verifies the signature against the sender's public key before processing. This prevents message forgery and enables cryptographic peer identification.
# Node identity on disk:
# .maggie/node.key (64 bytes: 32-byte seed + 32-byte public key)
#
# Node ID displayed as proquint:
# babab-dabab-babab-dabab.fidod-gutih-josij-kumol.
# nobop-qurur-sasit-tuvuv.waxay-zibab-dabab-fidod
#
# CLI commands (planned):
# mag node id # display proquint node ID
# mag node init # generate keypair if not present
Message Delivery
The DeliverMessage RPC delivers a signed message to a process
on the target node. Messages are wrapped in a MessageEnvelope
containing:
- SenderNode -- the sender's Ed25519 public key (32 bytes)
- TargetProcess -- process ID on the target node
- TargetName -- registered process name (alternative to ID)
- ReplyTo -- optional reply address (node + process)
- Selector -- message selector (method name)
- Payload -- CBOR-encoded serialized Maggie value
- ClassHints -- class hashes referenced in payload (for prefetch)
- Nonce -- monotonic counter for replay prevention
- Signature -- Ed25519 over (payload || nonce || targetProcess)
The receiver:
1. Decodes the CBOR envelope
2. Checks if the sender peer is banned
3. Verifies the Ed25519 signature
4. Resolves the target process (by name or ID)
5. Deserializes the payload
6. Creates a MailboxMessage and delivers it to the process's mailbox
Error kinds returned on failure: signatureInvalid,
processNotFound, mailboxFull, deserializationError.
Trust and Peer Tracking
The TrustStore manages all peer trust decisions. Each peer is
identified by its Ed25519 public key (NodeID, 32 bytes). The
store tracks per-peer permission bits, operation counts, and ban
status.
Permission bits (combinable):
- sync -- push/pull code via Announce, Transfer, Serve RPCs
- message -- send messages to registered processes via DeliverMessage
- spawn -- remote process spawning via forkOn: (Phase B)
Auto-banning: After 3 hash mismatches (configurable), a peer
is automatically banned. Banned peers are denied all operations.
Explicitly adding a peer to maggie.toml un-bans it.
Default permissions for unknown peers are configured in the
[trust] section of maggie.toml. The safe default is sync
only -- messaging and spawning require explicit allowlisting.
# TrustStore per-peer record:
# NodeID -- Ed25519 public key (32 bytes, displayed as hex)
# Perms -- sync, message, spawn (bitmask)
# SuccessfulOps -- 47
# HashMismatches -- 0
# Banned -- false
# Configured -- true (if declared in maggie.toml)
Peer identity for sync RPCs is derived from the X-Maggie-Node-ID
header (hex-encoded public key). For message delivery, it comes
from the Ed25519 key in the signed MessageEnvelope.
Capabilities and Spawn Restrictions
Each chunk can declare capabilities it requires (e.g., "File",
"HTTP", "Network") via a CapabilityManifest. This is
informational metadata carried during code distribution.
The trust model controls what peers can *do* (sync, message, spawn) via permission bits. Capabilities describe what the *code* needs. These are orthogonal:
- Permission bits gate who can connect and what operations
they can perform (configured per-peer in [trust]).
- Spawn restrictions gate what remotely-spawned code can
access (configured node-wide in [trust].spawn-restrictions).
When a remote node spawns a process via forkOn:, the spawned
process runs in a restricted sandbox. The spawn-restrictions
list defines globals that are always hidden from remote code:
[trust]
spawn-restrictions = ["File", "ExternalProcess", "HTTP", "HttpServer"]
The spawner can request *additional* restrictions, but cannot remove policy restrictions. The receiving node always has final say.
Capabilities also integrate with process-level restriction locally. A VM that receives code requiring "File" access can fork a restricted process that hides the File class, regardless of trust settings.
Manifest Configuration: [sync] and [trust]
The [sync] section configures code distribution:
[sync]
capabilities = ["File", "HTTP"] # Capabilities this project declares
listen = ":8081" # Start a sync server on this address
peers = ["localhost:8082"] # Default push targets
The [trust] section configures the peer trust model:
[trust]
default = "sync" # Permissions for unknown peers
ban-threshold = 3 # Mismatches before auto-ban
spawn-restrictions = ["File", "HTTP"] # Hidden from remote spawns
[[trust.peer]]
id = "a1b2c3d4e5f6..." # Ed25519 public key (hex)
name = "build-farm" # Human-readable label
perms = "sync,spawn" # Allowed operations
[sync] fields:
- capabilities -- capability strings declared when pushing.
- listen -- address for the sync server to bind to.
- peers -- default push targets for mag sync push.
[trust] fields:
- default -- permissions for unknown peers (default: "sync").
Options: sync, message, spawn, all, comma-separated.
- ban-threshold -- hash mismatches before auto-ban (default: 3).
- spawn-restrictions -- globals always hidden from remotely
spawned processes.
- [[trust.peer]] -- explicitly configured peers with id
(hex public key), name (label), and perms (permissions).
Peer Networking
The sync service runs as part of the Maggie language server, which
serves both Connect (HTTP/JSON) and gRPC (binary protobuf) on the
same port. The SyncService exposes four RPCs:
- Announce -- receive a push announcement and return a want list.
- Transfer -- receive and verify chunks from a pusher.
- Serve -- respond to a pull request with available chunks.
- Ping -- return the hash count in the local store (health check).
When [sync].listen is set in maggie.toml, the sync server starts
automatically. The server is configured with:
- A CompileFunc for verifying incoming method chunks.
- A TrustStore loaded from the [trust] section of the manifest.
- The VM's ContentStore for looking up and indexing content.
All communication uses standard HTTP/2. There is no custom binary protocol, no UDP, and no peer discovery. Peers are configured explicitly in the manifest or specified on the command line.
# Server startup output:
Maggie language server listening on :8081
Connect (HTTP/JSON): http://localhost:8081/maggie.v1.SyncService/Announce
gRPC (binary): grpc://localhost:8081
Remote Messaging
Maggie supports actor-style messaging between processes on different nodes. Connect to a remote node, get a reference to a named process, and send messages via fire-and-forget or request-response patterns.
Connecting to a remote node:
node := Node connect: 'localhost:8081'.
node ping. "=> true if reachable"
node addr. "=> 'localhost:8081'"
Getting a remote process reference:
worker := node processNamed: 'worker-1'.
The remote process must be registered with registerAs: on the
remote node. processNamed: returns a RemoteProcess value that
can send messages but cannot receive -- it is a one-way handle.
Fire-and-forget (cast):
"Delivers to the remote process's mailbox. Does not wait."
worker cast: #logEvent: with: eventData.
Returns true immediately. The message is sent in a background
goroutine. The remote process reads it with Process receive.
Request-response (asyncSend):
"Returns a Future that resolves when the remote side responds."
future := worker asyncSend: #compute: with: 42.
result := future await. "Blocks until resolved"
result := future await: 5000. "With timeout in milliseconds"
future isResolved. "true/false (non-blocking)"
future error. "Error message or nil"
Value serialization:
All message payloads are serialized with CBOR. Supported types: SmallInt, BigInteger, Float, Boolean, Nil, String, Symbol, Character, Array, Dictionary, Object (with named ivars), and CueValue. Circular object references are handled via backreference tags. Non-serializable types (Process, Channel, Mutex, etc.) raise an error at send time.
Example: distributed counter
Server (registers a counter process):
"On the server node:"
Process current registerAs: 'counter'.
total := 0.
[true] whileTrue: [
msg := Process receive.
msg selector = #increment:
ifTrue: [total := total + msg payload]
]
Client (sends increment messages):
"On the client node:"
node := Node connect: 'server:8081'.
counter := node processNamed: 'counter'.
1 to: 5 do: [:i |
counter cast: #increment: with: i * 10
]
See examples/distributed-counter/ for a complete runnable example.
Cross-Node Monitors and Links
Process monitors and links work transparently across nodes. When you
monitor a RemoteProcess, the VM establishes the monitor via a
MonitorProcess RPC. When the remote process dies, a __down__
notification is sent back and delivered as a local #processDown:
message.
"Monitor a remote process"
node := Node connect: 'worker-host:9200'.
worker := node processNamed: 'compute-worker'.
ref := Process current monitor: worker.
"Wait for DOWN notification"
msg := Process receive.
msg selector = #processDown:
ifTrue: [
info := msg payload.
"info is #(refID processValue reason result)"
('Worker died: ', (info at: 3) printString) println
]
Node failure detection:
When the first cross-node monitor or link is established, a
NodeHealthMonitor starts sending periodic heartbeat pings to
the remote node (every 5 seconds by default). If 3 consecutive
pings fail, the node is declared dead and all monitors/links for
that node receive synthetic nodeDown exit signals.
This means monitors catch both individual process death (normal
notification) and entire node failure (timeout-based detection).
The watcher receives the same #processDown: message in both
cases -- the exit reason is #normal, #error, #linked, or
#nodeDown.
"The watcher handles both process and node death the same way"
msg := Process receive.
msg selector = #processDown: ifTrue: [
reason := msg payload at: 3. "Symbol: #normal, #error, #linked, or #nodeDown"
reason = #nodeDown
ifTrue: ['Entire node went down' println]
ifFalse: ['Process exited normally or crashed' println]
]
Links also work across nodes: if a remote linked process dies
abnormally, the local process is killed (or receives an #exit
message if trapping exits).
Remote Process Spawning
Blocks can be sent to remote nodes for execution using forkOn: and
spawnOn:. The block is serialized as a content hash (identifying
the block's compiled method) plus its captured variable values. The
remote node resolves the method by hash, deserializes the upvalues,
and executes the block in a restricted interpreter.
forkOn: returns a Future that resolves when the block completes:
node := Node connect: 'compute-host:9200'.
future := [20 factorial] forkOn: node.
result := future await.
result println
forkOn:with: passes an argument to the remote block:
node := Node connect: 'compute-host:9200'.
future := [:n | n factorial] forkOn: node with: 100.
result := future await: 30000. "30 second timeout"
spawnOn: starts a long-lived process and returns a RemoteProcess:
node := Node connect: 'worker-host:9200'.
worker := [
[true] whileTrue: [
| msg |
msg := Process receive.
msg selector = #compute: ifTrue: [
(msg payload factorial) println
]
]
] spawnOn: node.
worker cast: #compute: with: 42
Captured variables must be serializable. Channels, Processes, Mutexes, and other non-serializable types cause an immediate error at the call site. Communicate with remote processes via their mailbox instead.
Code-on-demand: If the remote node doesn't have the block's method (identified by content hash), it automatically pulls it from the spawning node via the sync protocol. This includes transitive dependencies.
Spawn restrictions: The trust policy's spawn-restrictions
list specifies global names to hide from remotely-spawned
processes. For example, File and HTTP can be hidden to
prevent remote code from accessing the filesystem or network:
# maggie.toml
[trust]
spawn-restrictions = ["File", "ExternalProcess", "HTTP"]
Remotely-spawned processes inherit these restrictions automatically.
They see nil when looking up restricted globals rather than
receiving a permission error.
Distributed Channels
Channels can be passed across node boundaries. When a channel is
serialized (e.g., as a captured variable in forkOn: or as a
message payload), it becomes a RemoteChannel proxy on the
receiving node. The proxy forwards all operations (send, receive,
trySend, tryReceive, close) back to the owning node via RPC.
Owner-proxy model: A channel always lives on the node that created it. Remote nodes get lightweight proxies. All operations go through the owner, preserving FIFO ordering and backpressure.
"Node A creates a channel and sends it to Node B via spawn"
| ch node |
ch := Channel new: 10.
node := Node connect: 'worker:9200'.
"The block captures ch — it becomes a RemoteChannel on Node B"
[:remoteCh |
remoteCh send: 'hello from B'.
remoteCh send: 'another message'
] forkOn: node with: ch.
"Node A receives from the local channel"
ch receive println. "'hello from B'"
ch receive println. "'another message'"
Channel select works with remote channels. When a select
includes remote channels, the VM uses a polling strategy instead
of Go's reflect.Select:
localCh := Channel new.
remoteCh := node processNamed: 'data-source'.
Channel select: {
localCh onReceive: [:v | 'Local: ', v].
remoteCh onReceive: [:v | 'Remote: ', v]
}
Node failure: When a node goes down (detected by the health monitor), all remote channel proxies for that node are marked closed. Subsequent operations return nil / false.
Performance note: Every remote channel operation is a network round-trip. For high-throughput scenarios, use process mailboxes instead — they are fire-and-forget with no blocking.
Security Model
The security model is built on two principles: "never trust content, always verify" and "every message is signed."
Code verification: When a method chunk arrives from a peer, the receiver compiles it independently and verifies the content hash matches. A malicious peer cannot send arbitrary bytecode disguised as a known hash.
Message signing: Every inter-node message is signed with the sender's Ed25519 private key. The receiver verifies the signature before processing. The signature covers the payload, nonce, and target process ID -- preventing message forgery, replay attacks, and message redirection.
Security layers:
- Ed25519 message signing -- every message is cryptographically
  authenticated. Bad signatures are rejected and count toward the
  peer ban threshold.
- Capability filtering -- reject content requiring unwanted
  capabilities (e.g., File or Network access).
- Peer banning -- auto-ban peers after repeated hash mismatches
  or signature failures.
- Process restriction -- sandbox received code in restricted
  processes that hide sensitive globals.
- No remote code execution -- the protocol transfers source text,
  not bytecode. The receiver compiles everything itself.
- Deterministic serialization -- CBOR canonical encoding ensures
  identical messages produce identical bytes.
Not yet provided:
- Encrypted transport (use a TLS reverse proxy for now)
- Peer discovery (peers are configured manually)
# Message security flow:
#
# Sender: serialize(value) -> sign(payload || nonce || target) -> send
# Receiver: verify(signature, sender_pubkey) -> check_peer_reputation
# -> deserialize(payload) -> deliver_to_mailbox
#
# If signature fails: reject, record mismatch, ban after 3 failures
Putting It Together
A typical workflow for distributing a Maggie project:
# 1. Configure maggie.toml
[project]
name = "my-app"
namespace = "MyApp"
[source]
dirs = ["src"]
[sync]
capabilities = ["HTTP"]
listen = ":8081"
peers = ["staging.example.com:8081"]
# 2. Compile and check the content store
$ mag ./src/...
$ mag sync status
Content store:
Methods: 127
Classes: 18
Total: 145
# 3. Push to staging
$ mag sync push
Pushing to manifest peer: staging.example.com:8081
Transfer complete: 145 accepted, 0 rejected
# 4. On staging, pull from production
$ mag sync pull prod.example.com:8081 a1b2c3d4...
Pull complete: 89 accepted, 0 rejected
# 5. Start a server that accepts pushes from CI
$ mag -m Main.start
# (sync server auto-starts on :8081 because listen is set)
The content-addressed design means that:
- Pushes are incremental -- only new or changed methods transfer.
- Pulls are idempotent -- requesting the same root hash twice is a no-op.
- Verification is automatic -- every chunk is checked before acceptance.
- Rollback is trivial -- old content still has valid hashes in the store.