Distribution
Maggie includes a distributed runtime for exchanging code between VMs. Code is content-addressed -- every method and class is identified by a SHA-256 hash of its normalized AST. Two VMs can push and pull code over HTTP/2, and the receiver independently verifies every chunk before accepting it. No trust in the sender is required; the content hash is the proof.
This chapter explains how content addressing works, how the sync protocol exchanges code, and how to configure your project for distribution.
Content Addressing
Every compiled method in Maggie has a content hash -- a 32-byte SHA-256 digest computed from a normalized representation of the method's AST. The normalization process strips away surface differences that do not affect semantics:
- Variable names are replaced with de Bruijn indices. A local variable is identified by its scope depth and slot index, not its textual name. Two methods that do the same thing with differently named variables produce the same hash.
- Position information (line numbers, column offsets) is stripped.
- Global references are resolved to their fully-qualified names before
hashing, so Button and Widgets::Button hash identically when
they refer to the same class.
The hashing pipeline has three stages:
1. Normalize -- the compiler AST is converted into a parallel "hashing AST" with de Bruijn indices and no position data.
2. Serialize -- the hashing AST is written to a deterministic byte stream using frozen tag bytes. Each node type has a fixed tag, and compound nodes write their children in a defined order.
3. Hash -- the serialized bytes are fed to SHA-256, producing a
32-byte content hash stored on CompiledMethod.ContentHash.
The hash format includes a version prefix byte so that future changes to the normalization rules can invalidate old hashes without silent mismatches. Because the hash depends only on semantics, two developers who write the same method independently will get the same hash.
# Hashing pipeline (Go-level):
# hash.NormalizeMethod(ast, instVars, resolveGlobal)
# -> hash.Serialize(hMethodDef)
# -> sha256.Sum256(bytes)
# -> method.ContentHash = [32]byte{...}
Class Digests
Methods are hashed individually, but classes also need a stable identity. A class digest captures the structural metadata of a class -- its name, namespace, superclass, instance variables, class variables, docstring -- plus the sorted content hashes of all its methods.
The class hash is computed by sorting method hashes lexicographically, sorting variable names, concatenating all fields into a length-prefixed byte buffer with a tag byte, then running SHA-256. This means a class hash changes if any method changes, or if the class structure changes. But reordering methods or variables does not affect the hash.
# ClassDigest fields:
# Name -- "Button"
# Namespace -- "Widgets"
# SuperclassName -- "View"
# InstVars -- ["label", "onClick"]
# MethodHashes -- [<sha256>, <sha256>, ...]
# Hash -- <sha256 of the entire digest>
The ContentStore
The ContentStore is the VM-local index that maps content hashes to compiled methods and class digests. It is the backing store for the distribution protocol -- when a peer asks "do you have hash X?", the ContentStore answers the question.
Key operations:
- IndexMethod(method) -- adds a compiled method, keyed by its
ContentHash. Methods with a zero hash are silently ignored.
- IndexClass(digest) -- adds a class digest, keyed by its hash.
- LookupMethod(hash) / LookupClass(hash) -- returns the item
for a given hash, or nil.
- HasHash(hash) -- true if either a method or class with that
hash exists.
- AllHashes() -- returns every hash in the store.
The ContentStore is thread-safe and populated automatically after compilation. You can inspect it from the CLI:
$ mag sync status
Content store:
Methods: 347
Classes: 42
Total: 389
Chunks
A chunk is the atomic unit of code distribution. Each chunk carries:
- Hash -- the 32-byte content hash identifying this piece of code.
- Type -- one of three kinds:
- ChunkMethod (1) -- a single method with its source text.
- ChunkClass (2) -- a class digest with its name as content
and method hashes as dependencies.
- ChunkModule (3) -- a namespace grouping class hashes.
- Content -- the source text (for methods) or the name (for
classes and modules).
- Dependencies -- hashes this chunk depends on. A class chunk
lists its method hashes; a module chunk lists its class hashes.
- Capabilities -- optional list of required capabilities
(e.g., "File", "HTTP", "Network").
Chunks are serialized with CBOR (Concise Binary Object Representation) in canonical mode for deterministic encoding. The dependency structure forms a DAG: modules depend on classes, classes depend on methods. During transfer, methods are always sent before classes so the receiver can verify dependencies before accepting.
# A method chunk carries source text:
# Hash: abc123...
# Type: ChunkMethod (1)
# Content: "method: factorial [^self <= 1 ifTrue: [1] ifFalse: [self * (self - 1) factorial]]"
# Deps: (none)
# A class chunk references its methods:
# Hash: def456...
# Type: ChunkClass (2)
# Content: "MathUtils"
# Deps: [abc123..., fff789...]
Sync Protocol
The sync protocol has two directions: push (sender-initiated) and pull (requester-initiated). Both use the same Connect/gRPC service running over HTTP/2.
Push flow (Announce + Transfer):
1. The pusher sends an Announce RPC with the root hash, all
available hashes, and an optional capability manifest.
2. The receiver checks capabilities against its policy, then
computes a "want list" -- hashes it does not already have.
3. The receiver replies: ACCEPTED (with want list), ALREADY_HAVE,
or REJECTED (with reason).
4. The pusher sends wanted chunks via Transfer. Methods first,
then classes, ensuring dependency order.
5. The receiver verifies each chunk (compile and hash-check for
methods; dependency existence check for classes and modules).
6. The receiver responds with accepted/rejected counts.
Pull flow (Serve):
1. The requester sends a Serve RPC with the root hash and all
hashes it already has (the "have" set).
2. The server computes a transitive closure from the root hash,
following dependency links through the ContentStore.
3. The server filters out hashes the requester already has and
returns the remaining chunks in dependency order.
Both push and pull are idempotent -- repeating a sync is harmless because content is identified by hash, not by timestamp or sequence.
CLI: mag sync
The mag sync command has three subcommands: push, pull, and
status.
push sends your compiled project to a remote peer:
# Push to a specific peer
$ mag sync push localhost:8081
# Push to all peers configured in maggie.toml
$ mag sync push
# Verbose output
$ mag -v sync push localhost:8081
Announcing 389 hashes to http://localhost:8081
Transferring 47 chunks
Transfer complete: 47 accepted, 0 rejected
Without an explicit address, push reads [sync].peers from
maggie.toml and pushes to each peer in sequence.
pull fetches content from a remote peer by root hash:
$ mag sync pull localhost:8081 a1b2c3d4e5f6...

Pull complete: 86 accepted, 0 rejected
Content store: 433 methods, 56 classes
The root hash must be a 64-character hex string (32 bytes). If the root hash already exists locally, pull exits immediately.
status prints a summary of the local ContentStore:
$ mag sync status
Content store:
Methods: 347
Classes: 42
Total: 389
Dual-Hash Verification
Every method carries two content hashes:
- Semantic hash -- computed from the normalized AST (de Bruijn indexed, no position data). Identifies the method's behavior. Two methods with the same logic produce the same semantic hash, even if they have different type annotations or variable names.
- Typed hash -- computed from the normalized AST plus type annotations (parameter types, return type, effect annotations). Identifies the method's behavior AND its type contract.
The semantic hash is the primary identity used for negotiation between nodes. The typed hash is metadata for local verification -- when receiving code from a peer, the sync service verifies both hashes if present.
Class digests also carry both hash layers: a semantic class hash (using semantic method hashes) and a typed class hash (using typed method hashes). The ContentStore indexes by semantic hash and maintains a reverse index from typed hash to semantic hash.
# Two methods with the same logic but different type annotations
# have the same semantic hash but different typed hashes:
#
# method: add: x to: y [ ^x + y ]
# method: add: x <Integer> to: y <Integer> ^<Integer> [ ^x + y ]
#
# Semantic hash: abc123... (same)
# Typed hash: def456... vs fff789... (different)
Node Identity
Each Maggie node has a stable Ed25519 keypair stored in
.maggie/node.key. The 32-byte public key is the node ID.
- Generated automatically on first run (via LoadOrCreateIdentity)
- Displayed as a 4-segment proquint for human readability
- Used to sign outgoing messages (Ed25519 signatures)
- Used as the peer identifier in the trust system (NodeID)
All inter-node messages are signed by the sender's private key. The receiver verifies the signature against the sender's public key before processing. This prevents message forgery and enables cryptographic peer identification.
# Node identity on disk:
# .maggie/node.key (64 bytes: 32-byte seed + 32-byte public key)
#
# Node ID displayed as proquint:
# babab-dabab-babab-dabab.fidod-gutih-josij-kumol.
# nobop-qurur-sasit-tuvuv.waxay-zibab-dabab-fidod
#
# CLI commands (planned):
# mag node id # display proquint node ID
# mag node init # generate keypair if not present
Message Delivery
The DeliverMessage RPC delivers a signed message to a process
on the target node. Messages are wrapped in a MessageEnvelope
containing:
- SenderNode -- the sender's Ed25519 public key (32 bytes)
- TargetProcess -- process ID on the target node
- TargetName -- registered process name (alternative to ID)
- ReplyTo -- optional reply address (node + process)
- Selector -- message selector (method name)
- Payload -- CBOR-encoded serialized Maggie value
- ClassHints -- class hashes referenced in payload (for prefetch)
- Nonce -- monotonic counter for replay prevention
- Signature -- Ed25519 over (payload || nonce || targetProcess)
The receiver:
1. Decodes the CBOR envelope
2. Checks if the sender peer is banned
3. Verifies the Ed25519 signature
4. Resolves the target process (by name or ID)
5. Deserializes the payload
6. Creates a MailboxMessage and delivers it to the process's mailbox
Error kinds returned on failure: signatureInvalid,
processNotFound, mailboxFull, deserializationError.
Trust and Peer Tracking
The TrustStore manages all peer trust decisions. Each peer is
identified by its Ed25519 public key (NodeID, 32 bytes). The
store tracks per-peer permission bits, operation counts, and ban
status.
Permission bits (combinable):
- sync -- push/pull code via Announce, Transfer, Serve RPCs
- message -- send messages to registered processes via DeliverMessage
- spawn -- remote process spawning via forkOn: (Phase B)
Auto-banning: After 3 hash mismatches (configurable), a peer
is automatically banned. Banned peers are denied all operations.
Explicitly adding a peer to maggie.toml un-bans it.
Default permissions for unknown peers are configured in the
[trust] section of maggie.toml. The safe default is sync
only -- messaging and spawning require explicit allowlisting.
# TrustStore per-peer record:
# NodeID -- Ed25519 public key (32 bytes, displayed as hex)
# Perms -- sync, message, spawn (bitmask)
# SuccessfulOps -- 47
# HashMismatches -- 0
# Banned -- false
# Configured -- true (if declared in maggie.toml)
Peer identity for sync RPCs is derived from the X-Maggie-Node-ID
header (hex-encoded public key). For message delivery, it comes
from the Ed25519 key in the signed MessageEnvelope.
Capabilities and Spawn Restrictions
Each chunk can declare capabilities it requires (e.g., "File",
"HTTP", "Network") via a CapabilityManifest. This is
informational metadata carried during code distribution.
The trust model controls what peers can *do* (sync, message, spawn) via permission bits. Capabilities describe what the *code* needs. These are orthogonal:
- Permission bits gate who can connect and what operations
they can perform (configured per-peer in [trust]).
- Spawn restrictions gate what remotely-spawned code can
access (configured node-wide in [trust].spawn-restrictions).
When a remote node spawns a process via forkOn:, the spawned
process runs in a restricted sandbox. The spawn-restrictions
list defines globals that are always hidden from remote code:
[trust]
spawn-restrictions = ["File", "ExternalProcess", "HTTP", "HttpServer"]
The spawner can request *additional* restrictions, but cannot remove policy restrictions. The receiving node always has final say.
Capabilities also integrate with process-level restriction locally. A VM that receives code requiring "File" access can fork a restricted process that hides the File class, regardless of trust settings.
Manifest Configuration: [sync] and [trust]
The [sync] section configures code distribution:
[sync]
capabilities = ["File", "HTTP"] # Capabilities this project declares
listen = ":8081" # Start a sync server on this address
peers = ["localhost:8082"] # Default push targets
The [trust] section configures the peer trust model:
[trust]
default = "sync" # Permissions for unknown peers
ban-threshold = 3 # Mismatches before auto-ban
spawn-restrictions = ["File", "HTTP"] # Hidden from remote spawns
[[trust.peer]]
id = "a1b2c3d4e5f6..." # Ed25519 public key (hex)
name = "build-farm" # Human-readable label
perms = "sync,spawn" # Allowed operations
[sync] fields:
- capabilities -- capability strings declared when pushing.
- listen -- address for the sync server to bind to.
- peers -- default push targets for mag sync push.
[trust] fields:
- default -- permissions for unknown peers (default: "sync").
Options: sync, message, spawn, all, comma-separated.
- ban-threshold -- hash mismatches before auto-ban (default: 3).
- spawn-restrictions -- globals always hidden from remotely
spawned processes.
- [[trust.peer]] -- explicitly configured peers with id
(hex public key), name (label), and perms (permissions).
Peer Networking
The sync service runs as part of the Maggie language server, which
serves both Connect (HTTP/JSON) and gRPC (binary protobuf) on the
same port. The SyncService exposes four RPCs:
- Announce -- receive a push announcement and return a want list.
- Transfer -- receive and verify chunks from a pusher.
- Serve -- respond to a pull request with available chunks.
- Ping -- return the hash count in the local store (health check).
When [sync].listen is set in maggie.toml, the sync server starts
automatically. The server is configured with:
- A CompileFunc for verifying incoming method chunks.
- A TrustStore loaded from the [trust] section of the manifest.
- The VM's ContentStore for looking up and indexing content.
All communication uses standard HTTP/2. There is no custom binary protocol, no UDP, and no peer discovery. Peers are configured explicitly in the manifest or specified on the command line.
# Server startup output:
Maggie language server listening on :8081
Connect (HTTP/JSON): http://localhost:8081/maggie.v1.SyncService/Announce
gRPC (binary): grpc://localhost:8081
Remote Messaging
Maggie supports actor-style messaging between processes on different nodes. Connect to a remote node, get a reference to a named process, and send messages via fire-and-forget or request-response patterns.
Connecting to a remote node:
node := Node connect: 'localhost:8081'.
node ping. "=> true if reachable"
node addr. "=> 'localhost:8081'"
Getting a remote process reference:
worker := node processNamed: 'worker-1'.
The remote process must be registered with registerAs: on the
remote node. processNamed: returns a RemoteProcess value that
can send messages but cannot receive -- it is a one-way handle.
Fire-and-forget (cast):
"Delivers to the remote process's mailbox. Does not wait."
worker cast: #logEvent: with: eventData.
Returns true immediately. The message is sent in a background
goroutine. The remote process reads it with Process receive.
Request-response (asyncSend):
"Returns a Future that resolves when the remote side responds."
future := worker asyncSend: #compute: with: 42.
result := future await. "Blocks until resolved"
result := future await: 5000. "With timeout in milliseconds"
future isResolved. "true/false (non-blocking)"
future error. "Error message or nil"
Value serialization:
All message payloads are serialized with CBOR. Supported types: SmallInt, BigInteger, Float, Boolean, Nil, String, Symbol, Character, Array, Dictionary, Object (with named ivars), and CueValue. Circular object references are handled via backreference tags. Non-serializable types (Process, Channel, Mutex, etc.) raise an error at send time.
Example: distributed counter
Server (registers a counter process):
"On the server node:"
Process current registerAs: 'counter'.
total := 0.
[true] whileTrue: [
msg := Process receive.
msg selector = #increment:
ifTrue: [total := total + msg payload]
]
Client (sends increment messages):
"On the client node:"
node := Node connect: 'server:8081'.
counter := node processNamed: 'counter'.
1 to: 5 do: [:i |
counter cast: #increment: with: i * 10
]
See examples/distributed-counter/ for a complete runnable example.
Cross-Node Monitors and Links
Process monitors and links work transparently across nodes. When you
monitor a RemoteProcess, the VM establishes the monitor via a
MonitorProcess RPC. When the remote process dies, a __down__
notification is sent back and delivered as a local #processDown:
message.
"Monitor a remote process"
node := Node connect: 'worker-host:9200'.
worker := node processNamed: 'compute-worker'.
ref := Process current monitor: worker.
"Wait for DOWN notification"
msg := Process receive.
msg selector = #processDown:
ifTrue: [
info := msg payload.
"info is #(refID processValue reason result)"
('Worker died: ', (info at: 3) printString) println
]
Node failure detection:
When the first cross-node monitor or link is established, a
NodeHealthMonitor starts sending periodic heartbeat pings to
the remote node (every 5 seconds by default). If 3 consecutive
pings fail, the node is declared dead and all monitors/links for
that node receive synthetic nodeDown exit signals.
This means monitors catch both individual process death (normal
notification) and entire node failure (timeout-based detection).
The watcher receives the same #processDown: message in both
cases -- the exit reason is #normal, #error, #linked, or
#nodeDown.
"The watcher handles both process and node death the same way"
msg := Process receive.
msg selector = #processDown: ifTrue: [
reason := msg payload at: 3. "Symbol: #normal, #error, #linked, or #nodeDown"
reason = #nodeDown
ifTrue: ['Entire node went down' println]
ifFalse: ['Process exited normally or crashed' println]
]
Links also work across nodes: if a remote linked process dies
abnormally, the local process is killed (or receives an #exit
message if trapping exits).
Remote Process Spawning
Blocks can be sent to remote nodes for execution using forkOn: and
spawnOn:. The block is serialized as a content hash (identifying
the block's compiled method) plus its captured variable values. The
remote node resolves the method by hash, deserializes the upvalues,
and executes the block in a restricted interpreter.
forkOn: returns a Future that resolves when the block completes:
node := Node connect: 'compute-host:9200'.
future := [20 factorial] forkOn: node.
result := future await.
result println
forkOn:with: passes an argument to the remote block:
node := Node connect: 'compute-host:9200'.
future := [:n | n factorial] forkOn: node with: 100.
result := future await: 30000. "30 second timeout"
spawnOn: starts a long-lived process and returns a RemoteProcess:
node := Node connect: 'worker-host:9200'.
worker := [
[true] whileTrue: [
| msg |
msg := Process receive.
msg selector = #compute: ifTrue: [
(msg payload factorial) println
]
]
] spawnOn: node.
worker cast: #compute: with: 42
Captured variables must be serializable. Channels, Processes, Mutexes, and other non-serializable types cause an immediate error at the call site. Communicate with remote processes via their mailbox instead.
Code-on-demand: If the remote node doesn't have the block's method (identified by content hash), it automatically pulls it from the spawning node via the sync protocol. This includes transitive dependencies.
Spawn restrictions: The trust policy's spawn-restrictions
list specifies global names to hide from remotely-spawned
processes. For example, File and HTTP can be hidden to
prevent remote code from accessing the filesystem or network:
# maggie.toml
[trust]
spawn-restrictions = ["File", "ExternalProcess", "HTTP"]
Remotely-spawned processes inherit these restrictions automatically.
They see nil when looking up restricted globals rather than
receiving a permission error.
Distributed Channels
Channels can be passed across node boundaries. When a channel is
serialized (e.g., as a captured variable in forkOn: or as a
message payload), it becomes a RemoteChannel proxy on the
receiving node. The proxy forwards all operations (send, receive,
trySend, tryReceive, close) back to the owning node via RPC.
Owner-proxy model: A channel always lives on the node that created it. Remote nodes get lightweight proxies. All operations go through the owner, preserving FIFO ordering and backpressure.
"Node A creates a channel and sends it to Node B via spawn"
| ch node |
ch := Channel new: 10.
node := Node connect: 'worker:9200'.
"The block captures ch — it becomes a RemoteChannel on Node B"
[:remoteCh |
remoteCh send: 'hello from B'.
remoteCh send: 'another message'
] forkOn: node with: ch.
"Node A receives from the local channel"
ch receive println. "'hello from B'"
ch receive println. "'another message'"
Channel select works with remote channels. When a select
includes remote channels, the VM uses a polling strategy instead
of Go's reflect.Select:
localCh := Channel new.
remoteCh := node processNamed: 'data-source'.
Channel select: {
localCh onReceive: [:v | 'Local: ', v].
remoteCh onReceive: [:v | 'Remote: ', v]
}
Node failure: When a node goes down (detected by the health monitor), all remote channel proxies for that node are marked closed. Subsequent operations return nil / false.
Performance note: Every remote channel operation is a network round-trip. For high-throughput scenarios, use process mailboxes instead — they are fire-and-forget with no blocking.
Security Model
The security model is built on two principles: "never trust content, always verify" and "every message is signed."
Code verification: When a method chunk arrives from a peer, the receiver compiles it independently and verifies the content hash matches. A malicious peer cannot send arbitrary bytecode disguised as a known hash.
Message signing: Every inter-node message is signed with the sender's Ed25519 private key. The receiver verifies the signature before processing. The signature covers the payload, nonce, and target process ID -- preventing message forgery, replay attacks, and message redirection.
Security layers:
- Ed25519 message signing -- every message is cryptographically
  authenticated. Bad signatures are rejected and count toward the
  peer ban threshold.
- Capability filtering -- reject content requiring unwanted
  capabilities (e.g., File or Network access).
- Peer banning -- auto-ban peers after repeated hash mismatches
  or signature failures.
- Process restriction -- sandbox received code in restricted
  processes that hide sensitive globals.
- No remote code execution -- the protocol transfers source text,
  not bytecode. The receiver compiles everything itself.
- Deterministic serialization -- CBOR canonical encoding ensures
  identical messages produce identical bytes.
Not yet provided:
- Encrypted transport (use a TLS reverse proxy for now)
- Peer discovery (peers are configured manually)
# Message security flow:
#
# Sender: serialize(value) -> sign(payload || nonce || target) -> send
# Receiver: verify(signature, sender_pubkey) -> check_peer_reputation
# -> deserialize(payload) -> deliver_to_mailbox
#
# If signature fails: reject, record mismatch, ban after 3 failures
Putting It Together
A typical workflow for distributing a Maggie project:
# 1. Configure maggie.toml
[project]
name = "my-app"
namespace = "MyApp"
[source]
dirs = ["src"]
[sync]
capabilities = ["HTTP"]
listen = ":8081"
peers = ["staging.example.com:8081"]
# 2. Compile and check the content store
$ mag ./src/...
$ mag sync status
Content store:
Methods: 127
Classes: 18
Total: 145
# 3. Push to staging
$ mag sync push
Pushing to manifest peer: staging.example.com:8081
Transfer complete: 145 accepted, 0 rejected
# 4. On staging, pull from production
$ mag sync pull prod.example.com:8081 a1b2c3d4...
Pull complete: 89 accepted, 0 rejected
# 5. Start a server that accepts pushes from CI
$ mag -m Main.start
# (sync server auto-starts on :8081 because listen is set)
The content-addressed design means that:
- Pushes are incremental -- only new or changed methods transfer.
- Pulls are idempotent -- requesting the same root hash twice is a no-op.
- Verification is automatic -- every chunk is checked before acceptance.
- Rollback is trivial -- old content still has valid hashes in the store.