Skip to main content
MoleSignal is a single binary: the same executable (the same image) serves every role, and you choose which roles a given process runs. This lets it act as a sandbox that starts with one command, and scale out horizontally into a multi-role cluster. This page is aimed at operators / SREs. It systematically covers the role breakdown, which roles own which background workers, the data flow, cluster discovery, external dependencies and ports, and three deployment topologies: single-node, Docker Compose, and Kubernetes.

Overview: design philosophy

Single binary

Every role is compiled into the same binary and packaged into the same image. Processes are distinguished purely by configuration ([node].roles); no separate build artifacts are required.

Roles selected by config

[node].roles determines what this process exposes and which foreground responsibilities it carries. The default is ["standalone"], i.e. a single process running every responsibility.

State externalized where possible

Metadata lands in Postgres, data lands in the object store. Aside from the Ingester’s WAL/buffer, most roles are stateless and scale horizontally.

Discovery via Postgres

There is no separate gossip / consensus component. Nodes write themselves into the cluster_nodes table via heartbeat, and other nodes discover live peers by querying that table directly.
When to run single-node vs. split by role:
  • Evaluation, development, PoC, low-traffic production.
  • One process exposes HTTP + gRPC and runs every internal responsibility in-process (ingest, query, compaction, alert evaluation, etc.).
  • Dependencies are still externalized: Postgres + object store (or a local filesystem backend).
Role enum values use snake_case in TOML / environment variables (standalone, alert_manager), matching the naming convention of the internal implementation. All config examples below use snake_case.

Node role reference

[node].roles is an array (Vec<Role>), with valid values: standalone, router, ingester, querier, compactor, alert_manager. The default is ["standalone"].
The current implementation only uses the first role in the array for foreground dispatch and the PeerRole annotation on cluster heartbeats. A multi-role form like [node].roles = ["ingester", "compactor"] parses successfully, but only the first element currently takes effect. “A single process carrying multiple foreground roles at once” is a design intent that has not yet landed, so do not rely on it in production. If you need to combine responsibilities, use standalone.
RoleForeground exposureStateful / statelessScaling characteristics
standaloneHTTP + gRPC, running every internal responsibility in-processStateful (embeds an Ingester)Single instance for evaluation; not suitable as a horizontal scaling unit
routerHTTP reverse proxy + rate limiting; does not directly expose internal services beyond the business HTTP listenerStateless (the rate limiter is in-process and ephemeral)Scales horizontally; just put an L4/L7 LB in front
ingesterCarries WAL replay + buffering + periodic flush (the actual work is pre-spawned at process startup)Stateful (WAL, in-memory buffer, file_meta cache)Sharded via consistent hashing; scaling must account for the WAL persistent volume and rebalancing
querierDesigned to be the distributed scan endpointStatelessScales horizontally
compactorBackground tick loop (pre-spawned); the role body itself idlesStateless (processes tick by tick)Single instance recommended (see the high-availability section)
alert_managerTwo pre-spawned background loops: evaluation + dispatch; the role body itself idlesStateful (evaluation state and incident state are in-process)Single instance recommended
The Querier role is currently a placeholder / incomplete. The role dispatch function currently only logs a single “querier role started (stub)” line and idles; the standalone distributed-scan server implementation has not yet been wired up. The fan-out protocol for distributed queries (see the data flow section) exists in the code and is invoked by the coordinator, but the layer that serves the scan RPC as a standalone querier process is not yet complete. The currently usable form of distributed query is the engine composition inside a standalone process.

Which role carries which background worker

One key fact: nearly all background workers are spawned unconditionally during the unified assembly phase at process startup, not inside each role’s dispatch function. They self-gate via feature flags or configuration. This means: in a standalone process, they all run; in a role-split image, whether they actually “do work” depends on whether that role was injected with the corresponding dependencies and configuration. The table below gives the recommended carrying role and interval.
Background workerRecommended carrying roleIntervalConfig keyStatus
heartbeatAll rolesheartbeat_interval_secs, default 5s (first beat immediate)[cluster].heartbeat_interval_secsWired up
stale node cleanup (sweeper)All rolesFixed 60sNone (hardcoded)Wired up
object store health probeAll roles (blocking probe once at startup)health_probe_interval_secs, default 30s[store.object].health_probe_interval_secsWired up
alert evaluation (evaluator)alert_managereval_interval_secs, default 30s[alert_manager].eval_interval_secsWired up
alert dispatch (dispatcher)alert_managerdispatch_interval_secs, default 10s[alert_manager].dispatch_interval_secsWired up
compactioncompactorinterval_secs, default 300s (5 min)[compactor].interval_secsWired up
file_meta_dumper (cold partition flush to storage)compactor (same process as compaction)interval_secs, default 3600s (1h)[storage.file_meta_dump].interval_secs; set enabled=false to disableWired up
scheduled_reportsalert_manager (needs rendering dependencies, see below)Fixed 60s tick; each report’s own cron decides whether it is duethe report’s cron (the tick interval is not configurable)Wired up
search_jobs (async search pool)The role carrying AppState (typically standalone / a node exposing query)Idle poll idle_poll_secs, default 2s; cleanup cleanup_interval_secs, default 3600s[search_jobs].workers (default 2), etc.Wired up
ACME issuance / renewalrouter (TLS entry point)Issuance issue_poll_secs (default 60s); renewal renewal_retry_secs (default 21600s / 6h)[http.tls].issue_poll_secs / renewal_retry_secsCurrently a placeholder / not enabled
The ACME issuance / renewal loops are currently a stub and are not spawned during the assembly phase. The loop interval fields, per-domain cooldown, and single-issuance method are all present, but the functions that scan for pending domains and renewal domains currently just return success without doing anything, and nothing spawns the runner anywhere. Conclusion: automatic certificate issuance / renewal is not yet in effect. TLS itself additionally requires building with the domain-acme-tls feature; OSS builds do not support TLS.
The compactor and alert_manager loops follow a “pre-spawned” model: they run as independent background tasks during the unified assembly phase, while the role body functions compactor::run() / alert_manager::run() are merely idle placeholders. So setting a process’s [node].roles to compactor or alert_manager results in “that process’s role body idling + the background loops running as usual.”

How to select roles

Roles are configured via [node].roles; they can be overridden with environment variables. Environment variables share the MS_ prefix, with section and field separated by a . (dot) (the MS_ prefix is stripped and the remaining key is split on .). For example: MS_NODE.ROLES, MS_STORE.META.DSN, MS_HTTP.PORT.
[node]
# Unique node identifier; leave empty to have the process generate one
id = "ingester-a1"
# Role array; only the first element currently takes effect
roles = ["ingester"]

[cluster]
# gRPC address (host:port) peers use to interconnect
advertise_addr = "10.0.1.21:5082"
heartbeat_interval_secs = 5
peer_timeout_secs = 15
A few bootstrap / secret variables are flat single-underscore and live outside Settings, read directly from the environment: MS_CIPHER_KEY, MS_AUTH_JWT_SECRET_OVERRIDE, MS_LICENSE_FILE, MS_COPILOT_<PROVIDER>_API_KEY, etc. All other structured fields use the MS_<SECTION>.<FIELD> dotted form, exactly as the docker-compose and k8s manifests do (e.g. MS_NODE.ROLES, MS_STORE.META.DSN, MS_CLUSTER.ADVERTISE_ADDR).

Data flow

Ingest path

The entry point lands on some Ingester via the Router (rate limiting + consistent hashing); the Ingester first writes the WAL (durably persisted to disk), then writes the in-memory buffer; a background flush loop encodes the buffer into columnar files + a search index by time window or size threshold and uploads it to the object store, finally landing the FileMeta in Postgres and truncating the flushed WAL segments.
                 X-Org-Id            consistent hash( org_id|stream )
 client ──HTTP/gRPC──► Router ──consistent hashing──► Ingester (stateful)
 :5080/:5082          (stateless, rate limit)         │
                       │ 429 + Retry-After             │ 1) write WAL (to disk, configurable fsync strategy)
                       │ when over org QPS             │ 2) write in-memory buffer (accumulated by seq)

                              background flush loop (triggered by flush_interval_secs / buffer threshold)

                          snapshot → columnar files + search-index sidecar
                                               │ upload
                          ┌────────────────────┴────────────────────┐
                          ▼                                          ▼
                  object store (local / s3 / azure / gcs)        Postgres: FileMeta
                  {org}/{type}/{stream}/{date}/{file-id}  (time_range, rows,
                  …a search-index sidecar per file              min/max, object_key)

                          after a successful flush, seal + truncate the flushed WAL segments
Key config: [wal].dir, [wal].segment_size_mb, [wal].flush_strategy (batch/none/every_write), [wal].sync_level (data/all); [ingester].buffer_max_mb (default 256), flush_interval_secs (default 30), flush_parallelism (default 4); [router.rate_limit].ingest_qps (default 1000 per org, 0 = unlimited).
The Router rate limits at (org_id, route_class) granularity, returning 429 with a Retry-After header when exceeded. org_id comes from the X-Org-Id request header and defaults to default.

Query path

The query entry point is on a node that exposes HTTP. The engine is wrapped layer by layer according to cluster size: local query engine → (with ≥2 querier peers) distributed engine → (when a remote cluster is specified) federated engine.
 client ──POST /api/v1/query──► QueryService
                                   │  Prefer: respond-async / respond-sync
                                   │  or auto_async_threshold_rows triggers automatic async
                          ┌────────┴─────────┐
                       synchronous execution   async → create a search_jobs row, return 202 + job_id

       engine composition: query engine (local) → distributed (≥2 querier) → federated (remote cluster)

            FileMeta partition pruning (org/stream/stream_type/time_range) + full-text (MATCH) pruning

      distributed sharding: hash(object_key) % peer_count → each shard encoded as a scan request (ticket)
                          │  internal scan RPC (:5082)
            ┌─────────────┼─────────────┐
            ▼             ▼             ▼
        Querier peer   Querier peer   …(each reads columnar files, SELECT * scan, returns result batches)
            └─────────────┼─────────────┘

        coordinator UNION ALL → register an in-memory table → execute the full user SQL (aggregation/filter/projection)

                   QueryResult (columns, rows, scanned_rows, took_ms)
Distributed query shards are fanned out to peers via a hash of object_key; the shard SQL only performs a scan (SELECT * FROM <stream>), while the full aggregation runs on the coordinator to avoid partial/final aggregation inconsistencies. The distributed path is taken only when the cluster has ≥2 querier peers; otherwise it falls back to the local engine with no network hop.

Async search job pipeline

 POST /query (Prefer: respond-async or over threshold)

        ▼  write a search_jobs row (state=Pending, request_json, expires_at = now+7d)
   return 202 + job_id + monitoring URL

        ▼  search_jobs worker pool (default 2 workers)
   claim_next_pending()  ──PG FOR UPDATE SKIP LOCKED──► deserialize QueryRequest

        ▼  QueryService::run() executes
   result encoded as NDJSON, uploaded to object store: {org_id}/search_jobs/{job_id}.ndjson

        ▼  mark_done(result_object_key, rows) / mark_failed(error) on failure
   cleanup loop deletes expired jobs every cleanup_interval_secs (DB + object store)

 client polls /api/v1/query/jobs/{job_id}, downloads the NDJSON once state=Done
Config: [search_jobs].workers (default 2), idle_poll_secs (default 2), cleanup_interval_secs (default 3600); the automatic async threshold [querier].auto_async_threshold_rows (default 50 million rows).
The FOR UPDATE SKIP LOCKED claim semantics are safe across multiple workers / multiple nodes: multiple processes carrying search_jobs can share the same search_jobs table and claim concurrently without executing the same job twice.

Federated / multi-cluster query

Target clusters are specified via ?clusters=local,sf,nyc. After scanning locally, the coordinator issues one internal scan RPC (with a Bearer token) to each enabled remote cluster, then UNION ALLs the batches each cluster returns with the local data and executes the full SQL.
 POST /api/v1/query?clusters=local,sf,nyc
        │  license check: contains a non-"local" target AND no federated_search feature → 403 Forbidden

   local scan (single-threaded, for correctness) + fan-out to each enabled remote cluster:
        │   token: env:VAR / cipher_keys:<id> (the latter needs the injected secret store to decrypt)
        │   connect to advertise_addr (http:// or https:// depending on tls_verify)

   remote returns → failures (auth/unreachable) recorded in degraded_clusters and processing continues (graceful degradation)

   merge all successful batches → execute the full user SQL
   return federation: { scanned_clusters, degraded_clusters, degraded_reason }
Federated query is license-gated: as long as clusters contains a non-local target and the current license lacks the federated_search feature, the HTTP layer returns 403 directly. OSS / community editions fall back to single-cluster. Remote cluster definitions are stored in the Postgres remote_clusters table (advertise_addr, token_secret_ref, tls_verify, enabled); clusters with enabled=false are skipped during fan-out. Remote auth currently supports Bearer token only; tls_verify=false maps to http:// (rather than “https with verification skipped”).

Cluster membership and discovery

1

Node registration

The heartbeat task periodically upserts (node_id, role, advertise_addr, last_heartbeat_at_micros) into the Postgres cluster_nodes table (primary key node_id, ON CONFLICT DO UPDATE). The first heartbeat fires immediately, subsequent ones at the configured interval.
2

Heartbeat interval

[cluster].heartbeat_interval_secs, default 5s. advertise_addr defaults to 127.0.0.1:5082, i.e. the gRPC address (host:port) peers use to interconnect.
3

Liveness window and stale cleanup

The liveness window is controlled by [cluster].peer_timeout_secs, default 15s: the registry only returns nodes with last_heartbeat_at >= now - peer_timeout. Additionally, a sweeper deletes cluster_nodes rows that have not heartbeated for over 5 minutes every 60s.
4

Peer discovery

There is no gossip / consensus; in distributed mode each role queries the cluster_nodes table directly and filters live peers by role. standalone mode skips the entire discovery flow, returning only itself and never touching cluster_nodes.
5

Placement algorithm

Router selecting an Ingester: applies consistent hashing over org_id|stream_name then mods, landing deterministically on a particular Ingester. Router selecting a Querier: naive round-robin (now_ns % peer_count), not full consistent hashing. Distributed query sharding: a hash over object_key then mods, fanning out to querier peers.
6

Distributed scan RPC

The coordinator encodes a scan request (org/stream/sql/file_metas/time_range) into a ticket and sends it to peers over the internal scan RPC; the peer reads columnar files, registers an in-memory table, runs the shard SQL, and returns a result stream. This RPC shares the same port (default 5082) with the gRPC node service and ingest service.
Local scan-RPC calls within the cluster are unauthenticated; only federated / remote-cluster calls use an optional Bearer token.
cluster_nodes table schema: node_id VARCHAR(64) PK, role VARCHAR(32), advertise_addr VARCHAR(255), started_at_micros BIGINT, last_heartbeat_at_micros BIGINT, with indexes on role and last_heartbeat_at_micros.

External dependencies and ports

PostgreSQL

The metadata database: FileMeta, streams, rules, incidents, users, orgs, audit, quotas, certificates, cluster_nodes, search_jobs, remote_clusters, and more. [store.meta]: backend (default sqlite; set to postgres for production), dsn, max_connections (default 16). Migrations are embedded at compile time.

Object store

Columnar files + search-index sidecars. [store.object].backend: local (default, root=./data/objects) / s3 (including MinIO, R2, Alibaba Cloud OSS, overridden via endpoint) / azure / gcs. Credential precedence: env vars > credential file > inline TOML.

No external cache / consensus

Only in-process LRU+TTL caches, the rate limiter, and the async runtime. No Redis / Memcached, no external consensus component.

Rendering dependency (optional)

PNG/PDF rendering for scheduled reports requires headless Chromium and is feature-gated; OSS builds always fall back to an SVG placeholder. The image’s runtime layer already ships with chromium.

Listening ports

PortProtocol / serviceConfig keyNotes
5080HTTP[http].bind (default 0.0.0.0), [http].port/api/v1/*, /metrics, /healthz, /readyz, /.well-known/acme-challenge/, etc.
5082gRPC[grpc].bind, [grpc].port, max_message_size_mb (default 32MB)Ingest + scan RPC + cluster heartbeat, three services sharing one port
80Plaintext HTTP in TLS mode[http.tls].plain_portHealth checks + ACME HTTP-01 challenge + 301 redirect to HTTPS
443HTTPS in TLS mode (SNI)[http.tls].portFull routing; requires the domain-acme-tls feature
/metrics (GET, Prometheus text 0.0.4) is controlled by [telemetry].metrics_enabled (default true) and exposes metrics such as cache hits / invalidations, object_store_*, wal_*, file_meta_dump_*, tantivy_pruned_files_total, alert_rule_eval_timeout_total.

Health / readiness probe semantics

ProbePath200 condition503 conditionPurpose
LivenessGET /healthzWAL replay complete (Ingester-relevant only; other roles bypass it) and the object store is not degradedreplay in progress or object store connection lostProcess liveness and dependency health
ReadinessGET /readyzreplay_done = true (even if the object store is degraded)replay still in progressLets K8s keep admitting read traffic while the write path is degraded
1

Object store probe at startup (blocking)

startup_ping() synchronously performs a PUT→GET→DELETE (128 bytes) against _health/{uuid}.probe. On failure, the process does not start.
2

Background periodic probe

The same round-trip runs every [store.object].health_probe_interval_secs (default 30s); only 3 consecutive failures set object_store_degraded=true, and a success resets the counter to zero.
3

Ingester WAL replay

At startup the Ingester scans the segment files under [wal].dir, replays them into the in-memory buffer by (org, stream_type, stream), forces a single flush once everything is loaded, then sets replay_done=true. /readyz returns 503 until replay completes.

Authentication and config overrides

  • The JWT secret is auto-bootstrapped by the database on first startup; the old jwt_secret TOML field is deprecated (parsing is retained only for backward compatibility with old configs). To pin a fixed secret, use the environment variable MS_AUTH_JWT_SECRET_OVERRIDE.
  • API tokens take the form ms_<prefix>_<secret>, with the secret stored as an argon2id hash.
  • At startup, fields such as store.meta.dsn, wal.dir, http.port, grpc.port, node.id are treated as immutable; changing them at runtime triggers a warning.

Deployment topologies

All topologies share the same image (e.g. molesignal:dev), distinguished by MS_NODE.ROLES. The web frontend is a separate nginx image (e.g. molesignal-web:dev).
A single process with all roles combined; the fastest way to get started.
[node]
roles = ["standalone"]

[store.meta]
backend = "postgres"
dsn = "postgres://molesignal:molesignal@localhost:5432/molesignal"

[store.object]
backend = "local"
root = "./data/objects"

[wal]
dir = "./data/wal"
molesignal --config ./conf/config.toml
# HTTP :5080  gRPC :5082
The local backend (store.object.backend=local) is fine for development; use an object store backend for production.

Scaling and high availability

Router — scales horizontally

Stateless; scale replicas with entry traffic. The rate limiter is in-process and ephemeral: under multiple replicas each replica counts independently, so the effective org QPS ceiling is roughly configured value × replica count. When needed, move rate limiting up to a unified gateway or lower the per-replica threshold.

Querier — scales horizontally

Stateless. Distributed scans only trigger with ≥2 peers. Note: the standalone querier’s scan-RPC server is currently a placeholder, so the fully landed form of distributed query is the in-process engine of standalone. The bottleneck is columnar-file reads + query-engine memory.

Ingester — stateful

Use a StatefulSet + a per-replica PVC for a persistent WAL. The Router places via consistent hashing over org|stream; scaling changes the mod result and causes rebalancing, so ensure the WAL is flushed before scaling down. Leave enough of a replay window for the readiness probe (the manifest sets a 10s initial delay).

Compactor / AlertManager — single instance

Both are recommended at a single replica: multiple Compactor instances would produce merge conflicts (a lease-table lock is planned for the future), and AlertManager’s evaluation / incident state is in-process. Reliability comes from fast restarts rather than multiple replicas.
Recommended metrics to monitor: for the write path, watch wal_append_lock_wait_seconds, wal_append_inflight, wal_fsync_errors_total, file_meta_dump_*; for the object store, object_store_operations_total, object_store_errors_total, object_store_op_duration_seconds, object_store_probe_*; for queries, the cache hit rate cache_* and tantivy_pruned_files_total; for alerts, alert_rule_eval_timeout_total. Combine /healthz, /readyz, and the cluster_nodes table to observe member liveness.

Minimal production checklist / validation checklist

1

External dependencies ready

Postgres reachable, [store.meta].backend = "postgres" with a correct dsn; choose an object store backend of s3/azure/gcs (not local), with credentials injected in the precedence order “env vars > credential file > inline”.
2

Roles and interconnect addresses

Each process has an explicit MS_NODE.ROLES; set MS_CLUSTER.ADVERTISE_ADDR to a peer-reachable host:5082 (K8s uses $(POD_IP):5082). Confirm only a single foreground role is set (a multi-role array only takes the first element).
3

Ingester persistence

Run the Ingester as a StatefulSet + PVC mounting the WAL directory; set [wal].flush_strategy / sync_level according to durability requirements; leave enough replay delay for the readiness probe.
4

Single-instance roles

Keep Compactor and AlertManager at 1 replica each. Confirm [compactor].interval_secs, retention_days, and [storage.file_meta_dump].enabled match expectations.
5

Entry point and rate limiting

Put an LB in front of the Router; set [router.rate_limit].ingest_qps / query_qps per org; note that rate limiting is approximate under multiple replicas. Disable buffering for /api/v1/query/stream on the Ingress.
6

Observability and probes

Wire /metrics into Prometheus; point K8s probes at /api/v1/healthz (liveness) and /readyz (readiness). Confirm the object store startup probe passes (otherwise the process will not start).
7

Security and license

Set MS_AUTH_JWT_SECRET_OVERRIDE when you need a fixed JWT secret; otherwise it is auto-bootstrapped by the DB. Inject a cipher key (do not use the dev all-zeros value in production). Federated query requires the federated_search license feature; OSS does not support TLS/ACME (which requires building with the domain-acme-tls feature), and automatic certificate issuance/renewal is currently a placeholder and not enabled.
Before going live, be clear about the following “current state” boundaries: the standalone querier role is a placeholder (the distributed scan-RPC server is incomplete); a multi-role array only takes effect for the first element; ACME issuance/renewal is a placeholder and not spawned; the distributed consensus WAL term source is still a static placeholder (multi-node consensus is not implemented); federated query and SSO (OIDC only; SAML is not implemented), among others, are license/feature-gated.