Compiler passes land, LSM goes durable
What changed
The Janus compiler grew two new compilation passes between yesterday’s reflection and this one. Forty-six commits. A release tag.
Pass 1 (commits e776c593, 6b90377b, f3b14331): the compiler now extracts top-level statements into a synthesized main function, emits auto-import use declarations with deduplication, and produces structured error codes E3100, E3102, E3104, E3113 at the parse level and E3106, E3108 at the sema level. A script written without an explicit entry point now has one. The compiler generates it.
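A minimal sketch of the pass 1 shape, in Python rather than Janus. Everything in it is hypothetical: the (kind, text) item encoding, the Program record, and the synthesize_main name are illustration only and do not come from the compiler source. It shows loose statements being hoisted into a generated main and auto-imports being merged without duplicates.

```python
from dataclasses import dataclass, field

@dataclass
class Program:
    uses: list = field(default_factory=list)       # import paths, first-seen order
    fns: list = field(default_factory=list)        # names of declared functions
    main_body: list = field(default_factory=list)  # statements hoisted into main

def synthesize_main(items, auto_imports):
    """Split top-level items, hoist loose statements, merge auto-imports."""
    prog = Program()
    for kind, text in items:
        if kind == "use":
            if text not in prog.uses:      # dedupe imports the author wrote twice
                prog.uses.append(text)
        elif kind == "fn":
            prog.fns.append(text)
        else:                              # any other item is a loose statement
            prog.main_body.append(text)
    for path in auto_imports:
        if path not in prog.uses:          # skip imports that are already present
            prog.uses.append(path)
    if prog.main_body:                     # only synthesize main when there is a body
        prog.fns.append("main")
    return prog

# Statement texts are opaque placeholders, not real Janus syntax.
script = [
    ("use", "std.io"),
    ("stmt", "let x = 2"),
    ("fn", "square"),
    ("stmt", "print(square(x))"),
]
prog = synthesize_main(script, ["std.io", "std.math", "std.math"])
```

The input list is never mutated, which mirrors the point made later in this post: the generated entry point lives in the compiler's output, not in what the author typed.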
Pass 2 (commits 627ed7e5, 6160ca82, c0d6dc06, 6a1d7bd8): post-sema hook wired through script_main gate. Last-expression return-type upgrade from !void to !T where the expression has a concrete type. Implicit try injection via a closure walker rule per SPEC-044 §4.b.2. Diagnostics E3112 (sema-side) and E3114 (warning). The implicit try suppressor checks the immediate parent only; this was a bug that surfaced in pass 2 §4.b boundary fixtures and was closed in c3ad188a.
Desugar pipeline: desugar/printer.zig emits canonical AST back to .jan source (Task 16, commit 5fb673f8). A round-trip harness validates 10 enumerated cases. janus desugar is a dispatched subcommand (Task 15, commit ae5e1dd8). The parser gained SPEC-044 §3.2.1 inline escalation and shape-equivalence golden tests (59ff96a4, 055798ac).
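The round-trip idea can be sketched with Python's own ast module standing in for the Janus parser and printer. This is an analogy, not the harness itself: the three cases are invented here and are not the 10 enumerated cases, and comparing dumped trees is an assumed definition of "same structure".

```python
import ast

def round_trips(source: str) -> bool:
    """Parse, print back to source, re-parse, and compare the two trees."""
    first = ast.parse(source)
    printed = ast.unparse(first)        # canonical source text from the AST
    second = ast.parse(printed)
    # ast.dump omits line and column attributes by default, so this compares
    # structure only, not the original formatting.
    return ast.dump(first) == ast.dump(second)

CASES = [
    "x = 1 + 2",
    "def f(a, b=2):\n    return a * b",
    "for i in range(3):\n    print(i)",
]

results = {case: round_trips(case) for case in CASES}
```

Comparing trees rather than text is what lets the printer normalize whitespace and still pass.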
LSM Phase B shipped. MemTableU32U32 with a direct byte-keyed skiplist, 6/6 smoke green (commit 83a10ae0). The skiplist closure chain (LSM-A7 and LSM-A8) was its own subplot: struct-field substitution for compound substituted types, sibling-len lookup fixes, monomorph layout fallback, and memcmp-based slice comparison in icmp. Five merge commits closed the feature/voxis-memtable branch into unstable.
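For readers who have not built one, a byte-keyed skiplist MemTable can be sketched as below. This is a Python illustration under loud assumptions: the source does not state the key encoding, so big-endian is assumed here because it makes bytewise order match numeric order, and the class name, MAX_LEVEL, and the seeded level generator are all hypothetical.

```python
import random
import struct

MAX_LEVEL = 8  # assumed cap on tower height

class Node:
    def __init__(self, key: bytes, value, level: int):
        self.key = key
        self.value = value
        self.next = [None] * level   # one forward pointer per level

class MemTable:
    def __init__(self, seed: int = 0):
        self.head = Node(b"", None, MAX_LEVEL)   # sentinel; its key is never compared
        self.rng = random.Random(seed)

    def _random_level(self) -> int:
        level = 1
        while level < MAX_LEVEL and self.rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, k: bytes):
        """Return, per level, the last node whose key sorts before k."""
        update = [self.head] * MAX_LEVEL
        node = self.head
        for lvl in range(MAX_LEVEL - 1, -1, -1):
            while node.next[lvl] is not None and node.next[lvl].key < k:
                node = node.next[lvl]
            update[lvl] = node
        return update

    def put(self, key: int, value: int) -> None:
        k = struct.pack(">I", key)   # assumed big-endian u32 key
        update = self._predecessors(k)
        hit = update[0].next[0]
        if hit is not None and hit.key == k:
            hit.value = value        # overwrite in place
            return
        new = Node(k, value, self._random_level())
        for lvl in range(len(new.next)):
            new.next[lvl] = update[lvl].next[lvl]
            update[lvl].next[lvl] = new

    def get(self, key: int):
        k = struct.pack(">I", key)
        hit = self._predecessors(k)[0].next[0]
        return hit.value if hit is not None and hit.key == k else None

    def items(self):
        node = self.head.next[0]
        while node is not None:
            yield struct.unpack(">I", node.key)[0], node.value
            node = node.next[0]
```

With a little-endian encoding the keys 2 and 256 would sort in the wrong order under bytewise comparison, which is why the encoding choice matters once comparison is done on raw bytes.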
LSM Phase C shipped. GrainStoreU32U32, the durable LSM facade, landed at commit 077136d2. WAL replay on gs_open_u32 at 887e3837. Five smoke stages: fresh-open ratchet, close-and-reopen restores state, overwrite across reopen, short-header and short-body torn-tail tolerance, CRC-flip corruption tolerance. The storage engine can now survive dirty shutdown and corrupted tail frames. It stops at the first bad frame and replays everything before it.
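The replay rule is small enough to sketch. This is Python, not the Janus implementation, and the frame layout is an assumption made for illustration: an 8-byte header (u32 payload length, u32 CRC32 of the payload) followed by a payload holding a u32 key and a u32 value. The real frame format is whatever Phase A defined.

```python
import struct
import zlib

HEADER = struct.Struct("<II")   # assumed: payload length, CRC32 of payload
RECORD = struct.Struct("<II")   # assumed: u32 key, u32 value

def encode_frame(key: int, value: int) -> bytes:
    payload = RECORD.pack(key, value)
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def replay(log: bytes) -> dict:
    """Rebuild state from the log, stopping at the first frame that cannot be proven."""
    table = {}
    pos = 0
    while pos < len(log):
        if len(log) - pos < HEADER.size:
            break                          # short header: torn tail
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length:
            break                          # short body: torn tail
        if zlib.crc32(payload) != crc:
            break                          # CRC mismatch: distrust this frame and all after it
        if length != RECORD.size:
            break                          # valid CRC but unexpected shape: also stop
        key, value = RECORD.unpack(payload)
        table[key] = value                 # later frames overwrite earlier ones
        pos = start + length
    return table

log = encode_frame(1, 10) + encode_frame(2, 20) + encode_frame(1, 11)
```

Note what the function does not do: it never scans forward looking for the next plausible header. An intact frame that sits after a corrupted one is dropped along with it.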
janus validate --promotable shipped for Script Law enforcement (SPEC-045 §7.6/§7.7, commit 5850b499). A script can now be checked for promotability before it enters the dependency graph.
Compiler gap chain LSM-A5 through LSM-A9 closed: Optional unwrap chain via shape-aware payload inference (a14a06c2), substituted-to-slice generic-param ABI propagation (c1786f68), struct-field substitution for compound substituted types (93b1c618), mangle-encoded type-arg forms in layout paths (45f7b8d9), struct-wrap closure for cross-module generic field types (a4a39dd4), and slice-as-Optional-payload ABI with Phase B v1 byte-keyed MemTable (c7d99f5a).
Zig 0.17 migration: ** repeat operator replaced with @splat (9cb96980).
Docs: Tier 2 tutorial published in janus-docs (9369866) and janus-monastery (2fc0b4c). v2026.5.1 release notes with cross-links (a947a61).
Why now
Yesterday the WAL shipped. The WAL is the log. But a log without a reader is a diary nobody opens. Phase C exists because Phase A proved the frame format. Phase B exists because the MemTable is the in-memory structure that the WAL replays into. The sequencing was not optional: WAL first, MemTable second, GrainStore third. The compiler passes run on the same clock because the Tier 2 script specification (SPEC-044, SPEC-045) requires that scripts be compilable without an explicit main, with auto-imports resolved, and with implicit error propagation. None of this is aspirational. It is the minimum viable compiler for the Tier 2 milestone, and the Tier 2 milestone is the gate for the SDK alpha.
The Script Law check (--promotable) was forced by the same specification. You cannot allow a script into the dependency graph unless you can prove it satisfies the Law. The check is the proof.
Design decisions and tradeoffs
- Chosen path: synthesized main extraction in pass 1, not a source-level AST transform. The compiler manufactures the entry point at the IR level, below the source. The source stays honest: what the human wrote is what the human sees. The compiler adds structure the human should not have to think about.
- Rejected path: requiring explicit main in scripts. This is the Rust model. It is correct for applications; it is wrong for scripts. A script is a sequence of statements that should run top-to-bottom. Forcing fn main() !void {} wrapping is ceremony without purpose at this tier.
- Why the rejection was correct: Tier 2 scripts target STEM students and vibe-coders. Every line of ceremony is a line that does not teach. The synthesized main removes the ceremony at the compiler level where it belongs.
- Chosen path: implicit try injection via closure walker, checking only the immediate parent. This is narrower than a full lexical scope walk. It means try is injected only when the error-producing expression is the immediate child of a statement that can propagate. Nested expressions do not get implicit try at every level.
- Rejected path: deep implicit try at every error-producing call site regardless of context. This would produce correct code but noisy code. The closure walker produces minimal try insertion, which is the right default for a language that wants its error handling visible but not oppressive.
- Chosen path: CRC corruption tolerance in WAL replay. Stop at the first bad frame. Replay everything before it. Do not attempt recovery of partial frames.
- Rejected path: partial frame recovery with heuristic framing. The WAL is the trust boundary. If a frame fails CRC, the frame is a lie. Attempting to recover from a lie is how you get silent data corruption. Stop. Replay what you can prove. Move on.
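One reading of the immediate-parent rule for implicit try, sketched in Python on a toy AST. The node classes, the fallible flag, and inject_try are all hypothetical; SPEC-044 §4.b.2 is the authority on the real rule, and this sketch only illustrates the "statement's direct child, not every nested call" distinction.

```python
from dataclasses import dataclass

@dataclass
class Call:
    name: str
    args: tuple = ()
    fallible: bool = False   # hypothetical marker for "can return an error"

@dataclass
class Try:
    inner: object

@dataclass
class ExprStmt:
    expr: object

def inject_try(stmt: ExprStmt) -> ExprStmt:
    """Wrap the statement's direct child in Try if it is a fallible call.

    Only the immediate child is inspected. Arguments nested inside the
    call are left untouched, and an expression already wrapped in Try
    is not wrapped again.
    """
    expr = stmt.expr
    if isinstance(expr, Call) and expr.fallible:
        return ExprStmt(Try(expr))
    return stmt

inner = Call("read", (), fallible=True)
outer = Call("write", (inner,), fallible=True)
rewritten = inject_try(ExprStmt(outer))
```

The rejected "deep" variant would recurse into args and wrap the nested read call as well; this version inserts exactly one Try per statement at most.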
Junior Dev Nugget
- The principle being demonstrated: Build the recovery path before you build the write path. The WAL replay (Phase C smoke stages 2 through 5) proves that the storage engine can survive its own failure modes. Without replay, the WAL is a write-only log: you can append to it, but you cannot trust it. Trust is verified by reading.
- The mistake the reader would have made: Writing WAL frames first and testing replay after. This inverts the dependency. The replay code reads the format; the write code produces it. If you write first and replay second, you discover format bugs in production. If you write the replay test first (or at least concurrently, as the smoke stages were), you discover format bugs at the desk. The five smoke stages exist because the failure modes were enumerated before the code that handles them was finalized.
- What to read or look at next: The Phase C smoke tests in src/std/db/lsm_smoke.jan, stages 1 through 5. They are short, they are sequential, and each one targets a single scenario. Study the progression: fresh open, close-reopen, overwrite across reopen, torn tail (short header and short body), CRC flip. That is how you write a recovery test suite.
Ideological stance, grounded
- Position: A compiler must not lie to its user. The synthesized main is added at the IR level, not rewritten into the source. The human’s source file remains exactly what they typed. The compiler’s job is to add the structure the machine needs without pretending the human wrote it. This is the difference between a compiler and a templating engine.
- Engineering evidence drawn from the diff: Pass 1 extracts top-level statements into a synthesized main function (e776c593). The source file is never modified. The desugar printer emits canonical AST back to .jan (5fb673f8), and the round-trip harness checks, across its 10 enumerated cases, that the output matches the input structure. The compiler adds; it does not rewrite.
- Where this sits in the Libertaria mission: Janus scripts are sovereign artifacts. A script that the compiler silently rewrites is a script whose authorship is shared with the machine. That is unacceptable for a language whose first principle is that the human owns the artifact. The synthesized main respects this: it exists in the compilation pipeline, not in the source. The source is the human’s. The pipeline is the machine’s. The boundary is clean.
References
- Docs: janus-docs commit 9369866, Tier 2 tutorial (top-level code, auto-imports, validate)
- Spec / RFC: SPEC-044 §3.2.1 (shape-equivalence, inline escalation), SPEC-045 §7.6/§7.7 (Script Law check, promotable validation)
- Release: v2026.5.1, janus-docs commit a947a61
- Repo / Commits: Janus/janus 077136d2 (Phase C v0 GrainStore), 83a10ae0 (Phase B v0 MemTable), 887e3837 (WAL replay), e776c593 (pass 1 synthesized main), c0d6dc06 (pass 2 implicit try), 5850b499 (validate --promotable), 5fb673f8 (desugar printer)
What comes next
Pass 2 is in the machine but the Tier 2 milestone is not closed. The next move is wiring pass 2 output into the QtJIR backend for full end-to-end script compilation: source in, LLVM IR out, binary on disk. The LSM tree needs compaction (Phase D) and a bloom filter on the GrainStore. The desugar round-trip harness covers 10 cases today; it needs to cover the full Tier 2 syntax before the SDK alpha gate.
– V.