
Compiler Architecture

Almide is a ~72,000-line pure-Rust compiler organized as a workspace of 9 crates. Dependencies: serde (AST serialization), toml (template loading), clap (CLI), lasso (string interning).

The compiler operates in two phases: build time (when the compiler itself is built) and run time (when your .almd source is compiled).
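
As a rough illustration of the build-time phase, a build script can assemble Rust source as a string and write it to a generated file. The sketch below assumes a hypothetical `generate_token_table` helper and a caller-supplied keyword list. It is not Almide's actual build.rs, and the shape of the generated file is an assumption.

```rust
/// Hypothetical helper (not from the Almide source): builds the text of a
/// generated Rust file that exposes a keyword table.
fn generate_token_table(keywords: &[&str]) -> String {
    let mut out = String::from("// @generated by build.rs; do not edit.\n");
    out.push_str("pub const KEYWORDS: &[&str] = &[\n");
    for kw in keywords {
        // {:?} prints the keyword as a quoted, escaped Rust string literal.
        out.push_str(&format!("    {:?},\n", kw));
    }
    out.push_str("];\n");
    out
}
```

In a real build script the returned string would typically be written out with `std::fs::write`, so the compiler crate only ever sees ordinary Rust source.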

graph LR
  subgraph Inputs
    G["grammar/*.toml"]
    RR["runtime/rs/src/"]
    S["stdlib/defs/"]
  end
  B["build.rs"]
  subgraph "generated/"
    AT["arg_transforms.rs"]
    SS["stdlib_sigs.rs"]
    RRT["rust_runtime.rs"]
    TT["token_table.rs"]
  end
  G --> B
  RR --> B
  S --> B
  B --> AT
  B --> SS
  B --> RRT
  B --> TT

graph TD
  SRC[".almd source"] --> LEX["Lexer"]
  LEX --> PAR["Parser"]
  PAR --> AST["AST"]
  AST --> CHK["Type Checker"]
  CHK -->|"expr_types + env"| LOW["Lowering\n(AST → Typed IR)"]
  LOW --> NP["Nanopass Pipeline\n(target-specific rewrites)"]
  LOW --> WD["Direct WASM Emission"]
  NP --> TR["Template Renderer\n(TOML-driven)"]
  TR --> OUT[".rs"]
  WD --> WOUT[".wasm"]

  style SRC fill:#4f7cff,color:#fff
  style OUT fill:#d97706,color:#fff
  style WOUT fill:#7c3aed,color:#fff

1. Lexer: Source text → token stream. Handles string interpolation, heredocs, 42 keywords.

2. Parser: Recursive descent. Produces AST with span information. Error recovery with actionable hints.

3. Type Checker: Constraint-based inference with eager unification. Resolves UFCS calls (xs.map(f) → list.map(xs, f)).

4. Lowering: AST + type information → Typed IR. Every expression carries its resolved type.

5. Nanopass Pipeline: Target-specific structural rewrites on the IR (see below).

6. Output: Rust target uses TOML-driven template rendering. WASM target emits binary directly.


All semantic decisions are made in the IR before any text is emitted. The walker never checks what target it’s rendering for.

Each pass receives &mut IrProgram and rewrites it structurally. Passes are composable and target-specific.
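
A minimal sketch of what such a pass interface could look like. The `Pass` trait, the `IrExpr` variants, and `run_pipeline` are hypothetical simplifications for illustration; the real `IrProgram` is certainly richer than this.

```rust
/// Simplified stand-in for the real IR (assumption: illustrative only).
#[derive(Debug, Clone, PartialEq)]
enum IrExpr {
    Call { name: String, args: Vec<IrExpr> },
    RustMacro { name: String, args: Vec<IrExpr> },
    IntLit(i64),
}

#[derive(Debug)]
struct IrProgram {
    body: Vec<IrExpr>,
}

/// Hypothetical pass interface: every pass mutates the program in place.
trait Pass {
    fn name(&self) -> &'static str;
    fn run(&self, program: &mut IrProgram);
}

/// Toy builtin lowering: calls to `assert_eq` / `println` become macro nodes.
struct BuiltinLowering;

impl BuiltinLowering {
    fn rewrite(expr: &mut IrExpr) {
        // Rewrite children first so nested calls are lowered too.
        match expr {
            IrExpr::Call { args, .. } | IrExpr::RustMacro { args, .. } => {
                for arg in args.iter_mut() {
                    Self::rewrite(arg);
                }
            }
            IrExpr::IntLit(_) => {}
        }
        // Then decide whether this node itself must be replaced.
        let lowered = match expr {
            IrExpr::Call { name, args }
                if matches!(name.as_str(), "assert_eq" | "println") =>
            {
                Some(IrExpr::RustMacro {
                    name: std::mem::take(name),
                    args: std::mem::take(args),
                })
            }
            _ => None,
        };
        if let Some(new_expr) = lowered {
            *expr = new_expr;
        }
    }
}

impl Pass for BuiltinLowering {
    fn name(&self) -> &'static str {
        "BuiltinLowering"
    }
    fn run(&self, program: &mut IrProgram) {
        for expr in program.body.iter_mut() {
            Self::rewrite(expr);
        }
    }
}

/// Runs the passes in order and returns the names of those applied.
fn run_pipeline(passes: &[Box<dyn Pass>], program: &mut IrProgram) -> Vec<&'static str> {
    let mut applied = Vec::new();
    for pass in passes {
        pass.run(program);
        applied.push(pass.name());
    }
    applied
}
```

Because each pass only sees `&mut IrProgram`, a target's pipeline is just an ordered list of boxed passes, which is one way to get the composability described above.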

flowchart TD
  IR["Typed IR"]

  IR --> TC & WE

  subgraph Rust["Rust target"]
    direction TB
    TC["TypeConcretization"] --> BI["BorrowInsertion"]
    BI --> CI["CloneInsertion"]
    CI --> SL["StdlibLowering"]
    SL --> RP["ResultPropagation"]
    RP --> BL["BuiltinLowering"]
    BL --> FL1["FanLowering"]
  end

  subgraph WASM["WASM target"]
    direction TB
    WE["Direct Emission\n(linear memory, WASI)"]
  end

  FL1 --> RS[".rs"]
  WE --> WO[".wasm"]

  style IR fill:#4f7cff,color:#fff,stroke:#4f7cff
  style RS fill:#d97706,color:#fff,stroke:#d97706
  style WO fill:#7c3aed,color:#fff,stroke:#7c3aed
  style Rust stroke:#d97706,stroke-width:2px,color:#d97706
  style WASM stroke:#7c3aed,stroke-width:2px,color:#7c3aed
| Pass | Target | What it does |
|---|---|---|
| StdlibLowering | Rust | Module { "list", "map" } → Named { "almide_rt_list_map" } |
| ResultPropagation | Rust | Insert Try { expr } (Rust ?) in effect fn |
| CloneInsertion | Rust | Insert Clone based on use-count analysis |
| BoxDeref | Rust | Insert Deref for recursive types through Box |
| BuiltinLowering | Rust | assert_eq → RustMacro, println → RustMacro |
| FanLowering | Rust | Strip auto-try from fan spawn closures |
| TailCallMark | WASM | Mark tail-recursive calls for return_call emission |
| ClosureConversion | WASM | Lambda capture → explicit env struct passing |
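
The StdlibLowering row can be sketched as a small rewrite on a call target. The `CallTarget` enum and `lower_stdlib_target` function are hypothetical names, and the `almide_rt_<module>_<func>` naming scheme is inferred from the single example in the table rather than confirmed.

```rust
/// Hypothetical, simplified call-target node.
#[derive(Debug, Clone, PartialEq)]
enum CallTarget {
    /// Unresolved stdlib reference such as `list.map`.
    Module { module: String, func: String },
    /// Concrete runtime symbol.
    Named { name: String },
}

/// Assumption: runtime symbols follow the `almide_rt_<module>_<func>` pattern.
/// Targets that are already `Named` are left untouched.
fn lower_stdlib_target(target: &mut CallTarget) {
    let lowered = match &*target {
        CallTarget::Module { module, func } => {
            Some(format!("almide_rt_{}_{}", module, func))
        }
        CallTarget::Named { .. } => None,
    };
    if let Some(name) = lowered {
        *target = CallTarget::Named { name };
    }
}
```

The rewrite is idempotent, so running the pass twice leaves an already lowered target unchanged.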

TOML files define the output syntax patterns. The Rust target has about 330 template rules.

# codegen/templates/rust.toml
[if_expr]
template = "if ({cond}) {{ {then} }} else {{ {else} }}"
[[power_expr]]
when_type = "Int"
template = "{left}.pow({right} as u32)"
[[power_expr]]
when_type = "Float"
template = "{left}.powf({right})"

All string rendering is done here — passes never produce text.
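
A minimal sketch of how rules like the ones above could be selected and rendered. `Rule`, `select_rule`, and `render` are hypothetical names; treating `{{` and `}}` as escaped literal braces is an assumption based on the `if_expr` template, not a documented convention.

```rust
use std::collections::HashMap;

/// One template rule; `when_type` is `None` for unconditional rules.
struct Rule {
    when_type: Option<&'static str>,
    template: &'static str,
}

/// Picks the first rule that has no condition or whose `when_type` matches.
fn select_rule<'a>(rules: &'a [Rule], ty: &str) -> Option<&'a Rule> {
    rules.iter().find(|r| r.when_type.map_or(true, |w| w == ty))
}

/// Substitutes `{name}` placeholders from `vars`.
/// `{{` and `}}` produce literal braces (assumed escape convention).
fn render(template: &str, vars: &HashMap<&str, String>) -> Result<String, String> {
    let mut out = String::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '{' if chars.peek() == Some(&'{') => {
                chars.next();
                out.push('{');
            }
            '}' if chars.peek() == Some(&'}') => {
                chars.next();
                out.push('}');
            }
            '{' => {
                // Read the placeholder name up to the closing brace.
                let mut name = String::new();
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some(ch) => name.push(ch),
                        None => return Err(format!("unterminated placeholder: {}", name)),
                    }
                }
                let value = vars
                    .get(name.as_str())
                    .ok_or_else(|| format!("unknown placeholder: {}", name))?;
                out.push_str(value);
            }
            other => out.push(other),
        }
    }
    Ok(out)
}
```

With `left = "x"` and `right = "2"`, selecting the `Float` rule and rendering it yields `x.powf(2)`; the renderer itself never inspects types, only the rule selection does.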


Inference

Constraint-based with eager unification. Walk AST → assign fresh type variables → collect constraints → unify → resolve.
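
The walk-and-unify loop can be sketched on a reduced type language. This is an illustrative toy, not the checker's code: the `Ty::Var` variant, the `Subst` struct, and the `occurs` helper are assumptions added for the sketch, and only a few of the real `Ty` cases are modelled.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    String,
    List(Box<Ty>),
    Fn { params: Vec<Ty>, ret: Box<Ty> },
    /// Inference variable (assumed representation, added for this sketch).
    Var(u32),
}

#[derive(Default)]
struct Subst {
    bindings: HashMap<u32, Ty>,
    next: u32,
}

/// Returns true if variable `id` appears anywhere inside `ty`.
fn occurs(id: u32, ty: &Ty) -> bool {
    match ty {
        Ty::Var(other) => *other == id,
        Ty::List(inner) => occurs(id, inner),
        Ty::Fn { params, ret } => params.iter().any(|p| occurs(id, p)) || occurs(id, ret),
        Ty::Int | Ty::String => false,
    }
}

impl Subst {
    /// Allocates a fresh, unbound type variable.
    fn fresh(&mut self) -> Ty {
        let v = Ty::Var(self.next);
        self.next += 1;
        v
    }

    /// Replaces bound variables with their current solution.
    fn resolve(&self, ty: &Ty) -> Ty {
        match ty {
            Ty::Var(id) => match self.bindings.get(id) {
                Some(bound) => self.resolve(bound),
                None => ty.clone(),
            },
            Ty::List(inner) => Ty::List(Box::new(self.resolve(inner))),
            Ty::Fn { params, ret } => Ty::Fn {
                params: params.iter().map(|p| self.resolve(p)).collect(),
                ret: Box::new(self.resolve(ret)),
            },
            Ty::Int | Ty::String => ty.clone(),
        }
    }

    /// Eager unification: each constraint is solved as soon as it is seen.
    fn unify(&mut self, a: &Ty, b: &Ty) -> Result<(), String> {
        let (a, b) = (self.resolve(a), self.resolve(b));
        match (&a, &b) {
            _ if a == b => Ok(()),
            (Ty::Var(id), other) | (other, Ty::Var(id)) => {
                if occurs(*id, other) {
                    return Err(format!("infinite type: ?{} occurs in {:?}", id, other));
                }
                self.bindings.insert(*id, other.clone());
                Ok(())
            }
            (Ty::List(x), Ty::List(y)) => self.unify(x, y),
            (Ty::Fn { params: p1, ret: r1 }, Ty::Fn { params: p2, ret: r2 }) => {
                if p1.len() != p2.len() {
                    return Err(format!("expected {} params but got {}", p1.len(), p2.len()));
                }
                for (x, y) in p1.iter().zip(p2) {
                    self.unify(x, y)?;
                }
                self.unify(r1, r2)
            }
            _ => Err(format!("expected {:?} but got {:?}", a, b)),
        }
    }
}
```

Unifying `List(?0)` with `List(Int)` binds `?0` to `Int` immediately, so a later attempt to unify `?0` with `String` fails at that point rather than in a separate solving phase.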

UFCS

xs.map(fn) → checker finds builtin_module_for_type(List) = "list" → dispatches to list.map(xs, fn).
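
A minimal sketch of that dispatch. The function name `builtin_module_for_type` comes from the text above; its signature, the `"string"` entry, the `ModuleCall` struct, and `resolve_ufcs` are assumptions made for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    String,
    List(Box<Ty>),
}

/// Maps a receiver type to its builtin module.
/// Only the List → "list" entry is confirmed by the text; "string" is assumed.
fn builtin_module_for_type(ty: &Ty) -> Option<&'static str> {
    match ty {
        Ty::List(_) => Some("list"),
        Ty::String => Some("string"),
        Ty::Int => None,
    }
}

/// Hypothetical resolved form of a method-style call.
#[derive(Debug, PartialEq)]
struct ModuleCall {
    module: &'static str,
    func: String,
    args: Vec<String>,
}

/// Rewrites `receiver.method(args...)` into `module.method(receiver, args...)`.
fn resolve_ufcs(
    receiver: &str,
    receiver_ty: &Ty,
    method: &str,
    args: &[&str],
) -> Option<ModuleCall> {
    let module = builtin_module_for_type(receiver_ty)?;
    // The receiver becomes the first positional argument.
    let mut all_args = vec![receiver.to_string()];
    all_args.extend(args.iter().map(|a| a.to_string()));
    Some(ModuleCall {
        module,
        func: method.to_string(),
        args: all_args,
    })
}
```

Returning `None` for types without a builtin module leaves room for the checker to report an error or try another resolution strategy.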

Key types in the type system:

Ty::Int | Ty::Float | Ty::String | Ty::Bool | Ty::Unit
Ty::List(Box<Ty>)
Ty::Map(Box<Ty>, Box<Ty>)
Ty::Option(Box<Ty>)
Ty::Result(Box<Ty>, Box<Ty>)
Ty::Record { fields: Vec<(Sym, Ty)> }
Ty::Variant { cases: Vec<VariantCase> }
Ty::Fn { params: Vec<Ty>, ret: Box<Ty> }
Ty::Tuple(Vec<Ty>)

crates/
├── almide-base/              Shared primitives
│   ├── diagnostic.rs         Error/warning types with file:line + hint
│   ├── intern.rs             String interning (lasso)
│   └── span.rs               Source span types
├── almide-syntax/            Parsing
│   ├── ast.rs                AST node types (serde-serializable)
│   ├── lexer.rs              Tokenizer (42 keywords, interpolation)
│   └── parser/               Recursive descent parser
│       ├── entry.rs          Top-level: program, imports, declarations
│       ├── declarations.rs   fn, type, trait, impl, test
│       ├── expressions.rs    Binary, unary, pipe, match, if/then/else
│       ├── primary.rs        Literals, identifiers, lambdas
│       ├── statements.rs     let, var, guard, assignment
│       ├── patterns.rs       Match arm patterns
│       ├── types.rs          Type expressions
│       └── hints/            Smart error hints for common mistakes
├── almide-types/             Type system
│   ├── types/                Ty enum, unification
│   └── stdlib_info.rs        UFCS tables, auto-import lists
└── almide-frontend/          Type checking & lowering
    ├── check/                Constraint-based type inference + UFCS
    ├── lower/                AST + Types → IR lowering, VarId assignment
    ├── type_env.rs           Scoped variables, functions, types, modules
    └── stdlib.rs             Stdlib signature registration

Every diagnostic includes structured information for both humans and tools:

error[E005]: argument 'xs' expects List[Int] but got String
  at line 5
  in call to list.sort()
  hint: Fix the argument type
  |
5 | let sorted = list.sort("hello")
  |                        ^^^^^^^
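
One way to model a diagnostic like this is a plain struct with a `Display` implementation. The `Diagnostic` struct and its field names are hypothetical, and the exact indentation of the rendered output is an assumption.

```rust
use std::fmt;

/// Hypothetical structured diagnostic (field names are assumptions).
struct Diagnostic {
    code: &'static str,
    message: String,
    line: usize,
    context: Option<String>,
    hint: Option<String>,
    source_line: String,
    /// Zero-based (start column, length) of the span to underline.
    underline: (usize, usize),
}

impl fmt::Display for Diagnostic {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "error[{}]: {}", self.code, self.message)?;
        writeln!(f, "  at line {}", self.line)?;
        if let Some(ctx) = &self.context {
            writeln!(f, "  in {}", ctx)?;
        }
        if let Some(hint) = &self.hint {
            writeln!(f, "  hint: {}", hint)?;
        }
        // The gutter is as wide as the line number so the bars line up.
        let gutter = " ".repeat(self.line.to_string().len());
        writeln!(f, "{} |", gutter)?;
        writeln!(f, "{} | {}", self.line, self.source_line)?;
        let (start, len) = self.underline;
        write!(f, "{} | {}{}", gutter, " ".repeat(start), "^".repeat(len))
    }
}
```

Keeping the fields separate from the rendering means the same value can be printed for humans or serialized for tools.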

Error codes

E001–E010 for programmatic consumption. Each code maps to a specific error category.

Source context

File:line:col location with source underline pointing to the exact span.

Actionable hints

Every error suggests a specific fix. The compiler is a repair tool, not a rejection tool.

Smart hints

Common mistakes from other languages get targeted suggestions: let mut → use var, && → use and, !x → use not x.
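
The smart hints above could be sketched as a lookup over the token stream. The `smart_hint` function is hypothetical and works on plain string tokens for simplicity; the three rules are the ones named in the text.

```rust
/// Hypothetical smart-hint lookup over a token slice.
/// Returns a hint for the first foreign-language idiom it finds.
fn smart_hint(tokens: &[&str]) -> Option<&'static str> {
    for (i, tok) in tokens.iter().enumerate() {
        let next = tokens.get(i + 1).copied();
        match (*tok, next) {
            ("let", Some("mut")) => return Some("use var"),
            ("&&", _) => return Some("use and"),
            ("!", _) => return Some("use not x"),
            _ => {}
        }
    }
    None
}
```

Matching on tokens rather than raw text avoids false positives such as treating `!=` as a use of `!`, provided the lexer emits `!=` as a single token.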