
Sovereign Script Compiler Internals

The compiler is a classic 3-stage pipeline: source → tokens → AST → Python. It lives in sovereign_lang/ (3 files, ~870 LOC total).

Pipeline

Source Code          Lexer                Parser                CodeGen
─────────────   →    ─────────────   →    ─────────────    →    ─────────────
"pipeline x {        [PIPELINE, ID,       Pipeline(             "def pipeline_x():
   let y = 42         LET, ID, ASSIGN,      name="x",               y = 42
 }"                   NUMBER, ...]          body=[Let(...)])     _result = pipeline_x()"

Stage 1: Lexer (lexer.py — 213 LOC)

The tokenizer converts raw source text into typed tokens. Single-pass, character-by-character.

Token Types (40+)

Category     Tokens
Keywords     pipeline, let, if, else, fn, return, for, in, while, true, false, and, or, not
Literals     STRING, NUMBER, IDENTIFIER
Operators    +, -, *, /, %, =, ==, !=, <, >, <=, >=, |>
Delimiters   (, ), {, }, [, ], ,, ., :
Special      NEWLINE, EOF

Features

  • String escape sequences (\n, \t, \\, \", \')
  • Single-line comments (// comment)
  • Decimal numbers (3.14)
  • Two-character operator lookahead (|>, ==, !=, <=, >=)
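The lookahead and comment handling listed above can be sketched as follows. This is a minimal illustration, not the code in lexer.py: the function name tokenize_sketch and the token names (PIPE, EQ, GTE, and so on) are assumptions, keywords, strings and delimiters are left out, and newlines are skipped as whitespace rather than emitted as NEWLINE tokens.

```python
# Hypothetical sketch: names below are illustrative, not taken from lexer.py.
TWO_CHAR_OPS = {"|>": "PIPE", "==": "EQ", "!=": "NEQ", "<=": "LTE", ">=": "GTE"}
ONE_CHAR_OPS = {
    "+": "PLUS", "-": "MINUS", "*": "STAR", "/": "SLASH",
    "%": "PERCENT", "=": "ASSIGN", "<": "LT", ">": "GT",
}


def tokenize_sketch(source):
    """Single pass over the source, one character (or pair) at a time."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        pair = source[i:i + 2]  # two-character lookahead window
        if ch in " \t\r\n":
            i += 1
        elif pair == "//":
            # Single-line comment: skip up to (not including) the newline.
            while i < len(source) and source[i] != "\n":
                i += 1
        elif pair in TWO_CHAR_OPS:
            # Must be tested before the one-character table so that
            # "==" is not read as two "=" tokens.
            tokens.append((TWO_CHAR_OPS[pair], pair))
            i += 2
        elif ch in ONE_CHAR_OPS:
            tokens.append((ONE_CHAR_OPS[ch], ch))
            i += 1
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            # Optional decimal part: a dot followed by at least one digit.
            if i + 1 < len(source) and source[i] == "." and source[i + 1].isdigit():
                i += 1
                while i < len(source) and source[i].isdigit():
                    i += 1
            tokens.append(("NUMBER", source[start:i]))
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(("IDENTIFIER", source[start:i]))
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    return tokens
```

The ordering of the branches is the point of the sketch: the comment check and the two-character table both run before the single-character table, which is what keeps // from being read as two division operators.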

Stage 2: Parser (parser.py — 454 LOC)

Recursive descent parser. Produces an AST from the token stream.

AST Node Types (20)

Node         Fields                            Example
Pipeline     name, body                        pipeline x { ... }
Let          name, value                       let x = 42
If           condition, then_body, else_body   if x > 10 { ... }
Fn           name, params, body                fn add(a, b) { ... }
Return       value                             return x + 1
For          var, iterable, body               for item in list { ... }
While        condition, body                   while x < 10 { ... }
Call         name, args                        print("hello")
MethodCall   object, method, args              list.map(fn)
BinOp        op, left, right                   x + y
UnaryOp      op, operand                       not x
Pipe         left, right                       data |> transform
String       value                             "hello"
Number       value                             42
Bool         value                             true
Identifier   name                              x
Array        elements                          [1, 2, 3]
Index        object, index                     list[0]
Assign       name, value                       x = 42

Operator Precedence (Low → High)

Pipe (|>)
→ Or (or)
→ And (and)
→ Comparison (==, !=, <, >, <=, >=)
→ Addition (+, -)
→ Multiplication (*, /, %)
→ Unary (not, -)
→ Postfix (calls, index, method)
→ Primary (literals, identifiers, arrays)
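One common way to encode a precedence ladder like this in a recursive descent parser is to give each level a routine that first parses the next-higher level, then loops while it sees one of its own operators. The sketch below is an assumption about the general technique, not a copy of parser.py: ExprParser and BINARY_LEVELS are hypothetical names, tokens are plain strings, AST nodes are tuples, and the postfix level (calls, index, method) is omitted.

```python
# Hypothetical sketch of the precedence ladder; not the code in parser.py.
# Binary levels from lowest to highest precedence.
BINARY_LEVELS = [
    ("|>",),
    ("or",),
    ("and",),
    ("==", "!=", "<", ">", "<=", ">="),
    ("+", "-"),
    ("*", "/", "%"),
]


class ExprParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse_binary(self, level=0):
        # Past the last binary level, fall through to unary operators.
        if level == len(BINARY_LEVELS):
            return self.parse_unary()
        left = self.parse_binary(level + 1)
        # Looping (rather than recursing on the right) makes each level
        # left-associative.
        while self.peek() in BINARY_LEVELS[level]:
            op = self.advance()
            right = self.parse_binary(level + 1)
            if op == "|>":
                left = ("Pipe", left, right)
            else:
                left = ("BinOp", op, left, right)
        return left

    def parse_unary(self):
        if self.peek() in ("not", "-"):
            op = self.advance()
            return ("UnaryOp", op, self.parse_unary())
        return self.parse_primary()

    def parse_primary(self):
        # Simplification: every remaining token is treated as an atom.
        return self.advance()


tree = ExprParser(["a", "+", "b", "*", "c"]).parse_binary()
print(tree)  # ('BinOp', '+', 'a', ('BinOp', '*', 'b', 'c'))
```

Because multiplication sits higher in the ladder than addition, b * c is grouped first, and because |> sits lowest, a pipe chain wraps whole expressions on both sides.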

Stage 3: Code Generator (codegen.py — 202 LOC)

Translates the AST into executable Python. Three key features:

Prelude Injection

Every compiled pipeline gets a prelude with built-in functions:

# Auto-injected builtins
cx_remember(content, tags, importance)  # Store memory
cx_recall(query, limit)                 # Search memories
cx_has_seen(target)                     # Check if memory exists
sov_emit(event, data)                   # Queue event emission
sov_run(pipeline_name)                  # Execute another pipeline

The prelude also imports safe standard library modules: json, time, math, re, hashlib, datetime, random.

Pipe Desugaring

The |> operator is syntactic sugar for function application:

data |> transform |> output

Compiles to:

output(transform(data))
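A minimal sketch of how such a rewrite could be done during code generation, assuming Pipe and Identifier nodes with the fields listed in the AST table. The dataclass definitions and the emit_expr function are hypothetical stand-ins, not the classes or functions in parser.py and codegen.py.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for two of the AST node types.
@dataclass
class Identifier:
    name: str


@dataclass
class Pipe:
    left: object
    right: object


def emit_expr(node):
    """Return Python source text for an expression node."""
    if isinstance(node, Identifier):
        return node.name
    if isinstance(node, Pipe):
        # left |> right  becomes  right(left)
        return f"{emit_expr(node.right)}({emit_expr(node.left)})"
    raise TypeError(f"unsupported node: {node!r}")


# data |> transform |> output, parsed left-associatively
chain = Pipe(Pipe(Identifier("data"), Identifier("transform")), Identifier("output"))
print(emit_expr(chain))  # output(transform(data))
```

The left-associative grouping is what makes the leftmost value end up innermost: the outer Pipe wraps the already-emitted inner call.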

Pipeline Auto-Execution

Pipelines compile to a function definition + immediate call:

# Sovereign Script:
pipeline scanner { print("scanning") }

# Compiled Python:
def pipeline_scanner():
    print("scanning")

# Auto-execute pipeline
_result = pipeline_scanner()
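The output shape above (a function definition followed by an immediate call) could be produced along these lines. This is a sketch under assumptions: emit_pipeline is a hypothetical helper, the body is passed in as already-generated Python lines, and the fallback to pass for an empty body is an illustrative choice rather than documented behavior.

```python
import textwrap


def emit_pipeline(name, body_lines):
    """Wrap generated body lines in a function and append the auto-call."""
    # Assumption: an empty pipeline body falls back to `pass`, since an
    # empty function body is a syntax error in Python.
    body = "\n".join(body_lines) if body_lines else "pass"
    return (
        f"def pipeline_{name}():\n"
        f"{textwrap.indent(body, '    ')}\n"
        "\n"
        "# Auto-execute pipeline\n"
        f"_result = pipeline_{name}()\n"
    )


code = emit_pipeline("scanner", ['print("scanning")'])
print(code)
```

Indenting the body as a unit with textwrap.indent keeps nested blocks (already indented relative to each other) correctly aligned inside the function.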

Using the Compiler Directly

from sovereign_lang import parse, generate
from sovereign_lang.lexer import tokenize

source = 'pipeline test { let x = 42\n print(x) }'

# Stage 1: Tokenize
tokens = tokenize(source)

# Stage 2: Parse
ast = parse(tokens)

# Stage 3: Generate
python_code = generate(ast)
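Since the generated code assigns the pipeline's return value to _result, one way to run it and inspect that value is Python's built-in exec with an explicit namespace. The sketch below is an assumption about usage, not the documented runtime: the python_code string is a hand-written stand-in for what generate(ast) might return (without the prelude), and run_compiled is a hypothetical helper. exec runs arbitrary code, so this is only appropriate for trusted sources.

```python
import contextlib
import io

# Stand-in for the string returned by generate(ast); written by hand here
# so the example runs without sovereign_lang installed.
python_code = (
    "def pipeline_test():\n"
    "    x = 42\n"
    "    print(x)\n"
    "\n"
    "_result = pipeline_test()\n"
)


def run_compiled(code):
    """Execute generated code in a fresh namespace and capture its stdout."""
    namespace = {}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return namespace, buffer.getvalue()


namespace, output = run_compiled(python_code)
print(output, end="")          # 42
print(namespace["_result"])    # None, because this pipeline has no return
```

Using a fresh dictionary as the namespace keeps the generated definitions out of the caller's globals and makes _result easy to read back.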
