← All posts

Cure v0.31.0 :: Specialise & Steer

by Aleksei Matiushkin

release optimiser monomorphisation pgo smt diagrams

v0.30.0 admitted that we did not trust each other with the full picture, so it gave us john -- the panoramic diagnostic that just shows everything. With the operator caught up to the language, v0.31.0 can finally turn to the language itself, and to a corner of it that has been tagged ## Future since v0.24.0.

This release is about not paying.

The pact at the heart of polymorphism

fn id(x: T) -> T = x is one of the most beautiful five-token programs you can write. It also lies a little. The compiler accepts id once and for all, but every call site that uses id pays the price of the lie: a generic dispatch, a generic body, a generic sub-expression that has to keep its options open in case T turns out to be the next thing.

In practice, T is almost never the next thing. It is Int here and String there, and the compiler is the only one in the room pretending otherwise. Specialisation is the polite way to stop pretending.

Monomorphisation

Cure.Optimizer.Monomorphise is a new MetaAST pass wired in front of Cure.Optimizer.Inline. It walks the module body, identifies function_def nodes whose declared signature mentions {:type_var, _}, and for every call site whose arguments pin those variables to concrete types, it synthesises a specialised clone:

mod Specialise

  fn id(x: T) -> T = x

  fn use_int() -> Int = id(42)
  fn use_str() -> String = id("hi")

After the optimiser has finished, the module also exports id/1 unchanged -- out-of-unit callers, the protocol registry, and dynamic apply/3 keep working exactly as before -- and privately holds two fresh definitions:

id__223a2c(x) -> x.    %% T -> Int
id__17522e(x) -> x.    %% T -> String

The mangled suffix is :erlang.phash2/1 of the canonicalised substitution, rendered as six lower-case base-16 characters. The hash is stable across recompiles: the same (name, substitution) pair always maps to the same suffix, so cross-build artifacts and tooling stay in sync.
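The stability requirement is easy to state in code. A minimal sketch, with SHA-256 standing in for :erlang.phash2/1 (any deterministic hash satisfies the contract; the function name and shapes here are hypothetical):

```python
import hashlib

def specialisation_suffix(name, substitution):
    """Derive a stable six-character suffix for a specialised clone.

    Stand-in for :erlang.phash2/1: the only requirement is that the
    same canonicalised (name, substitution) pair always yields the
    same six lower-case base-16 characters across builds.
    """
    # Canonicalise the substitution so binding order cannot change the hash.
    canonical = (name, tuple(sorted(substitution.items())))
    return hashlib.sha256(repr(canonical).encode()).hexdigest()[:6]
```

Canonicalising before hashing is what makes the suffix a pure function of the substitution rather than of the traversal order that discovered it.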

The two id__* functions look pointless on their own -- they are verbatim copies of the original body -- and that is the whole point. The inliner, which runs immediately afterwards, has been waiting for this. It now sees a one-expression non-recursive function with a concrete signature and a single call site, and devours each clone in turn. The codegen quietly emits a per-module -compile({nowarn_unused_function, ...}). attribute for the :specialised_from clones so erl_lint does not complain about the ghosts the inliner just laid to rest.

The pact also applies to taxes

The "what if it's something else" tax shows up in BEAM bytecode, in JIT decisions, and -- the part I care about most -- in dependent types. A specialised body with concrete refinements is a body the SMT solver can actually reason about. v0.31.0 is the scaffolding; v0.32.0 will start extracting interest payments from it.

Budget

Specialisation is so seductive that it will happily blow up your bytecode if you let it. The pass therefore caps the number of unique specialisations at 16 per source function. When the cap is exceeded, the first 16 clones survive, the rest fall back to the original generic, and the type checker emits E064 Monomorphisation Budget Exhausted with a count of how many clones it dropped on the floor.

Tune it per project:

[compiler]
monomorph_budget = 32

Or one-off:

$ cure compile --monomorph-budget=32 lib/

Disable it entirely with --no-monomorphise. The pass is on by default whenever --optimize is in effect, which means the rest of the optimiser sees specialised signatures wherever the inputs are concrete enough.

Profile-guided optimisation

Specialisation is a good story when it can find a story to tell. Some functions are polymorphic-ish in shape but always called with a single concrete type; some are absolutely monomorphic and never hot; some are hot enough that the inliner's default ast_size <= 5 cap is embarrassingly small. The optimiser does not know which is which from source alone. It needs evidence.

Cure.PGO is the evidence pipeline. It has two halves.

Collection

Cure.PGO.Recorder is a GenServer over a public ETS table. The tick/* operations are direct ETS writes; they are also no-ops when the recorder is not running, so test fixtures and library users that import instrumented code do not have to care.

Cure.PGO.Recorder.tick({MyMod, :foo, 1})
Cure.PGO.Recorder.tick_with_us({MyMod, :bar, 0}, 250)
Cure.PGO.Recorder.tick_branch({MyMod, :baz, 1}, _site_id = 7)
Cure.PGO.Recorder.tick_smt({MyMod, :quux, 2}, _queries = 4, 1500)

Cure.PGO.Recorder.flush/1 serialises one Cure.PGO.Profile per module to _build/cure/pgo/<mod>.profile as a compressed Erlang external term. Each entry stores :erlang.phash2/1 of the live function_def meta as :def_hash; the loader uses it to detect stale profiles whose underlying source has drifted.
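Staleness detection is a straight hash comparison. A sketch of the loader's check, with SHA-256 standing in for :erlang.phash2/1 (the profile-entry and definition shapes here are hypothetical):

```python
import hashlib

def def_hash(meta):
    # Stand-in for :erlang.phash2/1 over the live function_def meta.
    return hashlib.sha256(repr(meta).encode()).hexdigest()[:8]

def fresh_entries(profile, live_defs):
    """Keep only entries whose recorded :def_hash still matches the
    live definition; entries for drifted or deleted functions are stale."""
    return [
        (mfa, cost)
        for mfa, recorded, cost in profile
        if mfa in live_defs and def_hash(live_defs[mfa]) == recorded
    ]
```

A drifted function simply falls out of the summary; the optimiser then treats it as having no profile at all.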

The CLI plumbing wraps the recorder lifecycle:

$ cure run --record-profile lib/main.cure
$ cure profile show --top 5
PGO summary (_build/cure/pgo)
  hot:  3 function(s)
  cold: 41 function(s)
  threshold: 0.8
$ cure profile clear

Consumption

Cure.PGO.load("_build/cure/pgo") reads every *.profile and returns a %Cure.PGO{} summary partitioning per-MFA entries into a hot set (top 80% of total cost, with the threshold tunable) and a cold set. The optimiser pipeline takes that summary as a keyword option:

{:ok, pgo} = Cure.PGO.load("_build/cure/pgo")
Cure.Optimizer.optimize(ast, pgo: pgo, module: :"Cure.MyMod")
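The hot/cold split is a greedy walk down the cost-sorted entries until the threshold share of total cost is covered. A sketch (entry shapes are hypothetical):

```python
def partition(entries, threshold=0.8):
    """Split per-MFA entries into a hot set covering the top
    `threshold` share of total cost and a cold set with the rest."""
    total = sum(cost for _, cost in entries)
    hot, cold, covered = [], [], 0
    # Walk from the most expensive entry down, accumulating cost.
    for mfa, cost in sorted(entries, key=lambda e: -e[1]):
        if covered < threshold * total:
            hot.append(mfa)
            covered += cost
        else:
            cold.append(mfa)
    return hot, cold
```

With the default threshold of 0.8 a heavily skewed profile yields a tiny hot set, which is exactly the shape you want before relaxing inliner caps.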

Cure.Optimizer.Inline then classifies each function_def:

  • Hot. Size cap relaxes to <= 12. The inliner becomes more aggressive.
  • Cold. Size cap tightens to <= 2. Effectively disables inlining; the call site keeps its frame.
  • Default. The pre-v0.31.0 cap of <= 5 stays put.

Cure.SMT.Solver.check_sat/3 accepts the same hint:

  • Hot. Doubles the Z3 timeout (6 s) and emits (set-option :smt.arith.solver 6) so flaky :unknowns on the hot path get a second look with the stronger arithmetic engine.
  • Cold. Halves the timeout (1.5 s) and conservatively returns :sat on :unknown. Cold-path queries can be wrong, but they are not allowed to be slow.
  • Default. Pre-v0.31.0 behaviour, unchanged.
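Put side by side, the hint-to-parameter mapping is a small table. A sketch of it (the 3000 ms base Z3 timeout is inferred from the doubled 6 s and halved 1.5 s figures above; the function name is hypothetical):

```python
def pgo_knobs(hint, base_timeout_ms=3000):
    """Inliner cap and SMT parameters per PGO hint."""
    if hint == "hot":
        return {"inline_cap": 12,
                "timeout_ms": base_timeout_ms * 2,
                "prelude": "(set-option :smt.arith.solver 6)",
                "on_unknown": "unknown"}   # gets a second look, not assumed
    if hint == "cold":
        return {"inline_cap": 2,
                "timeout_ms": base_timeout_ms // 2,
                "prelude": None,
                "on_unknown": "sat"}       # "may hold" on the cold path
    return {"inline_cap": 5,               # pre-v0.31.0 behaviour
            "timeout_ms": base_timeout_ms,
            "prelude": None,
            "on_unknown": "unknown"}
```

The asymmetry is deliberate: the hot path buys precision with time, the cold path buys time with precision.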

Strictly opt-in

PGO is not a default. If you compile without --pgo, Cure.PGO is never consulted, even when _build/cure/pgo/ exists and is full. This is deliberate. The rule of thumb I have lived by since the v0.23.0 transparency-log story is that nothing should silently change the bytecode that ships. PGO is exactly the kind of feature that wants to break that rule. v0.31.0 says no.

$ cure compile lib/                # never touches profiles
$ cure compile --pgo lib/          # the only way in

Pattern-aware SMT, just enough to grow

Cure.SMT.Translator.generate_query/3 learnt a pgo_hint parameter: :hot | :cold | :default. Hot queries gain the (set-option :smt.arith.solver 6) prelude before everything else. Cold and default queries are byte-identical to v0.30.x.

The richer trigger emission (forall ... :pattern (...)) for refinement obligations is the natural next step, and it stays on the roadmap. v0.31.0 ships the seam, the budget, the timeout machinery, and the cold-path "may hold" semantics. Triggers move when there are real queries to attach them to.

ASCII diagrams

cure top (v0.27.0) gives you a runtime snapshot. cure doc ships Mermaid for FSMs, supervisors, and applications. There is one shape left between them: the static, copy-pasteable terminal panel, which is exactly what you want when your terminal cannot render Mermaid, when you are reviewing a PR with git show, or when you are explaining a supervision tree on a phone call.

Cure.Doc.Ascii reuses the FSM / sup / app traversals from the v0.27.0 Mermaid emitter and replaces the renderer with Unicode box-drawing:

$ cure draw lib/turnstile.cure
fsm Turnstile
=============

  *──> Locked

states:
  ▢ Locked
  ▢ Unlocked

transitions:
  Locked ──[coin]──> Unlocked
  Unlocked ──[push]──> Locked

Supervisors render as a vertical tree (├── / └──) with each child's id, module, and restart policy. Applications render as a panel with vsn, root supervisor, and declared applications. A --filter fsm|sup|app flag lets cure draw ignore the other container kinds when a file declares more than one.
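The vertical tree is the classic recursive box-drawing walk. A minimal sketch of the renderer (node shapes are hypothetical; the real panel also prints each child's module and restart policy):

```python
def render_tree(node):
    """Render a supervision tree with box-drawing connectors."""
    lines = [node["id"]]

    def walk(children, prefix):
        for i, child in enumerate(children):
            last = i == len(children) - 1
            # Last child closes the branch; siblings keep the rail open.
            lines.append(prefix + ("└── " if last else "├── ") + child["id"])
            walk(child.get("children", []), prefix + ("    " if last else "│   "))

    walk(node.get("children", []), "")
    return "\n".join(lines)
```

The only subtlety is the prefix: a last child's descendants get blank padding, while an inner child's descendants keep a `│` rail so the tree stays connected.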

What I left on the table

Two items moved out of v0.31.0 deliberately.

Typed hot code upgrades. A typed .appup story (cure release --upgrade-from, SMT-verified @migration functions, E057 / E058) was on the original v0.31.0 manifest. The appup / relup machinery is large enough, and fragile enough, that I want to give it a release of its own rather than smuggle a half-finished version into the optimisation release. Rescheduled for v0.32.0.

Automatic codegen instrumentation. v0.31.0 ships the data layer, the recorder, and the consumer; the codegen pass that injects a Cure.PGO.Recorder.tick/1 call at every function entry is the next step. Until it lands, users who want full PGO drive ticks manually, extend Cure.Profiler to bridge events into the recorder, or hand-craft profiles for high-traffic modules.

Numbers

  • Five new modules: Cure.Optimizer.Monomorphise, Cure.PGO, Cure.PGO.Profile, Cure.PGO.Recorder, and Cure.Doc.Ascii. Plus surgical extensions to Cure.Optimizer, Cure.Optimizer.Inline, Cure.SMT.Translator, Cure.SMT.Solver, Cure.Compiler, Cure.Compiler.Codegen, and Cure.CLI.
  • New error code E064 Monomorphisation Budget Exhausted.
  • Two new docs: docs/MONOMORPHISATION.md and docs/PGO.md, both wired into the Hex documentation extras.
  • New example: examples/specialise.cure -- a small showcase that exercises four distinct substitutions across id/1 and pair_first/2.
  • 1691 tests passing. Zero credo issues across 275 source files in strict mode. Zero warnings.

What's next

The long-horizon list got shorter and a little sharper.

  • Typed hot code upgrades. cure release --upgrade-from to emit a typed .appup with SMT-verified state migrations (@migration, E057 / E058). Rescheduled from v0.31.0; the appup / relup machinery deserves its own cycle.
  • Automatic PGO instrumentation. Inject Cure.PGO.Recorder.tick/1 at every compiled function's entry so cure run --record-profile produces useful profiles by default, not by hand.
  • Pattern-trigger SMT encoding. v0.31.0 ships the pgo_hint parameter and the hot-path arithmetic-solver switch; richer forall ... :pattern emission for refinement obligations is the follow-up.
  • Cross-module monomorphisation. v0.31.0 specialises per compilation unit. A whole-program pass that threads concrete substitutions across modules is the obvious next step once we have a project-wide call graph.
  • Broader IDE support. First-class Helix and Zed configurations. An upgraded VS Code extension that tracks the current LSP surface.
  • REPL-level hot reload. Recompile-and-rebind on every file save for long-running REPL sessions, closing the loop between editing .cure files and the running REPL environment.

The repository lives at github.com/am-kantox/cure-lang. The new on-disk references are docs/MONOMORPHISATION.md and docs/PGO.md. The full CHANGELOG lists every touch point.