cure_lexer (cure v0.1.0)

Cure Programming Language - Lexer

The lexer module provides tokenization services for Cure source code, converting raw text into a structured stream of tokens for the parser. It supports all Cure language constructs including keywords, operators, literals, and comments.

Features

  • Position Tracking: Every token includes precise line and column information
  • String Interpolation: Support for #{expr} string interpolation syntax
  • Multi-character Operators: Recognition of operators like ->, |>, ::, etc.
  • Comprehensive Literals: Numbers, strings, atoms, and boolean values
  • Keywords: All Cure language keywords including FSM constructs
  • Error Recovery: Detailed error reporting with location information

Token Types

The lexer recognizes the following token categories:

  • Keywords: def, fsm, match, when, etc.
  • Identifiers: Variable and function names
  • Literals: Numbers, strings, atoms, booleans
  • Operators: +, ->, |>, ::, ==, etc.
  • Delimiters: (), [], {}, ,, ;, etc.
  • Comments: Line comments starting with #

String Interpolation

Supports string interpolation with #{expression} syntax:

"Hello #{name}!"  % Tokenized as interpolated string

Error Handling

All tokenization errors include precise location information:

{error, {Reason, Line, Column}}
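
A sketch of handling this result shape (Source is assumed to be a binary of Cure source code):

case cure_lexer:tokenize(Source) of
    {ok, Tokens} ->
        Tokens;
    {error, {Reason, Line, Column}} ->
        io:format("Lex error ~p at line ~p, column ~p~n", [Reason, Line, Column]),
        []
end.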

Summary

Functions

  • token_type/1 - Extract the type from a token record.

  • tokenize/1 - Tokenize a string of Cure source code into a list of tokens.

  • tokenize_file/1 - Tokenize a Cure source file.

Functions

scan/1

-spec scan(string() | binary()) -> {ok, [tuple()]} | {error, term()}.
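
Going by the spec, scan/1 accepts either a charlist or a binary. A usage sketch (assuming it produces the same token tuples as tokenize/1):

{ok, Tokens} = cure_lexer:scan("def add(x, y) = x + y").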

token_type/1

-spec token_type(#token{type :: atom(), value :: term(), line :: integer(), column :: integer()}) ->
                    atom().

Extract the type from a token record.

This utility function extracts the token type, which is useful for pattern matching and categorizing tokens in the parser.

Arguments

  • Token - Token record to extract type from

Returns

  • Atom representing the token type (e.g., def, identifier, number)

Examples

Token = #token{type = identifier, value = <<"add">>, line = 1, column = 5},
token_type(Token).
% => identifier

KeywordToken = #token{type = def, value = def, line = 1, column = 1},
token_type(KeywordToken).
% => def
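
As a further sketch, token_type/1 can drive simple filtering over a token list (Tokens is assumed to come from tokenize/1):

Identifiers = [T || T <- Tokens, cure_lexer:token_type(T) =:= identifier].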

tokenize(Source)

-spec tokenize(binary()) ->
                  {ok,
                   [#token{type :: atom(), value :: term(), line :: integer(), column :: integer()}]} |
                  {error, term()}.

Tokenize a string of Cure source code into a list of tokens.

This is the main entry point for lexical analysis. It processes the entire input and returns a list of tokens with position information.

Arguments

  • Source - Binary containing Cure source code to tokenize

Returns

  • {ok, Tokens} - Successful tokenization with list of token records
  • {error, {Reason, Line, Column}} - Tokenization error with location
  • {error, {Error, Reason, Stack}} - Internal error with stack trace

Token Record

Each token is a record with fields:

  • type - Token type atom (e.g., identifier, number, def)
  • value - Token value (e.g., variable name, number value)
  • line - Line number (1-based)
  • column - Column number (1-based)
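
For reference, a record declaration matching these fields might look like the following sketch (field names, order, and types taken from the spec above; the actual definition lives in the Cure sources):

-record(token, {
    type   :: atom(),    % token type, e.g. identifier, number, def
    value  :: term(),    % token value, e.g. variable name or number value
    line   :: integer(), % line number (1-based)
    column :: integer()  % column number (1-based)
}).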

Examples

tokenize(<<"def add(x, y) = x + y">>).
% => {ok, [
%      #token{type=def, value=def, line=1, column=1},
%      #token{type=identifier, value= <<"add">>, line=1, column=5},
%      #token{type='(', value='(', line=1, column=8},
%      ...
%    ]}

tokenize(<<"invalid \xff character">>).
% => {error, {invalid_character, 1, 9}}

Error Types

  • invalid_character - Unrecognized character in input
  • unterminated_string - String literal without closing quote
  • invalid_number_format - Malformed numeric literal
  • unterminated_comment - Block comment without proper termination
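
These reasons can be mapped to readable messages with a small helper (describe_error/1 is hypothetical, not part of the module):

describe_error(invalid_character)     -> "unrecognized character in input";
describe_error(unterminated_string)   -> "string literal without closing quote";
describe_error(invalid_number_format) -> "malformed numeric literal";
describe_error(unterminated_comment)  -> "block comment without proper termination";
describe_error(Other)                 -> io_lib:format("~p", [Other]).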

tokenize_file(Filename)

-spec tokenize_file(string()) ->
                       {ok,
                        [#token{type :: atom(), value :: term(), line :: integer(), column :: integer()}]} |
                       {error, term()}.

Tokenize a Cure source file.

This is a convenience function that reads a file from disk and tokenizes its contents. It handles file I/O errors and passes the content to the main tokenization function.

Arguments

  • Filename - Path to the .cure source file to tokenize

Returns

  • {ok, Tokens} - Successful tokenization with list of token records
  • {error, {file_error, Reason}} - File I/O error
  • {error, {Reason, Line, Column}} - Tokenization error with location

Examples

tokenize_file("examples/hello.cure").
% => {ok, [#token{type=def, ...}, ...]}

tokenize_file("nonexistent.cure").
% => {error, {file_error, enoent}}
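
As a sketch, this pairs naturally with filelib:wildcard/1 for batch tokenization (the directory path below is assumed):

lists:foreach(
    fun(File) ->
        case cure_lexer:tokenize_file(File) of
            {ok, Tokens} -> io:format("~s: ~p tokens~n", [File, length(Tokens)]);
            {error, Reason} -> io:format("~s: ~p~n", [File, Reason])
        end
    end,
    filelib:wildcard("examples/*.cure")).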