Syntax & Types
Rayfall is a Lisp-like query language with prefix notation, rich scalar types, columnar vectors, and first-class tables. The parser produces ray_t objects directly with no separate AST.
Atoms
Atoms are scalar values. Rayfall supports a wide range of types, each with a distinct literal syntax.
Integers
64-bit signed integers by default. Suffixed variants available for narrower types.
42 ; i64 (default)
-17 ; negative
0 ; zero
1000000 ; no separators needed
Floats
64-bit IEEE 754 double-precision floating point.
3.14 ; standard float
-0.5 ; negative float
1e6 ; scientific notation
2.5e-3 ; small value
Booleans
true ; boolean true (1b)
false ; boolean false (0b)
Symbols
Symbols are interned identifiers used for column names, dictionary keys, and categorical data. Prefix with a single quote to create a literal symbol.
'AAPL ; symbol atom
'price ; used as column reference
'hello ; any alphanumeric + hyphens
Strings
Double-quoted character sequences. Two internal representations: short strings (up to 12 bytes) stored inline, longer strings in a per-vector pool.
"hello" ; inline short string
"hello world!" ; still inline (12 bytes)
"a longer string" ; pool-allocated
Dates
Date literals use dot-separated year.month.day format. Stored as days since epoch (i32).
2024.01.15 ; January 15, 2024
2023.12.31 ; December 31, 2023
Times
Time-of-day literals in HH:MM:SS.mmm format. Stored as milliseconds since midnight (i32).
09:30:00.000 ; 9:30 AM
14:15:30.500 ; 2:15:30.5 PM
Timestamps
Full date+time with nanosecond precision. Stored as nanoseconds since epoch (i64).
2024.01.15T09:30:00.000000000 ; date + time
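The three temporal encodings above can be sketched in plain C. This is a standalone illustration, not Rayforce code: the `sketch_*` names are hypothetical, and the 1970-01-01 epoch is an assumption, since the text does not say which epoch Rayfall uses.

```c
#include <assert.h>
#include <stdint.h>

/* Days from 1970-01-01 to y-m-d (proleptic Gregorian calendar).
   ASSUMPTION: the epoch is 1970-01-01; the source does not name it. */
static int32_t sketch_days_from_civil(int32_t y, int32_t m, int32_t d)
{
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= (m <= 2);                                   /* treat Jan/Feb as months 11/12 of the prior year */
    int32_t era = (y >= 0 ? y : y - 399) / 400;      /* 400-year cycle index */
    int32_t yoe = y - era * 400;                     /* year of era, 0..399 */
    int32_t doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; /* day of year, March-based */
    int32_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           /* day of era */
    return era * 146097 + doe - 719468;
}

/* Milliseconds since midnight for HH:MM:SS.mmm (fits in i32). */
static int32_t sketch_time_ms(int32_t h, int32_t mi, int32_t s, int32_t ms)
{
    assert(h >= 0 && h < 24 && mi >= 0 && mi < 60 && s >= 0 && s < 60 && ms >= 0 && ms < 1000);
    return ((h * 60 + mi) * 60 + s) * 1000 + ms;
}

/* Nanoseconds since epoch from a day count and a time of day (i64). */
static int64_t sketch_timestamp_ns(int32_t days, int32_t time_ms)
{
    return (int64_t)days * 86400000000000LL + (int64_t)time_ms * 1000000LL;
}
```

Under that epoch assumption, `2024.01.15` maps to day 19737 and `09:30:00.000` to 34200000 ms.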
GUIDs
128-bit globally unique identifiers.
(guid 0) ; generate a new GUID
Null Values
Each type has a null sentinel: INT64_MIN for i64, NaN for f64, INT32_MIN for i32/date/time. Use nil? to test:
(nil? x) ; true if x is null
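A minimal C sketch of how sentinel-based null checks behave, using the sentinels listed above. The helper names are hypothetical and are not part of the Rayforce API.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Integer nulls are plain sentinel comparisons. */
static bool sketch_is_null_i64(int64_t x) { return x == INT64_MIN; }
static bool sketch_is_null_i32(int32_t x) { return x == INT32_MIN; }

/* NaN never compares equal to itself, so f64 needs isnan() rather than ==. */
static bool sketch_is_null_f64(double x) { return isnan(x); }

/* Counts the null entries in an i64 column. */
static size_t sketch_count_nulls_i64(const int64_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    size_t nulls = 0;
    for (size_t i = 0; i < n; i++)
        nulls += sketch_is_null_i64(v[i]);
    return nulls;
}
```

One consequence of this design is that `INT64_MIN` and `INT32_MIN` are not available as ordinary data values.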
Vectors
Vectors are homogeneous, typed, columnar arrays. Created with square brackets. The type is inferred from the first element.
[1 2 3 4 5] ; i64 vector
[1.5 2.7 3.9] ; f64 vector
[true false true] ; boolean vector
[AAPL GOOG MSFT] ; symbol vector
["hello" "world"] ; string vector
Vector operations are morsel-driven, processing 1024 elements at a time for cache efficiency.
Vector Arithmetic
All arithmetic operators auto-map over vectors (marked FN_ATOMIC):
(+ [1 2 3] [10 20 30]) ; [11 22 33]
(* [1 2 3] 10) ; [10 20 30] — scalar broadcast
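The element-wise and scalar-broadcast cases above, combined with 1024-element morsels, might look roughly like this in C. This is a standalone sketch with hypothetical names; the real executor fuses operations and is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MORSEL 1024  /* elements per morsel, as stated in the text */

/* out[i] = a[i] + b[i], processed one morsel at a time. */
static void sketch_add_i64(const int64_t *a, const int64_t *b, int64_t *out, size_t n)
{
    assert(n == 0 || (a && b && out));
    for (size_t base = 0; base < n; base += SKETCH_MORSEL) {
        size_t len = n - base < SKETCH_MORSEL ? n - base : SKETCH_MORSEL;
        for (size_t i = 0; i < len; i++)
            out[base + i] = a[base + i] + b[base + i];
    }
}

/* out[i] = a[i] * k: the scalar is broadcast across every element. */
static void sketch_mul_scalar_i64(const int64_t *a, int64_t k, int64_t *out, size_t n)
{
    assert(n == 0 || (a && out));
    for (size_t base = 0; base < n; base += SKETCH_MORSEL) {
        size_t len = n - base < SKETCH_MORSEL ? n - base : SKETCH_MORSEL;
        for (size_t i = 0; i < len; i++)
            out[base + i] = a[base + i] * k;
    }
}
```

Keeping the inner loop bounded by a fixed morsel size means the working set stays small enough to remain cache-resident while a chunk is processed.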
Lists
Lists are heterogeneous collections of vectors. Created with the list function. Used as the data component of tables.
(list [1 2 3] [A B C]) ; list of two vectors
Tables
Tables are the core data structure in Rayforce. A table is a vector of column names paired with a list of column vectors. All column vectors must have the same length.
; Create a table with explicit column names
(set trades (table
[sym price size]
(list
[AAPL GOOG MSFT]
[150.5 2800.0 300.2]
[100 50 200])))
; Access column names
(key trades) ; [sym price size]
; Access column data
(value trades) ; list of 3 vectors
Dictionaries
Dictionaries map keys to values. Created with the dict function or with {key: value} syntax in query contexts.
(set d (dict [a b c] [1 2 3]))
(get d 'a) ; 1
(key d) ; [a b c]
(value d) ; [1 2 3]
Function Calls
Rayfall uses prefix (Polish) notation. Every expression is either an atom or a parenthesized list where the first element is the function:
(+ 1 2) ; 3
(* (+ 1 2) 3) ; 9 — nested
(sum [1 2 3]) ; 6
(count [1 2 3]) ; 3
Function Types
Built-in functions fall into three arity categories:
| Type | Arguments | Examples |
|---|---|---|
| Unary | Exactly 1 | sum, count, not, neg, type |
| Binary | Exactly 2 | +, -, set, take, at |
| Variadic | 1 or more | if, do, fn, select, list |
Function Flags
| Flag | Behavior |
|---|---|
| FN_ATOMIC | Auto-maps element-wise over vectors. (+ [1 2] [3 4]) yields [4 6]. |
| FN_AGGR | Aggregation function. Reduces a vector to a scalar. (sum [1 2 3]) yields 6. |
| FN_SPECIAL_FORM | Arguments are not evaluated before being passed. Used by set, if, fn, select. |
Quoting
The single quote ' prevents evaluation of the name that follows, creating a symbol atom; the quote form returns a whole expression unevaluated. Useful for column references and dictionary keys:
'price ; symbol, not a variable lookup
(quote (+ 1 2)) ; returns the unevaluated list (+ 1 2)
Comments
Line comments start with a semicolon and extend to the end of the line:
; This is a comment
(+ 1 2) ; inline comment
Control Flow
Conditional: if
Evaluates the condition and returns the matching branch. Additional condition/result pairs can follow the first, with a final expression as the fallback:
(if (> x 0) "positive" "non-positive")
; Multi-branch
(if (> x 100) "high"
(> x 50) "mid"
"low")
Sequential Execution: do
Evaluates multiple expressions in order, returning the last result:
(do
(set x 10)
(set y 20)
(+ x y)) ; 30
Variable Binding: set and let
(set x 42) ; global binding
(let y (+ x 1)) ; local binding
Error Handling: try / raise
(try
(/ 1 0) ; might error
(fn [e] "caught")) ; handler receives error
(raise "custom error") ; throw an error
Lambdas & the VM
User-defined functions are created with fn. Lambdas compile lazily to bytecode and run in a stack-based computed-goto VM (ray_vm_t) with a 1024-slot program stack and return stack.
; Named function
(set square (fn [x] (* x x)))
(square 5) ; 25
; Multi-branch body
(set clamp (fn [x lo hi]
(if (< x lo) lo
(> x hi) hi
x)))
; Anonymous lambda passed to map
(map (fn [x] (* x 2)) [1 2 3]) ; [2 4 6]
The VM supports trap frames for try/raise error handling, ensuring exceptions unwind cleanly through compiled code.
Select & Update
The select and update builtins bridge to the Rayforce DAG executor. They accept a dictionary of options:
select
; Basic filter
(select {from: trades where: (> price 100)})
; Project specific columns with expressions
(select {from: trades
cols: {sym: sym notional: (* price size)}})
; Group by with aggregation
(select {from: trades
by: {sym: sym}
cols: {avg_price: (avg price)
total_size: (sum size)}})
update
; Add a computed column
(update {from: trades
cols: {notional: (* price size)}})
insert / upsert
; Insert new rows
(insert {into: trades
values: (table [sym price size]
(list [TSLA] [250.0] [300]))})
C API
Rayforce exposes a single public header: include/rayforce.h. The core abstraction is ray_t — a 32-byte block header. Every object (atom, vector, list, table) is a ray_t with data following at byte 32.
Key Types
| Type | Description |
|---|---|
| ray_t | 32-byte universal block header for all objects |
| ray_err_t | Error code return type |
| ray_str_t | 16-byte string element (inline or pooled) |
| ray_csr_t | CSR graph edge storage |
| ray_rel_t | Graph relationship (forward + reverse CSR) |
| ray_arena_t | Bump allocator for bulk allocations |
| ray_vm_t | Bytecode VM for compiled lambdas |
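To make the "16-byte string element (inline or pooled)" idea concrete, here is one way such an element could be laid out, using the 12-byte inline limit from the Strings section. The field layout and all `sketch_*` names are assumptions for illustration; the actual `ray_str_t` definition may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_INLINE_MAX 12  /* inline limit stated in the Strings section */

/* HYPOTHETICAL layout: 4-byte length plus 12 bytes of payload = 16 bytes. */
typedef struct {
    uint32_t len;
    union {
        char inline_bytes[SKETCH_INLINE_MAX];   /* used when len <= 12 */
        struct {
            uint32_t prefix;       /* first 4 bytes, assumed here for quick compares */
            uint32_t pool_offset;  /* where the bytes start in the per-vector pool */
            uint32_t unused;
        } pooled;
    } u;
} sketch_str_t;

_Static_assert(sizeof(sketch_str_t) == 16, "element must stay 16 bytes");

static bool sketch_str_is_inline(const sketch_str_t *s)
{
    return s->len <= SKETCH_INLINE_MAX;
}

/* Stores src inline when it fits, otherwise appends it to the pool.
   Returns false if the pool has no room. */
static bool sketch_str_make(sketch_str_t *out, const char *src,
                            char *pool, size_t pool_cap, size_t *pool_used)
{
    size_t n = strlen(src);
    assert(n <= UINT32_MAX);
    memset(out, 0, sizeof *out);
    out->len = (uint32_t)n;
    if (n <= SKETCH_INLINE_MAX) {
        memcpy(out->u.inline_bytes, src, n);
        return true;
    }
    if (n > pool_cap - *pool_used)
        return false;
    memcpy(&out->u.pooled.prefix, src, sizeof out->u.pooled.prefix);
    out->u.pooled.pool_offset = (uint32_t)*pool_used;
    memcpy(pool + *pool_used, src, n);
    *pool_used += n;
    return true;
}
```

With this layout, "hello world!" (exactly 12 bytes) never touches the pool, while "a longer string" (15 bytes) does.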
Error Handling
ray_t* result = ray_eval_str("(+ 1 2)");
if (RAY_IS_ERR(result)) {
// handle error
}
// RAY_ERR_PTR() to create error pointers
Memory Management
Never use malloc/free. Use the Rayforce allocator:
ray_t* obj = ray_alloc(size); // general allocation
ray_release(obj); // decrement refcount, free if zero
ray_retain(obj); // increment refcount
ray_t* copy = ray_cow(obj); // copy-on-write
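The retain/release/copy-on-write contract can be modeled with a toy object. The sketch below is standalone and does not link against Rayforce, which is the only reason it uses the C standard allocator. The `sketch_*` names, and the choice that `sketch_cow` consumes the caller's reference, are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy refcounted vector of i64; NOT the real ray_t layout. */
typedef struct {
    uint32_t refcount;
    uint32_t len;
    int64_t  data[];
} sketch_obj_t;

static sketch_obj_t *sketch_new(uint32_t len)
{
    sketch_obj_t *o = calloc(1, sizeof *o + (size_t)len * sizeof(int64_t));
    if (o) { o->refcount = 1; o->len = len; }
    return o;
}

static void sketch_retain(sketch_obj_t *o) { o->refcount++; }

static void sketch_release(sketch_obj_t *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0)
        free(o);
}

/* Returns an object that is safe to mutate. A sole owner gets the same
   pointer back; a shared object is copied and the caller's reference to
   the original is dropped (ASSUMED ownership rule). */
static sketch_obj_t *sketch_cow(sketch_obj_t *o)
{
    if (o->refcount == 1)
        return o;
    sketch_obj_t *copy = sketch_new(o->len);
    if (!copy)
        return NULL;
    memcpy(copy->data, o->data, (size_t)o->len * sizeof(int64_t));
    sketch_release(o);
    return copy;
}
```

The point of the pattern is that mutation is free for a sole owner and only pays for a copy when another holder could observe the change.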
DAG & Execution
The execution pipeline builds a lazy DAG, optimizes it, then executes with fused morsel-driven processing:
// 1. Build lazy DAG
ray_t* g = ray_graph_new(df);
ray_t* filtered = ray_filter(g, predicate);
ray_t* projected = ray_project(g, filtered, cols);
// 2. Execute (optimizer runs automatically)
ray_t* result = ray_execute(g, projected);
Optimizer Passes
- Type inference — propagate types through the DAG
- Constant folding — evaluate compile-time-known expressions
- SIP (Sideways Information Passing) — propagate selection bitmaps backward through expand chains
- Factorize — avoid materializing cross-products with factorized vectors
- Predicate pushdown — move filters closer to data sources
- Filter reorder — cheapest filters first
- Fusion — merge adjacent operations into single morsel loops
- DCE (Dead Code Elimination) — remove unused DAG nodes
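As an illustration of one pass from the list above, constant folding on a tiny expression tree could look like the following. The node type is hypothetical and far simpler than a real DAG node. It only shows the idea of replacing an operator whose inputs are all constants with its result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SK_CONST, SK_COLUMN, SK_ADD, SK_MUL } sk_kind_t;

/* HYPOTHETICAL expression node; not a Rayforce type. */
typedef struct sk_node {
    sk_kind_t       kind;
    int64_t         value;   /* meaningful only for SK_CONST */
    struct sk_node *lhs;
    struct sk_node *rhs;
} sk_node_t;

/* Folds bottom-up, in place: an ADD/MUL whose children are both constants
   becomes a constant itself. Anything touching a column is left alone. */
static sk_node_t *sk_fold(sk_node_t *n)
{
    assert(n != NULL);
    if (n->kind == SK_ADD || n->kind == SK_MUL) {
        sk_fold(n->lhs);
        sk_fold(n->rhs);
        if (n->lhs->kind == SK_CONST && n->rhs->kind == SK_CONST) {
            n->value = (n->kind == SK_ADD) ? n->lhs->value + n->rhs->value
                                           : n->lhs->value * n->rhs->value;
            n->kind = SK_CONST;
            n->lhs = n->rhs = NULL;
        }
    }
    return n;
}
```

For `(* (+ 1 2) price)` this folds the inner sum to 3 but keeps the multiply, because `price` is only known at execution time.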
CSR Storage
Rayforce stores graph edges in double-indexed Compressed Sparse Row (CSR) format: one forward index (source to destination) and one reverse index (destination to source). Both indices are built simultaneously.
// Build CSR from edge list
ray_csr_t csr;
ray_csr_build(&csr, src_ids, dst_ids, n_edges);
// Persist to disk
ray_csr_save(&csr, "edges.col");
// Memory-map for zero-copy access
ray_csr_mmap(&csr, "edges.col");
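For readers unfamiliar with CSR, the sketch below builds one index from an edge list with a counting sort. It is a standalone illustration with hypothetical names, not the `ray_csr_build` implementation, and it uses the C standard allocator only because it does not link against Rayforce. It also builds the reverse index with a second call with the arguments swapped, whereas the text says Rayforce builds both at once.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Neighbours of node v are targets[offsets[v] .. offsets[v + 1]). */
typedef struct {
    int64_t  n_nodes;
    int64_t *offsets;   /* n_nodes + 1 entries */
    int64_t *targets;   /* one entry per edge */
} sketch_csr_t;

/* Builds a CSR index keyed by `from`. Node ids must lie in [0, n_nodes).
   Returns 0 on success, -1 on allocation failure. */
static int sketch_csr_build(sketch_csr_t *csr, const int64_t *from,
                            const int64_t *to, int64_t n_edges, int64_t n_nodes)
{
    assert(csr && n_edges >= 0 && n_nodes >= 0);
    size_t slots = (size_t)n_nodes + 1;
    csr->n_nodes = n_nodes;
    csr->offsets = calloc(slots, sizeof *csr->offsets);
    csr->targets = malloc((size_t)(n_edges ? n_edges : 1) * sizeof *csr->targets);
    int64_t *cursor = malloc(slots * sizeof *cursor);
    if (!csr->offsets || !csr->targets || !cursor) {
        free(csr->offsets); free(csr->targets); free(cursor);
        csr->offsets = csr->targets = NULL;
        return -1;
    }
    for (int64_t e = 0; e < n_edges; e++)          /* 1. count degree per node */
        csr->offsets[from[e] + 1]++;
    for (int64_t v = 0; v < n_nodes; v++)          /* 2. prefix sum gives start offsets */
        csr->offsets[v + 1] += csr->offsets[v];
    for (int64_t v = 0; v < n_nodes; v++)          /* 3. per-node write cursors */
        cursor[v] = csr->offsets[v];
    for (int64_t e = 0; e < n_edges; e++)          /* 4. scatter edges into place */
        csr->targets[cursor[from[e]]++] = to[e];
    free(cursor);
    return 0;
}

static void sketch_csr_free(sketch_csr_t *csr)
{
    free(csr->offsets);
    free(csr->targets);
    csr->offsets = csr->targets = NULL;
}
```

Calling `sketch_csr_build(&fwd, src, dst, ...)` and `sketch_csr_build(&rev, dst, src, ...)` yields the forward and reverse indices respectively.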
Graph Algorithms
Available as DAG opcodes, all integrated into the same morsel-driven pipeline:
| Opcode | Algorithm | Description |
|---|---|---|
| OP_EXPAND | 1-Hop Expand | Follow edges one step from source nodes |
| OP_VAR_EXPAND | BFS | Variable-length path expansion (breadth-first) |
| OP_SHORTEST_PATH | Shortest Path | Single-source shortest paths |
| OP_ASTAR | A* | Heuristic-guided shortest path |
| OP_K_SHORTEST | Yen's K-Shortest | K shortest loopless paths |
| OP_WCO_JOIN | LFTJ | Worst-case optimal join (Leapfrog Triejoin) |
| OP_BETWEENNESS | Brandes | Betweenness centrality |
| OP_CLOSENESS | Closeness | Closeness centrality |
| OP_CLUSTER_COEFF | Clustering | Local clustering coefficients |
| OP_RANDOM_WALK | Random Walk | Random walks on graph |
| OP_MST | Kruskal | Minimum spanning tree |
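As a rough picture of what a breadth-first, variable-length expansion does over a CSR index, consider the sketch below. It is a generic bounded BFS with hypothetical names, and it takes raw offset and target arrays rather than any Rayforce type. How OP_VAR_EXPAND is actually implemented is not described in the text.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fills dist[v] with the hop count from src to v, or -1 if v is not
   reachable within max_hops. offsets/targets form a CSR index where the
   neighbours of u are targets[offsets[u] .. offsets[u + 1]).
   Returns 0 on success, -1 on allocation failure. */
static int sketch_var_expand(const int64_t *offsets, const int64_t *targets,
                             int64_t n_nodes, int64_t src, int64_t max_hops,
                             int64_t *dist)
{
    assert(offsets && dist && src >= 0 && src < n_nodes && max_hops >= 0);
    int64_t *queue = malloc((size_t)n_nodes * sizeof *queue);
    if (!queue)
        return -1;
    for (int64_t v = 0; v < n_nodes; v++)
        dist[v] = -1;
    int64_t head = 0, tail = 0;
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int64_t u = queue[head++];
        if (dist[u] == max_hops)
            continue;                      /* hop budget used up on this branch */
        for (int64_t i = offsets[u]; i < offsets[u + 1]; i++) {
            int64_t v = targets[i];
            if (dist[v] < 0) {             /* first visit is the shortest hop count */
                dist[v] = dist[u] + 1;
                queue[tail++] = v;
            }
        }
    }
    free(queue);
    return 0;
}
```

Each node enters the queue at most once, so the traversal is linear in the number of nodes and edges reached.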
Pipeline & Optimizer
The full execution pipeline:
Rayfall source
| parse (ASCII dispatch table, recursive descent)
v
ray_t objects (no separate AST)
| ray_eval() / bytecode VM
v
Lazy DAG construction
| ray_graph_new() -> ray_scan/ray_add/ray_filter/...
v
Optimizer (8 passes)
| type inference -> constant fold -> SIP -> factorize
| -> predicate pushdown -> filter reorder -> fusion -> DCE
v
Fused morsel-driven executor
| bytecode over register slots, 1024 elements per morsel
v
Result (ray_t)
Memory Model
- Buddy allocator with thread-local arenas for contention-free allocation
- Slab cache for small, frequently-allocated objects
- COW ref counting: ray_cow() returns a private copy only when the refcount exceeds 1
- Arena (bump) allocator (ray_arena_t) for bulk short-lived allocations; blocks carry RAY_ATTR_ARENA, making retain/release no-ops
- Per-VM heaps: each heap carries a heap_id (u16); cross-heap frees enqueue to a lock-free LIFO, reclaimed via ray_heap_flush_foreign()
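The bump-allocator idea behind the arena bullet can be shown in a few lines. This is a generic sketch over a caller-supplied buffer, with hypothetical names. It is not the `ray_arena_t` interface, whose functions are not listed in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Bump arena over a fixed buffer: allocation only advances `used`. */
typedef struct {
    unsigned char *base;
    size_t         cap;
    size_t         used;
} sketch_arena_t;

/* Returns `size` bytes whose offset from base is a multiple of `align`
   (a power of two), or NULL when the arena is exhausted. */
static void *sketch_arena_alloc(sketch_arena_t *a, size_t size, size_t align)
{
    assert(a && align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->cap || size > a->cap - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Frees everything at once; individual objects are never freed. */
static void sketch_arena_reset(sketch_arena_t *a)
{
    a->used = 0;
}
```

Because objects are released only in bulk, per-object reference counting has nothing to do, which matches the note that retain/release become no-ops for arena blocks.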
Files & Partitions
- Column files (.col): native binary format for vectors and CSR graphs, supports mmap
- Sym table: global string intern table, arena-backed, append-only persistence with file locking
- CSV loader: mmap-based, parallel parse, automatic type inference, null handling, sym merge
- File I/O: cross-platform locking (flock/LockFileEx), fsync, atomic rename