
Syntax & Types

Rayfall is a Lisp-like query language with prefix notation, rich scalar types, columnar vectors, and first-class tables. The parser produces ray_t objects directly with no separate AST.

Atoms

Atoms are scalar values. Rayfall supports a wide range of types, each with a distinct literal syntax.

Integers

Integers are 64-bit signed (i64) by default. Suffixed variants are available for narrower types.

42          ; i64 (default)
-17         ; negative
0           ; zero
1000000     ; no separators needed

Floats

64-bit IEEE 754 double-precision floating point.

3.14        ; standard float
-0.5        ; negative float
1e6         ; scientific notation
2.5e-3      ; small value

Booleans

true        ; boolean true (1b)
false       ; boolean false (0b)

Symbols

Symbols are interned identifiers used for column names, dictionary keys, and categorical data. Prefix with a single quote to create a literal symbol.

'AAPL       ; symbol atom
'price      ; used as column reference
'hello      ; any alphanumeric + hyphens
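Interning stores each distinct name once and refers to it by a small integer, so comparing two symbols costs one integer compare instead of a string compare. The sketch below illustrates the idea with a linear-scan table. The names `intern` and `sym_name` and the fixed table sizes are hypothetical, and a production symbol table would use hashing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical intern table: each distinct name maps to one small id. */
#define MAX_SYMS 256
#define MAX_LEN  32

static char sym_names[MAX_SYMS][MAX_LEN];
static int32_t sym_count = 0;

/* Return the id for name, adding it on first use.
   Returns -1 if the table is full or the name is too long. */
int32_t intern(const char *name) {
    size_t len = strlen(name);
    if (len >= MAX_LEN) return -1;
    for (int32_t i = 0; i < sym_count; i++)
        if (strcmp(sym_names[i], name) == 0) return i;  /* already interned */
    if (sym_count == MAX_SYMS) return -1;
    memcpy(sym_names[sym_count], name, len + 1);        /* copy incl. NUL */
    return sym_count++;
}

/* Map an id back to its text. */
const char *sym_name(int32_t id) {
    assert(id >= 0 && id < sym_count);
    return sym_names[id];
}
```

Once names are interned, equality tests on symbol columns and dictionary keys reduce to comparing `int32_t` values.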

Strings

Strings are double-quoted character sequences with two internal representations: short strings (up to 12 bytes) are stored inline, and longer strings are stored in a per-vector pool.

"hello"             ; inline short string
"hello world!"      ; still inline (12 bytes)
"a longer string"   ; pool-allocated
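The inline/pooled split fits in a 16-byte element (the size listed for `ray_str_t` under the C API). One layout that achieves this uses a 4-byte length followed by either 12 inline bytes or a 4-byte prefix plus a pool offset. This field layout, the fixed-size pool, and the names `str_elem`, `make_str`, and `str_data` are assumptions for illustration, not the actual `ray_str_t` definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INLINE_MAX 12

/* Hypothetical 16-byte string element: length, then inline bytes or
   (prefix + offset into the owning vector's pool). */
typedef struct {
    uint32_t len;
    union {
        char inl[INLINE_MAX];                           /* len <= 12 */
        struct { char prefix[4]; uint32_t offset; } pooled; /* len > 12 */
    } u;
} str_elem;

_Static_assert(sizeof(str_elem) == 16, "element must stay 16 bytes");

/* Append-only byte pool owned by one string vector. */
typedef struct { char buf[1024]; uint32_t used; } str_pool;

str_elem make_str(str_pool *pool, const char *s) {
    str_elem e;
    memset(&e, 0, sizeof e);
    e.len = (uint32_t)strlen(s);
    if (e.len <= INLINE_MAX) {
        memcpy(e.u.inl, s, e.len);               /* short: stored inline */
    } else {
        assert(pool->used + e.len <= sizeof pool->buf);
        memcpy(e.u.pooled.prefix, s, 4);         /* prefix for fast compares */
        e.u.pooled.offset = pool->used;          /* long: bytes go to the pool */
        memcpy(pool->buf + pool->used, s, e.len);
        pool->used += e.len;
    }
    return e;
}

int is_inline(const str_elem *e) { return e->len <= INLINE_MAX; }

/* Pointer to the bytes (not NUL-terminated; use e->len). */
const char *str_data(const str_elem *e, const str_pool *pool) {
    return is_inline(e) ? e->u.inl : pool->buf + e->u.pooled.offset;
}
```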

Dates

Date literals use the dot-separated year.month.day format and are stored as days since the epoch (i32).

2024.01.15  ; January 15, 2024
2023.12.31  ; December 31, 2023
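Converting a year.month.day literal to a day count is standard calendar arithmetic. The sketch below uses Howard Hinnant's `days_from_civil` algorithm and assumes the Unix epoch (1970.01.01). The text above does not say which epoch Rayforce uses. If it counts from another day, subtracting that day's value (for example, `days_from_civil(2000, 1, 1)`) rebases the result.

```c
#include <assert.h>
#include <stdint.h>

/* Days since 1970-01-01 (assumed epoch) for a proleptic Gregorian date. */
int32_t days_from_civil(int32_t y, uint32_t m, uint32_t d) {
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= m <= 2;                                     /* years start in March */
    const int32_t era = (y >= 0 ? y : y - 399) / 400;
    const uint32_t yoe = (uint32_t)(y - era * 400);  /* [0, 399] */
    const uint32_t doy =
        (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;          /* [0, 365] */
    const uint32_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;   /* [0, 146096] */
    return era * 146097 + (int32_t)doe - 719468;
}
```

With the Unix epoch, `2024.01.15` becomes day 19737, which fits easily in an i32.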

Times

Time-of-day literals use the HH:MM:SS.mmm format and are stored as milliseconds since midnight (i32).

09:30:00.000  ; 9:30 AM
14:15:30.500  ; 2:15:30.5 PM

Timestamps

Timestamps hold a full date and time with nanosecond precision, stored as nanoseconds since the epoch (i64).

2024.01.15T09:30:00.000000000  ; date + time
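The time and timestamp encodings above reduce to simple arithmetic. A time of day is at most 86,399,999 ms, which fits in an i32. A nanosecond timestamp in an i64 covers roughly ±292 years around its epoch. The function names below are hypothetical, and the day number passed to `timestamp_ns` is assumed to count from the same epoch as the timestamp. The Unix epoch is assumed in the test values.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds since midnight for HH:MM:SS.mmm. */
int32_t time_ms(int32_t h, int32_t m, int32_t s, int32_t ms) {
    assert(h >= 0 && h < 24 && m >= 0 && m < 60);
    assert(s >= 0 && s < 60 && ms >= 0 && ms < 1000);
    return ((h * 60 + m) * 60 + s) * 1000 + ms;
}

#define NS_PER_DAY 86400000000000LL

/* Nanoseconds since the epoch from a day number (days since the same
   epoch) and the nanoseconds elapsed within that day. */
int64_t timestamp_ns(int32_t days, int64_t ns_of_day) {
    assert(ns_of_day >= 0 && ns_of_day < NS_PER_DAY);
    return (int64_t)days * NS_PER_DAY + ns_of_day;
}
```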

GUIDs

128-bit globally unique identifiers.

(guid 0)   ; generate a new GUID

Null Values

Each type has a null sentinel: INT64_MIN for i64, NaN for f64, INT32_MIN for i32/date/time. Use nil? to test:

(nil? x)   ; true if x is null
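At the C level, a null test compares against the sentinels listed above, with one subtlety: NaN never compares equal to anything, including itself, so f64 nulls must be detected with `isnan()` rather than `==`. The helper names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define NULL_I64 INT64_MIN
#define NULL_I32 INT32_MIN   /* shared by i32, date and time */

static inline int is_null_i64(int64_t x) { return x == NULL_I64; }
static inline int is_null_i32(int32_t x) { return x == NULL_I32; }

/* x == NAN is always false, so test with isnan(). */
static inline int is_null_f64(double x) { return isnan(x); }
```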

Vectors

Vectors are homogeneous, typed, columnar arrays, created with square brackets. The element type is inferred from the first element.

[1 2 3 4 5]         ; i64 vector
[1.5 2.7 3.9]       ; f64 vector
[true false true]   ; boolean vector
[AAPL GOOG MSFT]    ; symbol vector
["hello" "world"]   ; string vector

Vector operations are morsel-driven, processing 1024 elements at a time for cache efficiency.
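A morsel loop processes a column in fixed-size chunks. The outer loop walks morsels, and the tight inner loop touches at most 1024 contiguous elements (8 KiB of i64), which fits in a typical L1 data cache. The sketch below is a sequential sum written in that style. The function name is hypothetical, and it does not show how Rayforce schedules morsels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MORSEL 1024  /* elements per morsel */

int64_t sum_i64(const int64_t *col, size_t n) {
    assert(col != NULL || n == 0);
    int64_t total = 0;
    for (size_t base = 0; base < n; base += MORSEL) {
        size_t len = n - base < MORSEL ? n - base : MORSEL;  /* last morsel may be short */
        int64_t partial = 0;
        for (size_t i = 0; i < len; i++)
            partial += col[base + i];
        total += partial;  /* combine per-morsel results */
    }
    return total;
}
```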

Vector Arithmetic

All arithmetic operators auto-map over vectors (marked FN_ATOMIC):

(+ [1 2 3] [10 20 30])   ; [11 22 33]
(* [1 2 3] 10)            ; [10 20 30] — scalar broadcast
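Auto-mapping covers two shapes: vector with vector (element by element, equal lengths) and vector with atom (the atom is reused for every element). The plain-C loops below show both cases. The function names are hypothetical and are not the Rayforce kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Vector + vector: inputs must have the same length n. */
void add_vv(const int64_t *a, const int64_t *b, int64_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = a[i] + b[i];
}

/* Vector * atom: the scalar is broadcast across the vector. */
void mul_vs(const int64_t *a, int64_t s, int64_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = a[i] * s;
}
```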

Lists

Lists are heterogeneous collections of vectors, created with the list function. They serve as the data component of tables.

(list [1 2 3] [A B C])   ; list of two vectors

Tables

Tables are the core data structure in Rayforce. A table is a vector of column names paired with a list of column vectors. All column vectors must have the same length.

; Create a table with explicit column names
(set trades (table
  [sym price size]
  (list
    [AAPL GOOG MSFT]
    [150.5 2800.0 300.2]
    [100 50 200])))

; Access column names
(key trades)     ; [sym price size]

; Access column data
(value trades)   ; list of 3 vectors
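The table shape described above (names paired with equal-length column vectors) can be modeled with two parallel arrays. The sketch below checks the equal-length invariant and looks up a column by name. The structs, the three column types, and the use of interned symbol ids for names are illustrative assumptions, not Rayforce's table layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { COL_SYM, COL_I64, COL_F64 } col_type;

typedef struct {
    col_type type;
    size_t len;
    const void *data;       /* int32_t*, int64_t* or double* by type */
} column;

typedef struct {
    size_t ncols;
    const int32_t *names;   /* column names as symbol ids */
    const column *cols;     /* one vector per name */
} table;

/* Row count, or -1 if the columns disagree in length. */
long table_rows(const table *t) {
    if (t->ncols == 0) return 0;
    size_t n = t->cols[0].len;
    for (size_t i = 1; i < t->ncols; i++)
        if (t->cols[i].len != n) return -1;
    return (long)n;
}

/* Column for a given name id, or NULL if absent. */
const column *table_col(const table *t, int32_t name) {
    for (size_t i = 0; i < t->ncols; i++)
        if (t->names[i] == name) return &t->cols[i];
    return NULL;
}
```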

Dictionaries

Dictionaries map keys to values. They are created with the dict function, or with {key: value} syntax in query contexts.

(set d (dict [a b c] [1 2 3]))
(get d 'a)    ; 1
(key d)       ; [a b c]
(value d)     ; [1 2 3]

Function Calls

Rayfall uses prefix (Polish) notation. Every expression is either an atom or a parenthesized list where the first element is the function:

(+ 1 2)           ; 3
(* (+ 1 2) 3)     ; 9 — nested
(sum [1 2 3])     ; 6
(count [1 2 3])   ; 3

Function Types

Built-in functions fall into three arity categories:

Type     | Arguments | Examples
---------|-----------|---------------------------
Unary    | Exactly 1 | sum, count, not, neg, type
Binary   | Exactly 2 | +, -, set, take, at
Variadic | 1 or more | if, do, fn, select, list

Function Flags

Flag            | Behavior
----------------|---------
FN_ATOMIC       | Auto-maps element-wise over vectors. (+ [1 2] [3 4]) yields [4 6].
FN_AGGR         | Aggregation function. Reduces a vector to a scalar. (sum [1 2 3]) yields 6.
FN_SPECIAL_FORM | Arguments are not evaluated before being passed. Used by set, if, fn, select.

Quoting

The single quote ' prevents evaluation, creating a symbol atom. Useful for column references and dictionary keys:

'price            ; symbol, not a variable lookup
(quote (+ 1 2))   ; returns the unevaluated list (+ 1 2)

Comments

Line comments start with a semicolon and extend to the end of the line:

; This is a comment
(+ 1 2)  ; inline comment

Control Flow

Conditional: if

Evaluates the condition and returns the result of the true or false branch. Additional condition/value pairs chain like else-if, ending with a default value:

(if (> x 0) "positive" "non-positive")

; Multi-branch
(if (> x 100) "high"
    (> x 50)  "mid"
              "low")

Sequential Execution: do

Evaluates multiple expressions in order, returning the last result:

(do
  (set x 10)
  (set y 20)
  (+ x y))    ; 30

Variable Binding: set and let

(set x 42)        ; global binding
(let y (+ x 1))   ; local binding

Error Handling: try / raise

(try
  (/ 1 0)           ; might error
  (fn [e] "caught")) ; handler receives error

(raise "custom error") ; throw an error

Lambdas & the VM

User-defined functions are created with fn. Lambdas compile lazily to bytecode and run in a stack-based computed-goto VM (ray_vm_t) with a 1024-slot program stack and return stack.

; Named function
(set square (fn [x] (* x x)))
(square 5)    ; 25

; Multiple parameters with a multi-branch if
(set clamp (fn [x lo hi]
  (if (< x lo) lo
      (> x hi) hi
              x)))

; Anonymous lambda passed to map
(map (fn [x] (* x 2)) [1 2 3])   ; [2 4 6]

The VM supports trap frames for try/raise error handling, ensuring exceptions unwind cleanly through compiled code.
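Computed-goto dispatch means each opcode handler jumps directly to the next handler through a table of label addresses, instead of returning to a central `switch`. The sketch below is a tiny stack machine using that technique with a 1024-slot value stack. It relies on the GCC/Clang "labels as values" extension. The opcodes and `run` are hypothetical, and the sketch omits `ray_vm_t`'s return stack and trap frames.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_RET };  /* hypothetical opcodes */

#define STACK_SLOTS 1024

int64_t run(const int64_t *code) {
    /* Label-address table indexed by opcode (GCC/Clang extension). */
    static void *dispatch[] = { &&do_push, &&do_add, &&do_mul, &&do_ret };
    int64_t stack[STACK_SLOTS];
    int sp = 0;                       /* next free slot */
    const int64_t *pc = code;

#define NEXT() goto *dispatch[*pc++]
    NEXT();
do_push:
    assert(sp < STACK_SLOTS);
    stack[sp++] = *pc++;              /* operand follows the opcode */
    NEXT();
do_add:
    assert(sp >= 2);
    sp--; stack[sp - 1] += stack[sp];
    NEXT();
do_mul:
    assert(sp >= 2);
    sp--; stack[sp - 1] *= stack[sp];
    NEXT();
do_ret:
    assert(sp >= 1);
    return stack[sp - 1];
#undef NEXT
}
```

The program `PUSH 2, PUSH 3, ADD, PUSH 4, MUL, RET` evaluates (* (+ 2 3) 4).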

Select & Update

The select and update builtins bridge to the Rayforce DAG executor. They accept a dictionary of options:

select

; Basic filter
(select {from: trades  where: (> price 100)})

; Project specific columns with expressions
(select {from: trades
         cols: {sym: sym  notional: (* price size)}})

; Group by with aggregation
(select {from: trades
         by:   {sym: sym}
         cols: {avg_price: (avg price)
                total_size: (sum size)}})
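To make the semantics of the group-by query concrete, the plain-C sketch below computes the same result: one row per distinct `sym`, holding `avg price` and `sum size`. It shows only what is computed, not how Rayforce executes it (the real path builds a DAG and runs fused morsel loops). The linear group search stands in for a hash table, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GROUPS 64

typedef struct {
    int32_t sym;          /* group key (symbol id) */
    double price_sum;     /* for avg_price */
    int64_t count;
    int64_t total_size;   /* sum size */
} group;

/* Returns the number of groups written to out (capacity MAX_GROUPS). */
size_t group_by_sym(const int32_t *sym, const double *price,
                    const int64_t *size, size_t n, group *out) {
    size_t ng = 0;
    for (size_t i = 0; i < n; i++) {
        size_t g = 0;
        while (g < ng && out[g].sym != sym[i]) g++;
        if (g == ng) {                         /* first row for this key */
            assert(ng < MAX_GROUPS);
            out[ng++] = (group){ sym[i], 0.0, 0, 0 };
        }
        out[g].price_sum += price[i];
        out[g].count++;
        out[g].total_size += size[i];
    }
    return ng;
}

double avg_price(const group *g) { return g->price_sum / (double)g->count; }
```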

update

; Add a computed column
(update {from: trades
          cols: {notional: (* price size)}})

insert / upsert

; Insert new rows
(insert {into: trades
          values: (table [sym price size]
                    (list [TSLA] [250.0] [300]))})

C API

Rayforce exposes a single public header: include/rayforce.h. The core abstraction is ray_t — a 32-byte block header. Every object (atom, vector, list, table) is a ray_t with data following at byte 32.
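A "header plus trailing data" object can be expressed in C with a flexible array member, and a compile-time check can pin the header at 32 bytes. The field names and layout below are illustrative guesses, not the real `ray_t` from `include/rayforce.h`. `malloc`/`free` stand in for the Rayforce allocator only because the sketch is standalone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 32-byte header; payload follows at byte 32. */
typedef struct obj {
    int8_t   type;           /* type tag */
    uint8_t  attrs;
    uint16_t flags;
    uint32_t rc;             /* reference count */
    int64_t  len;            /* element count for vectors */
    uint8_t  reserved[16];
    uint8_t  data[];         /* payload */
} obj;

_Static_assert(sizeof(obj) == 32, "header must be 32 bytes");
_Static_assert(offsetof(obj, data) == 32, "payload must start at byte 32");

/* One block holds header + n i64 elements. */
obj *vec_i64_new(int64_t n) {
    obj *o = malloc(sizeof(obj) + (size_t)n * sizeof(int64_t));
    if (!o) return NULL;
    memset(o, 0, sizeof(obj));
    o->rc = 1;
    o->len = n;
    return o;
}

int64_t *vec_i64_data(obj *o) { return (int64_t *)o->data; }
```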

Key Types

Type        | Description
------------|------------
ray_t       | 32-byte universal block header for all objects
ray_err_t   | Error code return type
ray_str_t   | 16-byte string element (inline or pooled)
ray_csr_t   | CSR graph edge storage
ray_rel_t   | Graph relationship (forward + reverse CSR)
ray_arena_t | Bump allocator for bulk allocations
ray_vm_t    | Bytecode VM for compiled lambdas

Error Handling

ray_t* result = ray_eval_str("(+ 1 2)");
if (RAY_IS_ERR(result)) {
    // handle error
}
// RAY_ERR_PTR() to create error pointers
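A common way to let one pointer return carry either an object or an error is the Linux kernel's ERR_PTR/IS_ERR scheme. A small error code is encoded as a pointer into the last page of the address space, where no real allocation lives. Whether `RAY_ERR_PTR`/`RAY_IS_ERR` work this way is an assumption. The sketch uses its own hypothetical names and assumes a platform where integer-to-pointer casts round-trip, as they do on mainstream targets.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode error code (1..4095) as a pointer near the top of memory. */
static inline void *err_ptr(long code) { return (void *)(intptr_t)-code; }

/* Recover the error code from an encoded pointer. */
static inline long ptr_err(const void *p) { return -(long)(intptr_t)p; }

/* True only for pointers in the reserved top range. */
static inline int is_err(const void *p) {
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}
```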

Memory Management

Never use malloc/free. Use the Rayforce allocator:

ray_t* obj = ray_alloc(size);    // general allocation
ray_retain(obj);                 // increment refcount
ray_release(obj);                // decrement refcount, free if zero
ray_t* copy = ray_cow(obj);      // copy-on-write
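Reference counting and copy-on-write work together. A writer may mutate an object in place only if it is the sole owner; otherwise it must take a private copy first. The standalone sketch below shows that rule with hypothetical `vec_*` functions (`malloc`/`free` stand in for the Rayforce allocator). In this sketch, the copy path transfers the caller's reference from the original to the copy. Whether `ray_cow` has the same ownership contract is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint32_t rc; size_t len; int64_t data[]; } vec;

vec *vec_new(size_t len) {
    vec *v = malloc(sizeof(vec) + len * sizeof(int64_t));
    if (v) { v->rc = 1; v->len = len; }
    return v;
}

void vec_retain(vec *v) { v->rc++; }

void vec_release(vec *v) {
    if (--v->rc == 0) free(v);   /* last owner frees the block */
}

/* Unshared: return v for in-place writes. Shared: copy, drop the
   caller's reference to v, return the copy. On allocation failure,
   return NULL and leave v untouched. */
vec *vec_cow(vec *v) {
    if (v->rc == 1) return v;
    vec *c = vec_new(v->len);
    if (!c) return NULL;
    memcpy(c->data, v->data, v->len * sizeof(int64_t));
    vec_release(v);
    return c;
}
```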

DAG & Execution

The execution pipeline builds a lazy DAG, optimizes it, then executes with fused morsel-driven processing:

// 1. Build lazy DAG
ray_t* g = ray_graph_new(df);
ray_t* filtered = ray_filter(g, predicate);
ray_t* projected = ray_project(g, filtered, cols);

// 2. Execute (optimizer runs automatically)
ray_t* result = ray_execute(g, projected);

Optimizer Passes

  1. Type inference — propagate types through the DAG
  2. Constant folding — evaluate compile-time-known expressions
  3. SIP (Sideways Information Passing) — propagate selection bitmaps backward through expand chains
  4. Factorize — avoid materializing cross-products with factorized vectors
  5. Predicate pushdown — move filters closer to data sources
  6. Filter reorder — cheapest filters first
  7. Fusion — merge adjacent operations into single morsel loops
  8. DCE (Dead Code Elimination) — remove unused DAG nodes
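Constant folding (pass 2) is the easiest of these passes to show in miniature. It works bottom-up: when both operands of an operator are constants after folding, the operator node is replaced by its value. The node layout and names below are hypothetical expression-tree stand-ins for Rayforce's DAG nodes. The children a fold leaves behind are exactly what a later DCE pass would remove.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { N_CONST, N_COL, N_ADD, N_MUL } kind;

typedef struct node {
    kind k;
    int64_t val;              /* N_CONST: value; N_COL: column id */
    struct node *lhs, *rhs;   /* operands of N_ADD / N_MUL */
} node;

void fold(node *n) {
    if (n->k != N_ADD && n->k != N_MUL) return;  /* leaves stay as-is */
    fold(n->lhs);
    fold(n->rhs);
    if (n->lhs->k == N_CONST && n->rhs->k == N_CONST) {
        int64_t a = n->lhs->val, b = n->rhs->val;
        n->val = n->k == N_ADD ? a + b : a * b;
        n->k = N_CONST;
        n->lhs = n->rhs = NULL;   /* old children are now dead nodes */
    }
}
```

Folding `(* price (+ 1 2))` rewrites the inner sum to the constant 3 but leaves the multiplication, because `price` is only known at run time.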

CSR Storage

Rayforce stores graph edges in double-indexed Compressed Sparse Row (CSR) format: one forward index (source to destination) and one reverse index (destination to source). Both indices are built simultaneously.

// Build CSR from edge list
ray_csr_t csr;
ray_csr_build(&csr, src_ids, dst_ids, n_edges);

// Persist to disk
ray_csr_save(&csr, "edges.col");

// Memory-map for zero-copy access
ray_csr_mmap(&csr, "edges.col");
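Building both CSR directions from one edge list is a counting sort done twice in lockstep. One scan counts out-degrees and in-degrees together, prefix sums turn the counts into start offsets, and a second scan scatters each edge into both indexes. The sketch below shows that technique with caller-provided buffers. The struct and function names are hypothetical and do not describe how `ray_csr_build` is implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Neighbors of v are targets[offsets[v] .. offsets[v+1]-1]. */
typedef struct { int64_t *offsets; int64_t *targets; } csr_dir;

/* offsets arrays: n_nodes + 1 entries; targets: n_edges entries;
   scratch: 2 * n_nodes entries. Node ids must be in [0, n_nodes). */
void csr_build_both(const int64_t *src, const int64_t *dst, int64_t n_edges,
                    int64_t n_nodes, csr_dir *fwd, csr_dir *rev,
                    int64_t *scratch) {
    memset(fwd->offsets, 0, (size_t)(n_nodes + 1) * sizeof(int64_t));
    memset(rev->offsets, 0, (size_t)(n_nodes + 1) * sizeof(int64_t));
    for (int64_t e = 0; e < n_edges; e++) {          /* 1. degree counts */
        assert(src[e] >= 0 && src[e] < n_nodes);
        assert(dst[e] >= 0 && dst[e] < n_nodes);
        fwd->offsets[src[e] + 1]++;
        rev->offsets[dst[e] + 1]++;
    }
    for (int64_t v = 0; v < n_nodes; v++) {          /* 2. prefix sums */
        fwd->offsets[v + 1] += fwd->offsets[v];
        rev->offsets[v + 1] += rev->offsets[v];
    }
    int64_t *fcur = scratch, *rcur = scratch + n_nodes;  /* write cursors */
    memcpy(fcur, fwd->offsets, (size_t)n_nodes * sizeof(int64_t));
    memcpy(rcur, rev->offsets, (size_t)n_nodes * sizeof(int64_t));
    for (int64_t e = 0; e < n_edges; e++) {          /* 3. scatter both ways */
        fwd->targets[fcur[src[e]]++] = dst[e];
        rev->targets[rcur[dst[e]]++] = src[e];
    }
}
```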

Graph Algorithms

Available as DAG opcodes, all integrated into the same morsel-driven pipeline:

Opcode           | Algorithm        | Description
-----------------|------------------|------------
OP_EXPAND        | 1-Hop Expand     | Follow edges one step from source nodes
OP_VAR_EXPAND    | BFS              | Variable-length path expansion (breadth-first)
OP_SHORTEST_PATH | Shortest Path    | Single-source shortest paths
OP_ASTAR         | A*               | Heuristic-guided shortest path
OP_K_SHORTEST    | Yen's K-Shortest | K shortest loopless paths
OP_WCO_JOIN      | LFTJ             | Worst-case optimal join (Leapfrog Triejoin)
OP_BETWEENNESS   | Brandes          | Betweenness centrality
OP_CLOSENESS     | Closeness        | Closeness centrality
OP_CLUSTER_COEFF | Clustering       | Local clustering coefficients
OP_RANDOM_WALK   | Random Walk      | Random walks on graph
OP_MST           | Kruskal          | Minimum spanning tree
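Breadth-first expansion over a forward CSR is the core of variable-length expansion, and on an unweighted graph it also yields single-source shortest hop counts. Each frontier step is a 1-hop expand over `offsets[u]..offsets[u+1]`. The sketch below is a textbook BFS with a hypothetical name. It is not Rayforce's implementation of `OP_VAR_EXPAND` or `OP_SHORTEST_PATH`.

```c
#include <assert.h>
#include <stdint.h>

/* dist[v] = hops from src, or -1 if unreachable. queue needs n_nodes
   entries (each node is enqueued at most once). */
void bfs_hops(const int64_t *offsets, const int64_t *targets,
              int64_t n_nodes, int64_t src, int64_t *dist, int64_t *queue) {
    assert(src >= 0 && src < n_nodes);
    for (int64_t v = 0; v < n_nodes; v++) dist[v] = -1;
    int64_t head = 0, tail = 0;
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int64_t u = queue[head++];
        for (int64_t e = offsets[u]; e < offsets[u + 1]; e++) {  /* 1-hop expand */
            int64_t w = targets[e];
            if (dist[w] < 0) {            /* first visit = fewest hops */
                dist[w] = dist[u] + 1;
                queue[tail++] = w;
            }
        }
    }
}
```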

Pipeline & Optimizer

The full execution pipeline:

Rayfall source
  |  parse (ASCII dispatch table, recursive descent)
  v
ray_t objects (no separate AST)
  |  ray_eval() / bytecode VM
  v
Lazy DAG construction
  |  ray_graph_new() -> ray_scan/ray_add/ray_filter/...
  v
Optimizer (8 passes)
  |  type inference -> constant fold -> SIP -> factorize
  |  -> predicate pushdown -> filter reorder -> fusion -> DCE
  v
Fused morsel-driven executor
  |  bytecode over register slots, 1024 elements per morsel
  v
Result (ray_t)

Memory Model

Files & Partitions