
Collections

Vectors, lists, tables, dictionaries, and selection bitmaps — the compound data structures that organize scalar values into queryable datasets.

Vectors

Vectors are the fundamental columnar data structure in Rayforce. A vector is a typed, contiguous array of scalar elements — every element shares the same type. Vectors are the columns inside tables and the operands in every DAG operation.

; I64 vector (integer literals)
ray> [1 2 3 4 5]
[1 2 3 4 5]

; F64 vector (float literals)
ray> [1.0 2.5 3.14]
[1.0 2.5 3.14]

; SYM vector (unquoted identifiers)
ray> [AAPL GOOG MSFT]
[AAPL GOOG MSFT]

; BOOL vector
ray> [true false true]
[1b 0b 1b]

Morsel Iteration

All vector processing in Rayforce happens in morsels — fixed-size chunks of 1024 elements. The executor never processes an entire column at once. Instead, it iterates morsel by morsel, which keeps data in L1/L2 cache and enables pipeline parallelism.

// C API: morsel iteration
ray_morsel_t m;
ray_morsel_init(&m, vec);

while (ray_morsel_next(&m)) {
    // m.base   — pointer to element data for this morsel
    // m.count  — number of elements (up to 1024)
    // m.selection — RAY_SEL bitmap (NULL = all pass)
    process_morsel(&m);
}
Why 1024? A morsel of 1024 int64 elements is 1024 × 8 bytes = 8 KB, which fits comfortably in the L1 data cache of modern CPUs. The size balances cache utilization against per-morsel scheduling overhead.
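The chunking loop above can be modeled without the Rayforce headers. The sketch below is a hypothetical stand-in, not the real `ray_morsel_t`: `MORSEL_SIZE`, `morsel_t`, `morsel_next` and `sum_by_morsels` are illustrative names, and selection bitmaps are left out. It also checks the 8 KB arithmetic at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical morsel size mirroring the 1024-element chunks described above. */
#define MORSEL_SIZE 1024
_Static_assert(MORSEL_SIZE * sizeof(int64_t) == 8192, "1024 x int64 = 8 KB");

/* Hypothetical cursor over a plain int64_t column (not the real ray_morsel_t). */
typedef struct {
    const int64_t *data;  /* whole column */
    size_t len;           /* total number of elements */
    size_t offset;        /* start of the next morsel */
    const int64_t *base;  /* start of the current morsel */
    size_t count;         /* elements in the current morsel (<= MORSEL_SIZE) */
} morsel_t;

static void morsel_init(morsel_t *m, const int64_t *data, size_t len) {
    m->data = data;
    m->len = len;
    m->offset = 0;
    m->base = NULL;
    m->count = 0;
}

/* Advance to the next chunk; returns 0 once the column is exhausted. */
static int morsel_next(morsel_t *m) {
    if (m->offset >= m->len) return 0;
    size_t remaining = m->len - m->offset;
    m->base = m->data + m->offset;
    m->count = remaining < MORSEL_SIZE ? remaining : MORSEL_SIZE;
    m->offset += m->count;
    return 1;
}

/* Sums a column one morsel at a time; *morsels receives the chunk count. */
static int64_t sum_by_morsels(const int64_t *data, size_t len, size_t *morsels) {
    morsel_t m;
    int64_t total = 0;
    size_t n = 0;
    morsel_init(&m, data, len);
    while (morsel_next(&m)) {
        assert(m.count <= MORSEL_SIZE);
        for (size_t i = 0; i < m.count; i++) total += m.base[i];  /* tight per-morsel loop */
        n++;
    }
    *morsels = n;
    return total;
}
```

With these definitions, a 2,500-element column is visited as three morsels of 1024, 1024 and 452 elements.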

Null Handling

Nulls are tracked via a per-vector bitmap, not sentinel values in the data array. This means operations can use fast SIMD loops on the data and only check the null bitmap when needed.

// C API: null bitmap
ray_vec_set_null(vec, 3, true);   // mark index 3 as null
ray_vec_is_null(vec, 3);           // returns true
ray_vec_is_null(vec, 0);           // returns false

The bitmap is stored inline in the first 16 bytes of the ray_t header for vectors with up to 128 elements (16 bytes hold 128 bits, one per element). Larger vectors use an external bitmap allocation, flagged with RAY_ATTR_NULLMAP_EXT. The RAY_ATTR_HAS_NULLS flag indicates whether the vector contains any nulls at all; when it is clear, the bitmap is never checked.
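A minimal sketch of the null-bitmap scheme, assuming a simplified layout: `nullvec_t`, `ATTR_HAS_NULLS` and `ATTR_NULLMAP_EXT` are hypothetical stand-ins for the `ray_t` header and the `RAY_ATTR_*` flags, and the 16-byte inline bitmap is a union member here rather than part of a real header. One bit per element marks a null, and the `HAS_NULLS` check lets `nullvec_is_null` skip the bitmap entirely for null-free vectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for RAY_ATTR_HAS_NULLS / RAY_ATTR_NULLMAP_EXT. */
#define ATTR_HAS_NULLS    0x1u
#define ATTR_NULLMAP_EXT  0x2u
#define INLINE_NULL_BYTES 16                      /* 16 bytes = 128 bits */
#define INLINE_NULL_BITS  (INLINE_NULL_BYTES * 8)

typedef struct {
    uint32_t attrs;
    int64_t len;
    union {
        uint8_t inline_bits[INLINE_NULL_BYTES];   /* used when len <= 128 */
        uint8_t *ext_bits;                        /* heap bitmap otherwise */
    } nulls;
} nullvec_t;

static bool nullvec_init(nullvec_t *v, int64_t len) {
    memset(v, 0, sizeof *v);
    v->len = len;
    if (len > INLINE_NULL_BITS) {
        v->nulls.ext_bits = calloc((size_t)(len + 7) / 8, 1);
        if (!v->nulls.ext_bits) return false;
        v->attrs |= ATTR_NULLMAP_EXT;
    }
    return true;
}

static uint8_t *null_bits(nullvec_t *v) {
    return (v->attrs & ATTR_NULLMAP_EXT) ? v->nulls.ext_bits : v->nulls.inline_bits;
}

static void nullvec_set_null(nullvec_t *v, int64_t idx, bool is_null) {
    assert(idx >= 0 && idx < v->len);
    uint8_t *bits = null_bits(v);
    uint8_t mask = (uint8_t)(1u << (idx % 8));
    if (is_null) {
        bits[idx / 8] |= mask;
        v->attrs |= ATTR_HAS_NULLS;  /* never cleared here: a conservative hint */
    } else {
        bits[idx / 8] &= (uint8_t)~mask;
    }
}

static bool nullvec_is_null(nullvec_t *v, int64_t idx) {
    assert(idx >= 0 && idx < v->len);
    if (!(v->attrs & ATTR_HAS_NULLS)) return false;  /* fast path: bitmap never read */
    return (null_bits(v)[idx / 8] >> (idx % 8)) & 1u;
}

static void nullvec_free(nullvec_t *v) {
    if (v->attrs & ATTR_NULLMAP_EXT) free(v->nulls.ext_bits);
}
```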

COW Semantics

Vectors use copy-on-write (COW) reference counting. Multiple consumers can share the same vector via ray_retain(). Mutation goes through ray_cow(), which returns the same pointer if the reference count is 1, or a fresh copy if shared.

// C API: COW pattern
ray_retain(vec);           // rc: 1 → 2 (shared)

ray_t* writable = ray_cow(vec);
if (writable != vec) {
    // Got a fresh copy — vec is still shared
    // Must release writable on error paths
}
// Safe to mutate writable
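To illustrate the COW contract, here is a hypothetical refcounted vector (`obj_t`, `obj_retain`, `obj_release` and `obj_cow` are illustrative names, not Rayforce's). One assumption to flag: this `obj_cow` leaves the caller's reference to the original untouched, so after a copy the caller owns both objects and releases each one. The text does not say whether the real `ray_cow` behaves this way.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted int64 vector; not the real ray_t layout. */
typedef struct {
    int64_t rc;      /* reference count */
    int64_t len;
    int64_t data[];  /* flexible array member */
} obj_t;

static obj_t *obj_new(int64_t len) {
    obj_t *o = calloc(1, sizeof(obj_t) + (size_t)len * sizeof(int64_t));
    if (o) { o->rc = 1; o->len = len; }
    return o;
}

static void obj_retain(obj_t *o) { o->rc++; }

static void obj_release(obj_t *o) {
    if (o && --o->rc == 0) free(o);
}

/* Returns o itself when uniquely owned, else a private copy with rc = 1.
   Assumption: the caller's reference to o is NOT consumed here. */
static obj_t *obj_cow(obj_t *o) {
    assert(o->rc >= 1);
    if (o->rc == 1) return o;
    obj_t *copy = obj_new(o->len);
    if (!copy) return NULL;
    memcpy(copy->data, o->data, (size_t)o->len * sizeof(int64_t));
    return copy;
}
```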

Vector Operations

C Function                         Description
ray_vec_new(type, cap)             Allocate an empty vector with the given capacity
ray_vec_append(vec, elem)          Append one element (may reallocate)
ray_vec_set(vec, idx, elem)        Set the element at an index
ray_vec_get(vec, idx)              Get a pointer to the element at an index
ray_vec_slice(vec, off, len)       Zero-copy slice (shares data with vec)
ray_vec_concat(a, b)               Concatenate two vectors
ray_vec_from_raw(type, data, n)    Create a vector from an existing data array
ray_str_vec_append(vec, s, len)    Append a string to a RAY_STR vector
ray_str_vec_get(vec, idx, &len)    Get the string at an index (length returned in len)
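Why a slice can be zero-copy is easiest to see with a bare view type. The sketch below is hypothetical (`i64_view_t`, `view_slice`): a slice is a pointer offset plus a length into the parent's buffer, so no elements move. It deliberately ignores lifetime management; a real slice would presumably keep the parent alive, for example by retaining it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view type: pointer + length into a parent buffer. */
typedef struct {
    const int64_t *base;
    int64_t len;
} i64_view_t;

/* Creates a sub-view sharing the parent's storage; copies no elements. */
static i64_view_t view_slice(i64_view_t v, int64_t off, int64_t len) {
    assert(off >= 0 && len >= 0 && off + len <= v.len);
    i64_view_t s = { v.base + off, len };
    return s;
}
```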

Lists

Lists are boxed, heterogeneous containers. Each element is a ray_t* pointer to any Rayforce object. Lists are the backbone of table column storage: a table's columns are held in a list.

; Create a list of mixed vectors
ray> (list [1 2 3] [A B C])
([1 2 3] [A B C])

; Lists can hold any type
ray> (list 42 "hello" [1 2])
(42 "hello" [1 2])
// C API: list operations
ray_t* lst = ray_list_new(4);          // initial capacity 4
lst = ray_list_append(lst, vec1);      // append element
lst = ray_list_append(lst, vec2);
ray_t* item = ray_list_get(lst, 0);    // get by index
lst = ray_list_set(lst, 0, new_item);  // replace at index

Lists have type = RAY_LIST (0) and store pointers in their data[] array. The len field tracks the number of elements.
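The `lst = ray_list_append(lst, ...)` idiom exists because appending may move the list. A hypothetical boxed list shows why: the pointer array lives in the same allocation as the header, so growth via `realloc` can return a new address. `plist_t`, `plist_new`, `plist_append` and `plist_get` are illustrative names, and elements are untyped `void *` rather than `ray_t *`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical boxed list: a header followed by an array of pointers. */
typedef struct {
    int64_t len;
    int64_t cap;
    void *data[];
} plist_t;

static plist_t *plist_new(int64_t cap) {
    if (cap < 1) cap = 1;
    plist_t *l = malloc(sizeof(plist_t) + (size_t)cap * sizeof(void *));
    if (l) { l->len = 0; l->cap = cap; }
    return l;
}

/* May move the list, so callers must keep the returned pointer.
   Returns NULL on allocation failure (the old list stays valid). */
static plist_t *plist_append(plist_t *l, void *item) {
    if (l->len == l->cap) {
        int64_t ncap = l->cap * 2;
        plist_t *g = realloc(l, sizeof(plist_t) + (size_t)ncap * sizeof(void *));
        if (!g) return NULL;
        g->cap = ncap;
        l = g;
    }
    l->data[l->len++] = item;
    return l;
}

static void *plist_get(const plist_t *l, int64_t idx) {
    assert(idx >= 0 && idx < l->len);
    return l->data[idx];
}
```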

Tables

A table is a collection of named column vectors, all the same length. Tables are the primary data structure for analytical queries — the target of select, update, joins, and aggregations.

; Create a table with column names and data
ray> (set t (table [sym price qty] (list [AAPL GOOG MSFT] [150.0 140.0 380.0] [100 200 50])))
sym  price  qty
-----------------
AAPL 150.0  100
GOOG 140.0  200
MSFT 380.0   50

; Query with select
ray> (select {from:t where: (> price 145.0)})
sym  price  qty
-----------------
AAPL 150.0  100
MSFT 380.0   50

Internal Structure

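A table is a ray_t with type = RAY_TABLE (98). Internally it holds a schema (an I64 vector of column-name symbol IDs) and a list of column vectors, all of the same length. The C API for building and inspecting a table: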

// C API: table construction
ray_t* tbl = ray_table_new(3);               // 3 columns
ray_table_add_col(tbl, sym_id, col_vec);      // add named column

// Access
ray_t* col  = ray_table_get_col(tbl, sym_id);  // by name (symbol ID)
ray_t* col0 = ray_table_get_col_idx(tbl, 0);   // by position
int64_t nr = ray_table_nrows(tbl);           // row count
int64_t nc = ray_table_ncols(tbl);           // column count
ray_t* sch = ray_table_schema(tbl);          // I64 vec of col name IDs

Column Name Management

C Function                              Description
ray_table_col_name(tbl, idx)            Get the symbol ID of the column at an index
ray_table_set_col_name(tbl, idx, id)    Rename the column at an index
ray_table_schema(tbl)                   Get the full schema as an I64 vector of column-name symbol IDs
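A table reduces to a schema plus parallel column pointers. This sketch assumes a fixed maximum width and `int64_t` columns; `table_t`, `table_add_col`, `table_get_col` and `table_set_col_name` are hypothetical names that only mirror the roles of the functions listed above. Renaming a column edits the schema and leaves the column data alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_COLS 8

/* Hypothetical table: column-name symbol IDs (the schema) plus one
   column pointer per name; every column has the same row count. */
typedef struct {
    int64_t ncols;
    int64_t nrows;
    int64_t names[MAX_COLS];        /* symbol IDs, like the I64 schema vector */
    const int64_t *cols[MAX_COLS];  /* column data, parallel to names */
} table_t;

/* Returns -1 if the table is full or the column length does not match. */
static int table_add_col(table_t *t, int64_t sym_id, const int64_t *col, int64_t nrows) {
    if (t->ncols == MAX_COLS) return -1;
    if (t->ncols > 0 && nrows != t->nrows) return -1;
    t->names[t->ncols] = sym_id;
    t->cols[t->ncols] = col;
    t->nrows = nrows;
    t->ncols++;
    return 0;
}

/* Linear scan of the schema (assumes tables are narrow enough for this). */
static const int64_t *table_get_col(const table_t *t, int64_t sym_id) {
    for (int64_t i = 0; i < t->ncols; i++)
        if (t->names[i] == sym_id) return t->cols[i];
    return NULL;
}

/* Renaming only touches the schema, not the column data. */
static void table_set_col_name(table_t *t, int64_t idx, int64_t sym_id) {
    assert(idx >= 0 && idx < t->ncols);
    t->names[idx] = sym_id;
}
```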

Dictionaries

Dictionaries are key-value structures built on top of lists. A dict is a RAY_LIST with the RAY_ATTR_DICT (0x02) attribute flag set. Keys and values alternate as list elements.

; Dictionary literal (curly braces)
ray> {name: "Alice" age: 30 active: true}
{name: "Alice" age: 30 active: true}

; Access by key
ray> (set d {x: 10 y: 20})
ray> (get d 'x)
10

Dictionaries are used extensively in Rayfall for passing named arguments to query forms like select and update:

; The select argument is a dictionary
ray> (select {from:t where: (> x 1) cols: {x:x x2: (* x x)}})
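Because keys and values alternate, a lookup only needs to scan the even positions. The sketch below models that layout with a flat `int64_t` array holding symbol-ID keys and integer values; `alt_dict_t` and `alt_dict_get` are hypothetical, and real dict elements would be `ray_t *` objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat layout: k0, v0, k1, v1, ... (keys at even slots). */
typedef struct {
    size_t len;            /* number of slots, always even */
    const int64_t *slots;
} alt_dict_t;

/* Returns 1 and stores the value if key is present, 0 otherwise. */
static int alt_dict_get(const alt_dict_t *d, int64_t key, int64_t *out) {
    assert(d->len % 2 == 0);
    for (size_t i = 0; i < d->len; i += 2) {
        if (d->slots[i] == key) {
            *out = d->slots[i + 1];
            return 1;
        }
    }
    return 0;
}
```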

Selection Bitmaps (RAY_SEL)

A selection bitmap is a lazy filter representation used internally by the query optimizer and executor. Instead of materializing filtered rows into a new vector, Rayforce tracks which rows pass the filter as a compact bitmap.

Segment Flags

Selections are organized in segments matching the morsel size (1024 elements). Each segment carries a flag that enables fast short-circuiting:

Flag   Constant            Meaning
NONE   RAY_SEL_NONE (0)    All bits zero: no rows pass, so the entire morsel is skipped
ALL    RAY_SEL_ALL (1)     All bits one: process the morsel without checking the bitmap
MIX    RAY_SEL_MIX (2)     Mixed bits: check the bitmap per row
Why lazy? Selection bitmaps avoid materializing intermediate results during predicate evaluation. The optimizer can push selections backward through OP_EXPAND chains (sideways information passing) and compose multiple predicates by ANDing bitmaps — all without copying data.
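The segment flags and predicate composition can be sketched over a 1024-bit segment stored as sixteen 64-bit words. The flag values come from the table above; `seg_flag`, `seg_and` and the 64-bit word layout are assumptions for illustration, and a trailing partial segment (fewer than 1024 rows) is not handled.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_BITS  1024
#define SEG_WORDS (SEG_BITS / 64)  /* 16 x 64-bit words per segment */

/* Flag values from the table above; the enum names are hypothetical. */
enum { SEL_NONE = 0, SEL_ALL = 1, SEL_MIX = 2 };

/* Classify one full 1024-row segment so callers can skip or fast-path it. */
static int seg_flag(const uint64_t seg[SEG_WORDS]) {
    uint64_t any = 0, all = ~UINT64_C(0);
    for (int i = 0; i < SEG_WORDS; i++) {
        any |= seg[i];
        all &= seg[i];
    }
    if (any == 0) return SEL_NONE;
    if (all == ~UINT64_C(0)) return SEL_ALL;
    return SEL_MIX;
}

/* Compose two predicates: a row passes only if it passes both. */
static void seg_and(uint64_t out[SEG_WORDS], const uint64_t a[SEG_WORDS],
                    const uint64_t b[SEG_WORDS]) {
    for (int i = 0; i < SEG_WORDS; i++) out[i] = a[i] & b[i];
}
```

After ANDing, recomputing the flag lets a segment that became all-zero be skipped outright.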

Block Layout

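A RAY_SEL object has type = 14 and a variable-size layout in its data[] region. Its bits are read and written with the macros below, and ray_sel_from_pred converts a BOOL vector into a selection: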

// C API: bitmap manipulation
RAY_SEL_BIT_TEST(bits, row);   // test if row passes
RAY_SEL_BIT_SET(bits, row);    // mark row as passing
RAY_SEL_BIT_CLR(bits, row);    // mark row as filtered

// Convert a BOOL vector to a selection bitmap
ray_t* sel = ray_sel_from_pred(bool_vec);
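For illustration only, here is one plausible way to define such macros, assuming a bitmap of 64-bit words where row `r` lives at bit `r % 64` of word `r / 64`. `SEL_BIT_TEST`, `SEL_BIT_SET`, `SEL_BIT_CLR` and `sel_bits_from_pred` are hypothetical; the real `RAY_SEL_BIT_*` macros and `ray_sel_from_pred` may be defined differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bit macros over an array of 64-bit words. */
#define SEL_BIT_TEST(bits, row) (((bits)[(row) >> 6] >> ((row) & 63)) & 1u)
#define SEL_BIT_SET(bits, row)  ((bits)[(row) >> 6] |= UINT64_C(1) << ((row) & 63))
#define SEL_BIT_CLR(bits, row)  ((bits)[(row) >> 6] &= ~(UINT64_C(1) << ((row) & 63)))

/* Build a bitmap from a boolean predicate result; the caller frees it. */
static uint64_t *sel_bits_from_pred(const bool *pred, int64_t n) {
    assert(n >= 0);
    size_t words = (size_t)(n + 63) / 64;
    uint64_t *bits = calloc(words ? words : 1, sizeof(uint64_t));
    if (!bits) return NULL;
    for (int64_t r = 0; r < n; r++)
        if (pred[r]) SEL_BIT_SET(bits, r);
    return bits;
}
```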

Partitioned Columns

Partitioned tables split data across multiple segments (typically by date). Each column in a partitioned table uses a parted type that wraps multiple memory-mapped vector segments into a single logical column.

Type Encoding

Parted types are encoded as RAY_PARTED_BASE + base_type. For example, a partitioned I64 column has type 32 + 5 = 37. The base type is recovered with RAY_PARTED_BASETYPE(t).

Constant           Value   Description
RAY_PARTED_BASE    32      Base offset for parted types
RAY_MAPCOMMON      64      Virtual partition column (e.g., date)
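The encoding is plain arithmetic on type codes. The constants below take their values from the table above; the helper macros `PARTED_TYPE`, `IS_PARTED` and `PARTED_BASETYPE` are hypothetical, and `IS_PARTED` assumes every base type code is below 32, so that parted codes occupy 32 to 63.

```c
#include <assert.h>

/* Values from the table above. */
#define RAY_PARTED_BASE 32
#define RAY_MAPCOMMON   64

/* Hypothetical helpers; the real RAY_PARTED_BASETYPE may be defined differently. */
#define PARTED_TYPE(base)  (RAY_PARTED_BASE + (base))
#define IS_PARTED(t)       ((t) >= RAY_PARTED_BASE && (t) < RAY_MAPCOMMON)
#define PARTED_BASETYPE(t) ((t) - RAY_PARTED_BASE)
```

With I64 as base type 5, `PARTED_TYPE(5)` yields 37 and `PARTED_BASETYPE(37)` recovers 5, matching the example in the text.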

MAPCOMMON

When loading a date-partitioned table, Rayforce creates a virtual RAY_MAPCOMMON column. This column does not store actual data — it derives values from the partition directory names (e.g., 2024.01.15/). Each row in a partition shares the same date value, so the MAPCOMMON column can represent millions of rows with zero per-row storage.

; Load a date-partitioned table
ray> (set trades (part-load "db" "trades"))

; The 'date' column is MAPCOMMON — derived from directory names
; Queries that filter on date trigger partition pruning
ray> (select {from:trades where: (= date 2024.01.15)})
Partition pruning: the optimizer recognizes filters on MAPCOMMON columns and eliminates entire partitions from the scan, skipping their memory-mapped segments entirely. With one partition per calendar day, a query filtering on a single date in a year of data reads only about 1/365th of the files.
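MAPCOMMON lookup and pruning can be sketched over an array of partition descriptors. `partition_t`, `mapcommon_date` and `prune_eq` are hypothetical, and dates are encoded as `YYYYMMDD` integers for simplicity. The date is stored once per partition and derived for any row on demand; an equality filter keeps only the matching partitions, whose segments would then be the only ones scanned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical partition descriptor: the date comes from the directory
   name (encoded here as YYYYMMDD) and is stored once per partition. */
typedef struct {
    int32_t date;   /* e.g. 20240115 for "2024.01.15/" */
    int64_t nrows;  /* rows in this partition's segments */
} partition_t;

/* MAPCOMMON-style lookup: the date for a global row index is derived
   from the partition it falls in; no per-row storage exists. */
static int32_t mapcommon_date(const partition_t *parts, size_t nparts, int64_t row) {
    assert(row >= 0);
    for (size_t p = 0; p < nparts; p++) {
        if (row < parts[p].nrows) return parts[p].date;
        row -= parts[p].nrows;
    }
    assert(!"row out of range");
    return 0;
}

/* Pruning for `date = target`: record the indices of surviving
   partitions in keep[] and return how many survive. */
static size_t prune_eq(const partition_t *parts, size_t nparts, int32_t target, size_t *keep) {
    size_t k = 0;
    for (size_t p = 0; p < nparts; p++)
        if (parts[p].date == target) keep[k++] = p;
    return k;
}
```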