Storage
Columnar file I/O, splayed tables, date-partitioned storage, symbol table persistence, and CSV import/export — everything for getting data in and out of Rayforce.
Columnar .col Files
The .col format is Rayforce's native binary representation for a single vector. Each file stores a 32-byte header followed by the raw element data and an optional null bitmap. The format is designed for direct memory mapping — no deserialization needed.
File Structure
/*
* .col file layout:
*
* Bytes 0-31: ray_t header (type, attrs, len, etc.)
* Bytes 32-N: element data (len * elem_size bytes)
* Bytes N-M: null bitmap (if RAY_ATTR_HAS_NULLS + RAY_ATTR_NULLMAP_EXT)
* — (len + 7) / 8 bytes
*/
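The layout arithmetic above can be sketched in a few lines of C. This is a minimal illustration, not Rayforce's own code: the `col_layout_t` type and `col_layout()` helper are hypothetical names, and the only facts taken from this section are the 32-byte header, the `len * elem_size` data region, and the `(len + 7) / 8`-byte bitmap.

```c
#include <stdbool.h>
#include <stdint.h>

#define COL_HEADER_SIZE 32u /* header size stated in the layout above */

/* Hypothetical helper type: byte offsets of each region in a .col file. */
typedef struct {
    uint64_t data_off;    /* first byte of element data */
    uint64_t nullmap_off; /* first byte of the bitmap (equals file_size if absent) */
    uint64_t file_size;   /* total size of the file in bytes */
} col_layout_t;

/* Compute region offsets for a vector of `len` elements of `elem_size` bytes. */
static col_layout_t col_layout(uint64_t len, uint64_t elem_size, bool ext_nullmap) {
    col_layout_t l;
    l.data_off = COL_HEADER_SIZE;
    l.nullmap_off = l.data_off + len * elem_size;
    l.file_size = l.nullmap_off + (ext_nullmap ? (len + 7) / 8 : 0);
    return l;
}
```

Under these assumptions, a one-million-element F64 column without an external bitmap occupies 32 + 8,000,000 bytes, and the bitmap would add another 125,000 bytes.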
C API
| Function | Description |
|---|---|
| ray_col_save(vec, path) | Write a vector to a .col file. Handles slices, external null bitmaps, and string pools transparently. |
| ray_col_load(path) | Read a .col file into a heap-allocated vector. The file is read entirely into memory. |
| ray_col_mmap(path) | Memory-map a .col file for zero-copy access. The returned vector points directly into the mapped file pages. Ideal for large datasets that exceed available RAM. |
// Save a vector
ray_t* prices = ray_vec_from_raw(RAY_F64, data, 1000000);
ray_err_t err = ray_col_save(prices, "db/trades/price.col");
// Load into memory
ray_t* loaded = ray_col_load("db/trades/price.col");
// Memory-map for zero-copy access
ray_t* mapped = ray_col_mmap("db/trades/price.col");
// mapped->data points into file pages — no allocation
Memory-mapped vectors have mmod = 1 in their header, distinguishing them from heap-allocated vectors. The buddy allocator skips them during free. Slices of mmap'd vectors retain a reference to the parent mapping.
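For readers unfamiliar with zero-copy mapping, the following POSIX sketch shows the general idea: map the file read-only and treat the bytes after the 32-byte header as the element array. The helper names `map_file_ro()` and `col_f64_data()` are hypothetical, the header contents are ignored here, and this is not a description of how `ray_col_mmap()` is actually implemented.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define COL_HEADER_SIZE 32 /* header size stated in this section */

/* Map a whole file read-only. Returns NULL on failure; *out_len gets the size. */
static const unsigned char *map_file_ro(const char *path, size_t *out_len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < COL_HEADER_SIZE) {
        close(fd);
        return NULL;
    }

    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED) return NULL;

    *out_len = (size_t)st.st_size;
    return (const unsigned char *)base;
}

/* Element data starts right after the header; no bytes are copied. */
static const double *col_f64_data(const unsigned char *base) {
    return (const double *)(base + COL_HEADER_SIZE);
}
```

Pages are faulted in by the operating system only when touched, which is why a mapped column can be larger than available RAM. The caller releases the mapping with `munmap()`.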
Splayed Tables
A splayed table stores each column as a separate .col file in a directory. This is the standard on-disk representation for a Rayforce table. The schema (column names and types) is stored alongside the data files.
Directory Layout
db/trades/
.schema.col — I64 vector of column name symbol IDs
sym.col — SYM column (stock tickers)
price.col — F64 column (trade prices)
qty.col — I64 column (quantities)
time.col — TIMESTAMP column
C API
| Function | Description |
|---|---|
| ray_splay_save(tbl, dir, sym_path) | Save a table as a splayed directory. Each column becomes a .col file named after its column symbol. Pass sym_path to also save the symbol table. |
| ray_splay_load(dir, sym_path) | Load a splayed table from a directory. Columns are memory-mapped by default. Pass sym_path to load the associated symbol table. |
// Save table to disk
ray_err_t err = ray_splay_save(table, "db/trades", "db/sym");
// Load table (columns are mmap'd)
ray_t* trades = ray_splay_load("db/trades", "db/sym");
; Rayfall: save and load splayed tables
ray> (splay-save t "db/trades")
ray> (set trades (splay-load "db/trades"))
Date-Partitioned Tables
For large time-series datasets, Rayforce supports date-partitioned storage. Data is split into directories named by date, each containing a splayed table for that day's data.
Directory Layout
db/trades/
  sym                (shared symbol table)
  2024.01.15/
    sym.col
    price.col
    qty.col
    time.col
  2024.01.16/
    sym.col
    price.col
    qty.col
    time.col
  2024.01.17/
    ...
Loading Partitioned Data
The ray_part_load() function scans all partition directories, memory-maps every column file, and assembles them into a single logical table with parted columns and a virtual MAPCOMMON date column.
// C API: load all partitions
ray_t* trades = ray_part_load("db", "trades");
// The result is a single table with:
// - A MAPCOMMON 'date' column derived from directory names
// - Parted columns (RAY_PARTED_BASE + base_type) for each data column
// - All segments are memory-mapped — no data copy
; Rayfall: load partitioned table
ray> (set trades (part-load "db" "trades"))
; Filter on date — optimizer prunes partitions
ray> (select {from:trades where: (= date 2024.01.15)})
; Range filter — only relevant partitions are scanned
ray> (select {from:trades where: (and (>= date 2024.01.15) (<= date 2024.01.17))})
Partition Pruning
The query optimizer recognizes predicates on the MAPCOMMON column and eliminates entire partitions from the scan plan. For example, with one partition per calendar day, a query filtering on a single date in a year of data touches roughly 1/365th of the files on disk, and the pruned partitions incur no per-row cost.
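The pruning idea can be illustrated independently of the optimizer: because the date lives in the directory name, a range predicate can be decided per partition before any column file is opened. The sketch below is an assumption-laden illustration, not the optimizer's code; `part_date_key()` and `prune_partitions()` are hypothetical helpers, and dates are compared as plain `YYYYMMDD` integers.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parse "YYYY.MM.DD" into the integer YYYYMMDD; returns -1 for non-date names. */
static int part_date_key(const char *name) {
    if (strlen(name) != 10 || name[4] != '.' || name[7] != '.') return -1;
    int key = 0;
    for (int i = 0; i < 10; i++) {
        if (i == 4 || i == 7) continue; /* skip the two separators */
        if (!isdigit((unsigned char)name[i])) return -1;
        key = key * 10 + (name[i] - '0');
    }
    return key;
}

/* Copy into `out` only the directory names whose date lies in [lo, hi].
 * Returns the number of surviving partitions. */
static size_t prune_partitions(const char *const *dirs, size_t n,
                               int lo, int hi, const char **out) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int key = part_date_key(dirs[i]);
        if (key < 0) continue; /* not a partition directory, e.g. "sym" */
        if (key >= lo && key <= hi) out[kept++] = dirs[i];
    }
    return kept;
}
```

The cost of the check is proportional to the number of partitions, not the number of rows, which is the property the paragraph above describes.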
Symbol Table Persistence
The global symbol intern table maps strings to integer IDs. When saving data to disk, the symbol table must be persisted so that symbol vectors can be correctly interpreted when reloaded.
Append-Only .sym Files
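Symbol files use an append-only format. New symbols are appended to the end of the file without rewriting existing entries, so IDs that are already on disk never change. Combined with the file locking described under Concurrency and Integrity, this keeps concurrent writes safe and enables incremental updates.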
| Function | Description |
|---|---|
| ray_sym_save(path) | Persist the current global symbol table to a .sym file |
| ray_sym_load(path) | Load a symbol table from disk, merging with any existing entries |
| ray_sym_intern(str, len) | Intern a string, returning its integer ID |
| ray_sym_find(str, len) | Look up a string without interning (returns -1 if absent) |
| ray_sym_str(id) | Resolve an ID back to its string |
// Save symbol table alongside data
ray_sym_save("db/sym");
// On startup, load symbols before loading data
ray_sym_load("db/sym");
Concurrency and Integrity
- File locking: flock() on POSIX, LockFileEx() on Windows. Multiple processes can safely read the symbol file; writes acquire an exclusive lock.
- Corruption detection: symbol files include checksums. If a file is truncated or corrupted, ray_sym_load() returns RAY_ERR_CORRUPT and leaves the in-memory table unchanged.
- Arena backing: interned strings are allocated from a dedicated arena (ray_arena_t), making bulk allocation fast and ensuring all strings are freed together when the symbol table is destroyed.
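The exclusive-lock write path described above can be sketched with POSIX `flock()`. This is only an illustration of the locking discipline: `append_locked()` is a hypothetical helper, the record contents are arbitrary bytes, and the real .sym record format and checksums are not shown because this section does not specify them.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/file.h>
#include <unistd.h>

/* Append `len` bytes to `path` while holding an exclusive advisory lock.
 * Returns 0 on success, -1 on failure. */
static int append_locked(const char *path, const void *buf, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return -1;

    /* Blocks until no other process holds a lock on this file. */
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }

    ssize_t written = write(fd, buf, len);
    int ok = (written == (ssize_t)len) && fsync(fd) == 0;

    flock(fd, LOCK_UN);
    close(fd);
    return ok ? 0 : -1;
}
```

Readers would take `LOCK_SH` instead of `LOCK_EX`, which lets many readers proceed together while still excluding a writer.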
CSV Import and Export
Rayforce includes a high-performance CSV loader with parallel parsing, automatic type inference, and null handling. No external libraries are used — the parser operates directly on memory-mapped file contents.
Reading CSV Files
; Basic CSV load — auto-detect types, comma delimiter, header row
ray> (set data (read-csv "trades.csv"))
sym price qty date
----------------------------
AAPL 150.25 100 2024.01.15
GOOG 140.50 200 2024.01.15
MSFT 380.00 50 2024.01.15
C API
| Function | Description |
|---|---|
| ray_read_csv(path) | Load a CSV file with default options: comma delimiter, first row as header, automatic type inference, "" as null. |
| ray_read_csv_opts(path, delim, header, null_str) | Load with custom options: delimiter character, whether first row is a header, and null string representation. |
| ray_write_csv(table, path) | Write a table to a CSV file with header row and comma delimiter. |
// Default options
ray_t* data = ray_read_csv("trades.csv");
// Tab-delimited, no header, "NA" as null
ray_t* tsv = ray_read_csv_opts("data.tsv", '\t', false, "NA");
// Write results back
ray_write_csv(result, "output.csv");
Type Inference
The CSV loader samples values in each column and infers types in priority order:
- BOOL: true/false, 1/0
- I64: integer values within 64-bit range
- F64: floating-point values
- DATE: YYYY.MM.DD or YYYY-MM-DD format
- TIMESTAMP: date + time with nanosecond precision
- SYM: short repeated strings (auto-interned as symbols)
- STR: fallback for everything else
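A simplified version of this priority order can be written as a per-field classifier. The sketch below is an assumption, not the loader's actual logic: `infer_field()` and the `field_type_t` enum are hypothetical, TIMESTAMP and SYM are omitted (SYM depends on how often a value repeats across the column, which a single field cannot tell), and the standard `strtoll`/`strtod` parsers are more permissive than a production parser would likely be.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef enum { T_BOOL, T_I64, T_F64, T_DATE, T_STR } field_type_t;

/* Accepts YYYY.MM.DD or YYYY-MM-DD (digits only, same separator twice). */
static int is_date(const char *s) {
    if (strlen(s) != 10) return 0;
    char sep = s[4];
    if ((sep != '.' && sep != '-') || s[7] != sep) return 0;
    for (int i = 0; i < 10; i++) {
        if (i == 4 || i == 7) continue;
        if (!isdigit((unsigned char)s[i])) return 0;
    }
    return 1;
}

/* Try each type in priority order and return the first that fits. */
static field_type_t infer_field(const char *s) {
    char *end;
    if (!strcmp(s, "true") || !strcmp(s, "false") ||
        !strcmp(s, "1") || !strcmp(s, "0"))
        return T_BOOL;

    errno = 0;
    (void)strtoll(s, &end, 10);
    if (end != s && *end == '\0' && errno != ERANGE) return T_I64;

    (void)strtod(s, &end);
    if (end != s && *end == '\0') return T_F64;

    if (is_date(s)) return T_DATE;
    return T_STR;
}
```

A column-level decision would then combine the per-field results over the sampled rows, for example by widening to the most general type observed.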
Parallel Parsing
The CSV file is memory-mapped and split into chunks. Multiple threads parse chunks in parallel, with a merge step that reconciles column types and combines partial results. For large files (100 MB+), this delivers near-linear speedup with core count.
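The chunking step can be sketched as follows: pick evenly spaced cut points, then advance each one to just past the next newline so that no row is split between threads. `split_chunks()` is a hypothetical helper written for illustration; it assumes rows never contain quoted newlines, which a real CSV splitter would also have to handle.

```c
#include <stddef.h>
#include <string.h>

/* Split buf[0..len) into at most `max_chunks` ranges that each end just
 * after a '\n' (or at the end of the buffer). starts[i] is the first byte
 * of chunk i and starts[n] == len, so `starts` needs max_chunks + 1 slots.
 * Returns n, the number of chunks produced. */
static size_t split_chunks(const char *buf, size_t len,
                           size_t max_chunks, size_t *starts) {
    size_t n = 0, pos = 0;
    size_t target = max_chunks ? (len + max_chunks - 1) / max_chunks : len;

    while (pos < len && n < max_chunks) {
        starts[n++] = pos;
        size_t next = pos + target;
        if (next >= len || n == max_chunks) break; /* last chunk takes the rest */

        /* Move the cut forward to the end of the row it landed in. */
        const char *nl = memchr(buf + next, '\n', len - next);
        pos = nl ? (size_t)(nl - buf) + 1 : len;
    }
    starts[n] = len;
    return n;
}
```

Each worker thread would then parse the half-open range `[starts[i], starts[i + 1])` independently, leaving type reconciliation to the merge step.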
Null Handling
Empty fields and fields matching the null string (default: "") are recognized as null values. The loader sets the appropriate null bitmap bits on the resulting vectors and marks them with RAY_ATTR_HAS_NULLS.
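As a rough sketch of the bookkeeping involved, the code below marks one bit per null row in a bitmap of `(len + 7) / 8` bytes. The helper names are hypothetical, and the bit order (least significant bit first within each byte) is an assumption, since this section does not state how Rayforce orders the bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitmap size for `len` rows: one bit per row, rounded up to whole bytes. */
static size_t nullmap_bytes(size_t len) { return (len + 7) / 8; }

/* Returns 1 if row `i` is marked null, 0 otherwise. */
static int nullmap_test(const uint8_t *bits, size_t i) {
    return (bits[i / 8] >> (i % 8)) & 1;
}

/* Mark every empty field, and every field equal to `null_str`, as null.
 * `bits` must hold nullmap_bytes(n) bytes. Returns the number of nulls. */
static size_t mark_nulls(const char *const *fields, size_t n,
                         const char *null_str, uint8_t *bits) {
    size_t nulls = 0;
    memset(bits, 0, nullmap_bytes(n));
    for (size_t i = 0; i < n; i++) {
        if (fields[i][0] == '\0' || strcmp(fields[i], null_str) == 0) {
            bits[i / 8] |= (uint8_t)(1u << (i % 8));
            nulls++;
        }
    }
    return nulls;
}
```

A nonzero return value is the condition under which a loader would set a flag such as RAY_ATTR_HAS_NULLS on the resulting vector.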
Symbol Merge
When loading CSV data with symbol columns, the loader interns all unique strings into the global symbol table. If a symbol table was previously loaded from disk, existing IDs are preserved and new symbols are appended.
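The ID-stability property can be demonstrated with a toy intern table. This is a deliberately naive sketch, not the Rayforce implementation: `sym_table_t`, `sym_find()`, and `sym_intern()` are hypothetical stand-ins, lookup is a linear scan rather than a hash, and strings are individually `malloc`ed instead of arena-allocated (and never freed here).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 1024 /* arbitrary capacity for the sketch */

typedef struct {
    char *strs[MAX_SYMS]; /* strs[id] is the string for that ID */
    int64_t count;        /* next ID to hand out */
} sym_table_t;

/* Look up without interning; returns -1 if the string is absent. */
static int64_t sym_find(const sym_table_t *t, const char *s, size_t len) {
    for (int64_t i = 0; i < t->count; i++)
        if (strlen(t->strs[i]) == len && memcmp(t->strs[i], s, len) == 0)
            return i;
    return -1;
}

/* Return the existing ID, or append the string and return its new ID. */
static int64_t sym_intern(sym_table_t *t, const char *s, size_t len) {
    int64_t id = sym_find(t, s, len);
    if (id >= 0) return id;            /* existing IDs are preserved */
    if (t->count == MAX_SYMS) return -1;

    char *copy = malloc(len + 1);
    if (!copy) return -1;
    memcpy(copy, s, len);
    copy[len] = '\0';

    t->strs[t->count] = copy;          /* new symbols go at the end */
    return t->count++;
}
```

Because new entries are only ever appended, symbol vectors written before a CSV load still resolve to the same strings afterward.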
Cross-Platform File I/O
All file operations go through a portable abstraction layer in src/store/fileio.{h,c} that handles platform differences:
| Feature | POSIX | Windows |
|---|---|---|
| File locking | flock() | LockFileEx() |
| Sync to disk | fsync() | FlushFileBuffers() |
| Atomic rename | rename() | MoveFileEx(MOVEFILE_REPLACE_EXISTING) |
| Memory mapping | mmap() | CreateFileMapping() + MapViewOfFile() |
Writes are crash-safe: Rayforce writes the data to a temporary file, flushes it with fsync(), then atomically renames it to the target path. This prevents data corruption if the process is interrupted during a write.
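The write, sync, rename sequence is a common POSIX pattern and can be sketched as below. `write_atomic()` and the `.tmp` suffix are hypothetical choices made for this example, not the contents of src/store/fileio.c.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write `len` bytes so that `path` holds either the old or the new
 * contents, never a partial file. Returns 0 on success, -1 on failure. */
static int write_atomic(const char *path, const void *buf, size_t len) {
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR) continue; /* interrupted, retry */
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += w;
        left -= (size_t)w;
    }

    /* Data must be on disk before the rename makes it visible. */
    if (fsync(fd) != 0) { close(fd); unlink(tmp); return -1; }
    if (close(fd) != 0) { unlink(tmp); return -1; }
    if (rename(tmp, path) != 0) { unlink(tmp); return -1; }
    return 0;
}
```

For full durability of the rename itself, some systems also require an `fsync()` on the containing directory; that step is left out of this sketch.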