Rayforce Rayforce ← Back to home
GitHub

Graph Storage

Compressed Sparse Row (CSR) format, double-indexed relationships, persistence to disk, and memory layout.

CSR Format

Rayforce stores graph edges in Compressed Sparse Row (CSR) format — a compact, cache-friendly representation that enables O(1) neighbor lookups and sequential scan of adjacency lists. CSR is the standard format for high-performance graph engines because it stores edges contiguously in memory, maximizing prefetch efficiency.

A CSR index consists of two arrays:

O(1) degree lookup. The degree of any node is simply offsets[i+1] - offsets[i]. No iteration required.

Visual Layout

Consider a graph with 4 nodes and 5 edges:

; Edges: 0->1, 0->2, 1->2, 2->3, 3->0

offsets:  [0, 2, 3, 4, 5]      ; 5 entries (n_nodes + 1)
targets:  [1, 2, 2, 3, 0]      ; 5 entries (n_edges)

Node 0: targets[0..2) = [1, 2]  ; neighbors of node 0
Node 1: targets[2..3) = [2]     ; neighbors of node 1
Node 2: targets[3..4) = [3]     ; neighbors of node 2
Node 3: targets[4..5) = [0]     ; neighbors of node 3

Sorted Adjacency Lists

When sorted == true, the targets within each adjacency list are stored in ascending order. This is required for OP_WCO_JOIN (Leapfrog TrieJoin), which relies on sorted lists for its O(N^{1/2}) intersection algorithm. Sorting is enabled by passing sort_targets = true when building the relationship.

The ray_csr_t Structure

The internal CSR representation is defined as:

typedef struct ray_csr {
    ray_t*    offsets;      /* I64 vec, length = n_nodes + 1                 */
    ray_t*    targets;      /* I64 vec, length = n_edges                     */
    ray_t*    rowmap;       /* I64 vec, length = n_edges (CSR pos -> prop row)*/
    ray_t*    props;        /* optional edge property table (ray_t RAY_TABLE) */
    int64_t  n_nodes;
    int64_t  n_edges;
    bool     sorted;       /* targets sorted per adjacency list             */
} ray_csr_t;

Key fields:

Double-Indexed CSR: ray_rel_t

A relationship (ray_rel_t) wraps two CSR indices — one for forward traversal (src → dst) and one for reverse (dst → src):

typedef struct ray_rel {
    uint16_t    from_table;   /* source table ID                */
    uint16_t    to_table;     /* destination table ID            */
    int64_t     name_sym;     /* relationship name (symbol ID)  */
    ray_csr_t    fwd;          /* src -> dst                     */
    ray_csr_t    rev;          /* dst -> src                     */
} ray_rel_t;

Having both directions pre-built allows the query engine to traverse edges in either direction without re-indexing at query time. The direction parameter on graph opcodes selects which CSR to use:

Building a Graph from Edge Lists

The primary API for constructing a relationship from a table of edges:

ray_rel_t* ray_rel_from_edges(
    ray_t*       edge_table,     /* table with src and dst columns */
    const char* src_col,        /* name of source column          */
    const char* dst_col,        /* name of destination column     */
    int64_t     n_src_nodes,    /* number of source nodes         */
    int64_t     n_dst_nodes,    /* number of destination nodes    */
    bool        sort_targets    /* sort adjacency lists?          */
);

There is also ray_rel_build for constructing from a foreign-key column:

ray_rel_t* ray_rel_build(
    ray_t*       from_table,     /* table containing foreign key   */
    const char* fk_col,         /* foreign key column name        */
    int64_t     n_target_nodes, /* number of target nodes         */
    bool        sort_targets    /* sort adjacency lists?          */
);

Complete C Example

Build a small social graph with 4 users and query neighbors:

/* Build an edge table: Alice(0)->Bob(1), Alice(0)->Carol(2),
 * Bob(1)->Carol(2), Carol(2)->Dave(3) */
ray_t* edges = ray_table_new(2);

/* Source column */
ray_t* src = ray_vec_new(RAY_I64, 4);
int64_t src_vals[] = {0, 0, 1, 2};
for (int i = 0; i < 4; i++)
    src = ray_vec_append(src, &src_vals[i]);

/* Destination column */
ray_t* dst = ray_vec_new(RAY_I64, 4);
int64_t dst_vals[] = {1, 2, 2, 3};
for (int i = 0; i < 4; i++)
    dst = ray_vec_append(dst, &dst_vals[i]);

/* Add columns to table */
int64_t src_sym = ray_sym_intern("src", 3);
int64_t dst_sym = ray_sym_intern("dst", 3);
edges = ray_table_add_col(edges, src_sym, src);
edges = ray_table_add_col(edges, dst_sym, dst);

/* Build double-indexed CSR with sorted adjacency lists */
ray_rel_t* rel = ray_rel_from_edges(edges, "src", "dst",
                                    4, 4, true);

/* Query neighbors of node 0 (Alice) in the forward direction */
int64_t count;
const int64_t* nbrs = ray_rel_neighbors(rel, 0, 0, &count);
/* nbrs = [1, 2], count = 2 (Bob and Carol) */

/* Query reverse neighbors of node 2 (Carol): who points to Carol? */
const int64_t* rev_nbrs = ray_rel_neighbors(rel, 2, 1, &count);
/* rev_nbrs = [0, 1], count = 2 (Alice and Bob) */

/* Cleanup */
ray_rel_free(rel);
ray_release(edges);

Neighbor Lookup API

Two inline functions provide fast neighbor access without going through the DAG pipeline:

ray_csr_degree

static inline int64_t ray_csr_degree(ray_csr_t* csr, int64_t node);

Returns the degree (number of outgoing edges) of the given node in O(1) time.

ray_csr_neighbors

static inline int64_t* ray_csr_neighbors(
    ray_csr_t* csr, int64_t node, int64_t* out_count);

Returns a pointer directly into the targets array for the given node's adjacency list. The pointer is valid as long as the CSR is alive. Writes the neighbor count to out_count.

ray_rel_neighbors

const int64_t* ray_rel_neighbors(
    ray_rel_t* rel, int64_t node,
    uint8_t direction, int64_t* out_count);

Higher-level API that selects the forward or reverse CSR based on direction (0=forward, 1=reverse).

Edge Properties

Edge properties (weights, labels, timestamps) are stored in a standard RAY_TABLE attached to the CSR via ray_rel_set_props:

ray_t* props = ray_table_new(1);
/* ... add a "weight" column with F64 values ... */
ray_rel_set_props(rel, props);

The rowmap vector in each CSR maps edge positions back to property table rows, so graph algorithms like Dijkstra and A* can look up edge weights during traversal.

Persistence

Relationships can be saved to disk and loaded back, supporting both eager loading and memory-mapped I/O.

Save

ray_err_t ray_rel_save(ray_rel_t* rel, const char* dir);

Writes the forward and reverse CSR arrays (offsets, targets, rowmap) as .col files in the given directory. Returns RAY_OK on success.

Load (eager)

ray_rel_t* ray_rel_load(const char* dir);

Reads the .col files into memory and reconstructs the ray_rel_t. The loaded vectors use the standard Rayforce allocator and participate in COW ref counting.

Memory-mapped Load

ray_rel_t* ray_rel_mmap(const char* dir);

Maps the .col files directly into the address space without copying. The vectors are marked as mmod=1 (file-mmap), making them read-only. This is ideal for large graphs that exceed available RAM — the OS page cache handles eviction transparently.

mmap vs. eager load. Use ray_rel_mmap for graphs that are larger than available memory or when startup time matters. Use ray_rel_load when you need to mutate the graph or want predictable query latency without page faults.

Cleanup

void ray_rel_free(ray_rel_t* rel);

Releases all vectors in both the forward and reverse CSR, then frees the ray_rel_t itself.

Memory Layout

Since CSR arrays are standard ray_t vectors, they benefit from the same infrastructure as all other Rayforce data:

The total memory footprint of a CSR with N nodes and E edges is:

offsets:  (N + 1) * 8 bytes  +  32 bytes header
targets:  E * 8 bytes        +  32 bytes header
rowmap:   E * 8 bytes        +  32 bytes header  (if properties exist)

Total (without props): ~16E + 8N + 96 bytes

Integration with the DAG Pipeline

Once a ray_rel_t is built, it feeds into the operation graph for query execution. Graph opcodes like OP_EXPAND, OP_DIJKSTRA, and OP_PAGERANK receive the relationship as a parameter and operate directly on the CSR arrays during morsel-driven execution.

/* Build a DAG that expands neighbors of filtered nodes */
ray_graph_t* g = ray_graph_new(nodes_table);
ray_op_t* id_col    = ray_scan(g, "id");
ray_op_t* filter     = ray_filter(g, id_col, ray_lt(g, id_col, ray_const_i64(g, 100)));
ray_op_t* neighbors  = ray_expand(g, filter, rel, 0);  /* forward */
ray_t*    result     = ray_execute(g, neighbors);
ray_graph_free(g);

See Graph Algorithms for the full catalog of graph opcodes that operate on CSR storage.