Graph Algorithms

Graph opcodes covering traversal, shortest paths, centrality, community detection, similarity search, optimal joins, and spanning trees — all integrated into the DAG pipeline.

Graph algorithms are available through the C API. Many graph queries can also be expressed in Rayfall using the Datalog engine — see the Graph Queries Tutorial.

All graph algorithms operate on ray_rel_t relationships built from CSR storage. See Graph Storage for how to build and persist relationships.

Traversal

OP_EXPAND — 1-Hop Neighbor Expansion

Expands each input node to its direct neighbors in the CSR. The fundamental building block for all graph queries.

Property	Value
Opcode	`OP_EXPAND (80)`
Complexity	O(d) per node, where d = average degree
Use case	Neighbor lookup, 1-hop reachability, join-like expansions

/* C API */
ray_op_t* nbrs = ray_expand(g, src_nodes, rel, 0);  /* direction: 0=fwd */

Rayfall

In Rayfall, 1-hop expansion is a Datalog query with a single pattern. Pin the source entity to query its direct neighbors:

; Build a graph
(set db (datoms))
(set db (assert-fact db 0 'edge 1))
(set db (assert-fact db 0 'edge 2))
(set db (assert-fact db 1 'edge 2))

; 1-hop neighbors of node 0
(query db (find ?target)
  (where (0 :edge ?target)))
; => ?target: 1, 2

OP_VAR_EXPAND — Variable-Length BFS/DFS

Expands nodes through multiple hops with configurable minimum and maximum depth. Uses BFS by default. Optionally tracks the full path for each discovered node.

Property	Value
Opcode	`OP_VAR_EXPAND (81)`
Complexity	O(V + E) within the explored subgraph
Use case	Multi-hop reachability, friends-of-friends, transitive closure

/* BFS from start_nodes, depth 1..3, tracking paths */
ray_op_t* reachable = ray_var_expand(g, start_nodes, rel,
    0,    /* direction: forward */
    1,    /* min_depth */
    3,    /* max_depth */
    true  /* track_path */
);

Rayfall

Recursive Datalog rules express variable-length reachability. The path rule computes transitive closure — equivalent to BFS over all reachable nodes:

; Base case: direct edge
(rule (path ?x ?y) (?x :edge ?y))
; Recursive case: follow chains
(rule (path ?x ?z) (?x :edge ?y) (path ?y ?z))

; All nodes reachable from node 0
(query db (find ?y) (where (path 0 ?y)))
; => all transitively reachable nodes

OP_DFS — Depth-First Search

Explicit depth-first traversal from a source node, returning nodes in DFS visit order.

Property	Value
Opcode	`OP_DFS (94)`
Complexity	O(V + E) within explored subgraph
Use case	Topological processing, cycle detection, graph exploration

ray_op_t* dfs_order = ray_dfs(g, src_node, rel, 10); /* max depth 10 */

OP_RANDOM_WALK — Random Walk

Performs random walks from source nodes, selecting a uniformly random neighbor at each step. Used for node2vec-style embeddings and sampling.

Property	Value
Opcode	`OP_RANDOM_WALK (98)`
Complexity	O(L) per walk, where L = walk_length
Use case	Graph sampling, node2vec, DeepWalk embeddings

ray_op_t* walk = ray_random_walk(g, src_node, rel, 80); /* 80-step walk */

Shortest Path

OP_SHORTEST_PATH — BFS Shortest Path

Finds the shortest unweighted path between two nodes using bidirectional BFS.

Property	Value
Opcode	`OP_SHORTEST_PATH (82)`
Complexity	O(V + E)
Use case	Hop distance, unweighted routing, social distance

ray_op_t* path = ray_shortest_path(g, src, dst, rel, 20); /* max depth 20 */

Rayfall

Datalog reachability rules find all paths between two nodes. Pin both source and destination to check connectivity:

; Reuse the path rule from above
(rule (path ?x ?y) (?x :edge ?y))
(rule (path ?x ?z) (?x :edge ?y) (path ?y ?z))

; Is node 4 reachable from node 0?
(query db (find ?y) (where (path 0 ?y)))
; => includes 4 if reachable

Note: Datalog computes reachability (whether a path exists), not the actual shortest hop sequence. For weighted shortest paths or explicit path reconstruction, use the C API (ray_shortest_path, ray_dijkstra).

OP_DIJKSTRA — Weighted Shortest Path

Dijkstra's algorithm for weighted shortest paths. Reads edge weights from a property column on the relationship.

Property	Value
Opcode	`OP_DIJKSTRA (86)`
Complexity	O((V + E) log V) with binary heap
Use case	Weighted routing, cost-optimal paths, network flow

ray_op_t* path = ray_dijkstra(g, src, dst, rel,
    "weight",  /* edge weight column name */
    255        /* max depth */
);

OP_ASTAR — A* Shortest Path

A* search with a coordinate-based heuristic (Haversine distance). Requires node property columns for latitude and longitude.

Property	Value
Opcode	`OP_ASTAR (95)`
Complexity	O((V + E) log V), typically faster than Dijkstra with a good heuristic
Use case	Geospatial routing, map navigation, spatial networks

ray_op_t* path = ray_astar(g, src, dst, rel,
    "distance",   /* weight column */
    "lat",         /* latitude column */
    "lon",         /* longitude column */
    node_props,    /* node property table with lat/lon */
    255            /* max depth */
);

OP_K_SHORTEST — Yen's k-Shortest Paths

Finds the k shortest paths between two nodes using Yen's algorithm. Each successive path is the shortest path that differs from all previously found paths.

Property	Value
Opcode	`OP_K_SHORTEST (96)`
Complexity	O(kV(V + E) log V)
Use case	Route alternatives, network resilience, diverse path discovery

ray_op_t* paths = ray_k_shortest(g, src, dst, rel,
    "weight",  /* edge weight column */
    5          /* k = 5 shortest paths */
);

Centrality

OP_PAGERANK — PageRank

Iterative PageRank computation. Converges after max_iter iterations or when residuals fall below an internal threshold.

Property	Value
Opcode	`OP_PAGERANK (84)`
Complexity	O(I * (V + E)), where I = iterations
Use case	Node importance ranking, influence analysis, link analysis

ray_op_t* pr = ray_pagerank(g, rel,
    100,   /* max iterations */
    0.85   /* damping factor */
);

C API only. PageRank is an iterative numerical algorithm that operates directly on CSR arrays. It is not available as a Rayfall builtin. Use the C API shown above.

OP_DEGREE_CENT — Degree Centrality

Computes degree centrality for all nodes (normalized degree count).

Property	Value
Opcode	`OP_DEGREE_CENT (92)`
Complexity	O(V)
Use case	Hub identification, connectivity analysis

ray_op_t* dc = ray_degree_cent(g, rel);

OP_BETWEENNESS — Betweenness Centrality (Brandes)

Brandes' algorithm for betweenness centrality. Supports approximate computation via sampling to reduce cost on large graphs.

Property	Value
Opcode	`OP_BETWEENNESS (99)`
Complexity	O(VE) exact, O(SE) sampled (S = sample_size)
Use case	Bridge detection, information flow bottlenecks

ray_op_t* bc = ray_betweenness(g, rel, 0);   /* 0 = exact (all nodes) */
ray_op_t* bc_approx = ray_betweenness(g, rel, 100); /* sample 100 sources */

OP_CLOSENESS — Closeness Centrality

Closeness centrality measures how close a node is to all other reachable nodes. Supports sampling for large graphs.

Property	Value
Opcode	`OP_CLOSENESS (100)`
Complexity	O(VE) exact, O(SE) sampled
Use case	Network accessibility, facility placement

ray_op_t* cc = ray_closeness(g, rel, 0); /* exact */

Community Detection

OP_LOUVAIN — Louvain Community Detection

Modularity-based community detection using the Louvain method. Iteratively merges communities to maximize modularity.

Property	Value
Opcode	`OP_LOUVAIN (87)`
Complexity	O(V + E) per iteration, typically converges in a few passes
Use case	Community discovery, social clusters, network partitioning

ray_op_t* communities = ray_louvain(g, rel, 50); /* max 50 iterations */

OP_CONNECTED_COMP — Connected Components

Finds connected components using label propagation. Each node receives a component ID.

Property	Value
Opcode	`OP_CONNECTED_COMP (85)`
Complexity	O(V + E)
Use case	Graph partitioning, isolated subgraph detection, data lineage

ray_op_t* comp = ray_connected_comp(g, rel);

OP_CLUSTER_COEFF — Clustering Coefficients

Computes the local clustering coefficient for each node: the fraction of its neighbors that are also connected to each other.

Property	Value
Opcode	`OP_CLUSTER_COEFF (97)`
Complexity	O(V * d^2) where d = average degree
Use case	Network density, small-world analysis, triadic closure

ray_op_t* cc = ray_cluster_coeff(g, rel);

Worst-Case Optimal Joins

OP_WCO_JOIN — Leapfrog TrieJoin

Worst-case optimal join using Leapfrog TrieJoin (LFTJ). Finds triangles, k-cliques, and arbitrary pattern matches in the graph without materializing intermediate cross-products. Requires sorted adjacency lists in the CSR.

Property	Value
Opcode	`OP_WCO_JOIN (83)`
Complexity	O(E^{3/2}) for triangle listing (worst-case optimal)
Use case	Triangle counting, k-clique enumeration, pattern matching, motif detection

/* Find all triangles: nodes (a,b,c) where a->b, b->c, a->c */
ray_rel_t* rels[] = {rel, rel, rel};
ray_op_t* triangles = ray_wco_join(g,
    rels,   /* 3 relationships (can be same or different) */
    3,      /* n_rels */
    3       /* n_vars (a, b, c) */
);

Sorted CSR required. LFTJ relies on binary search within sorted adjacency lists. Always build relationships with sort_targets = true when using OP_WCO_JOIN.

Spanning Trees

OP_MST — Minimum Spanning Tree (Kruskal)

Computes the minimum spanning forest using Kruskal's algorithm with union-find. Returns the MST edges as a table.

Property	Value
Opcode	`OP_MST (101)`
Complexity	O(E log E)
Use case	Network backbone, minimal wiring, clustering via MST cuts

ray_op_t* mst = ray_mst(g, rel, "weight");

Vector Similarity

OP_COSINE_SIM — Cosine Similarity

Computes cosine similarity between a query vector and each row of an embedding column.

Property	Value
Opcode	`OP_COSINE_SIM (88)`
Complexity	O(N * D) where N = rows, D = dimension
Use case	Semantic search, recommendation, duplicate detection

float query[128] = { /* ... */ };
ray_op_t* sim = ray_cosine_sim(g, emb_col, query, 128);

OP_EUCLIDEAN_DIST — Euclidean Distance

Computes Euclidean (L2) distance between a query vector and each row of an embedding column.

Property	Value
Opcode	`OP_EUCLIDEAN_DIST (89)`
Complexity	O(N * D)
Use case	Spatial queries, clustering, anomaly detection

ray_op_t* dist = ray_euclidean_dist(g, emb_col, query, 128);

OP_KNN — Brute-Force K Nearest Neighbors

Finds the K nearest neighbors by exhaustive comparison. Returns the top-K rows sorted by distance.

Property	Value
Opcode	`OP_KNN (90)`
Complexity	O(N * D + N log K)
Use case	Exact nearest neighbor search on small to medium datasets

ray_op_t* neighbors = ray_knn(g, emb_col, query, 128, 10); /* top-10 */

OP_HNSW_KNN — HNSW Approximate KNN

Approximate K nearest neighbors using a pre-built HNSW (Hierarchical Navigable Small World) index. Orders of magnitude faster than brute-force for large datasets.

Property	Value
Opcode	`OP_HNSW_KNN (91)`
Complexity	O(D * log N) approximate
Use case	Large-scale semantic search, real-time recommendation, RAG retrieval

ray_op_t* neighbors = ray_hnsw_knn(g, hnsw_idx,
    query, 128,   /* query vector + dimension */
    10,            /* k */
    200            /* ef_search (beam width, higher = more accurate) */
);

C API only. HNSW index construction and approximate KNN search operate on raw float arrays and are not available as Rayfall builtins. Use the C API shown above for vector similarity workloads.

Ordering

OP_TOPSORT — Topological Sort (Kahn's)

Produces a topological ordering of a directed acyclic graph using Kahn's algorithm. Returns an error if cycles are detected.

Property	Value
Opcode	`OP_TOPSORT (93)`
Complexity	O(V + E)
Use case	Task scheduling, dependency resolution, build systems

ray_op_t* order = ray_topsort(g, rel);

SIP Optimization

Sideways Information Passing (SIP) is an optimizer pass that propagates selection bitmaps (RAY_SEL) backward through OP_EXPAND chains. When a filter is applied after a graph expansion, SIP pushes the filter condition back to the source side of the expansion, allowing the executor to skip entire source nodes whose neighbors would all be filtered out.

This optimization is automatic. The optimizer detects EXPAND chains and propagates sip_sel bitmaps into the graph operation's extended node. During execution, the EXPAND opcode checks each source node against the SIP bitmap and skips it entirely if no neighbors can pass the downstream filter.

/* Without SIP: expand all 1M source nodes, then filter */
/* With SIP: skip source nodes that can't produce passing results */

ray_op_t* nbrs   = ray_expand(g, src_nodes, rel, 0);
ray_op_t* pred   = ray_lt(g, nbrs, ray_const_i64(g, 1000));
ray_op_t* result = ray_filter(g, nbrs, pred);
/* Optimizer automatically injects SIP bitmap on the EXPAND node */

Factorized Execution

Multi-hop graph expansions can produce enormous intermediate results (the cross-product of all paths). Rayforce avoids materializing these cross-products using factorized vectors (ray_fvec_t) and factorized tables (ray_ftable_t).

/* Factorized vector: represents a column without materializing all rows */
typedef struct ray_fvec {
    ray_t*    vec;            /* underlying ray_t vector           */
    int64_t  cur_idx;        /* >= 0: single value at index       */
                             /* -1: full vector active            */
    int64_t  cardinality;    /* how many rows this represents     */
} ray_fvec_t;

/* Factorized table: accumulation buffer for WCO joins */
typedef struct ray_ftable {
    ray_fvec_t*  columns;     /* array of factorized vectors       */
    uint16_t    n_cols;
    int64_t     n_tuples;    /* factorized tuple count            */
    ray_t*       semijoin;    /* RAY_SEL bitmap of qualifying keys */
} ray_ftable_t;

When factorized = 1 is set on a graph op's extended node, the executor emits factorized output instead of flat vectors. The factorized representation keeps each expansion level as a separate vector with a cardinality multiplier, deferring materialization until the final result is needed (via ray_ftable_materialize).

Algorithm Summary

Category	Opcode	Algorithm	Complexity
Traversal	`OP_EXPAND`	1-hop CSR lookup	O(d)
Traversal	`OP_VAR_EXPAND`	BFS/DFS variable-length	O(V+E)
Traversal	`OP_DFS`	Depth-first search	O(V+E)
Traversal	`OP_RANDOM_WALK`	Random walk	O(L)
Shortest path	`OP_SHORTEST_PATH`	BFS	O(V+E)
Shortest path	`OP_DIJKSTRA`	Dijkstra (binary heap)	O((V+E) log V)
Shortest path	`OP_ASTAR`	A* with Haversine	O((V+E) log V)
Shortest path	`OP_K_SHORTEST`	Yen's algorithm	O(kV(V+E) log V)
Centrality	`OP_PAGERANK`	Iterative PageRank	O(I(V+E))
Centrality	`OP_DEGREE_CENT`	Degree centrality	O(V)
Centrality	`OP_BETWEENNESS`	Brandes	O(VE)
Centrality	`OP_CLOSENESS`	Closeness centrality	O(VE)
Community	`OP_LOUVAIN`	Louvain modularity	O(V+E)
Community	`OP_CONNECTED_COMP`	Label propagation	O(V+E)
Community	`OP_CLUSTER_COEFF`	Local clustering	O(V*d^2)
Optimal join	`OP_WCO_JOIN`	Leapfrog TrieJoin	O(E^{3/2})
Spanning	`OP_MST`	Kruskal	O(E log E)
Similarity	`OP_COSINE_SIM`	Cosine similarity	O(ND)
Similarity	`OP_EUCLIDEAN_DIST`	L2 distance	O(ND)
Similarity	`OP_KNN`	Brute-force KNN	O(ND + N log K)
Similarity	`OP_HNSW_KNN`	HNSW approximate KNN	O(D log N)
Ordering	`OP_TOPSORT`	Kahn's topological sort	O(V+E)