Graph Algorithms
22 graph opcodes covering traversal, shortest paths, centrality, community detection, worst-case optimal joins, spanning trees, vector similarity search, and topological ordering, all integrated into the DAG pipeline.
All graph algorithms operate on ray_rel_t relationships built from CSR storage. See Graph Storage for how to build and persist relationships.
Traversal
OP_EXPAND — 1-Hop Neighbor Expansion
Expands each input node to its direct neighbors in the CSR. The fundamental building block for all graph queries.
| Property | Value |
|---|---|
| Opcode | OP_EXPAND (80) |
| Complexity | O(d) per input node, where d = that node's degree (the average degree gives the expected cost) |
| Use case | Neighbor lookup, 1-hop reachability, join-like expansions |
/* C API */
ray_op_t* nbrs = ray_expand(g, src_nodes, rel, 0); /* direction: 0=fwd */
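To make the O(d) cost concrete, the following is a minimal standalone sketch of a 1-hop expansion over a CSR layout. The `offsets`/`targets` arrays and the name `csr_expand` are illustrative assumptions; they are not Rayforce's internal layout or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CSR layout: the neighbors of node v are
   targets[offsets[v]] .. targets[offsets[v + 1] - 1]. */

/* Emits one (src, dst) pair per outgoing edge of every source node.
   Returns the number of pairs written; the output arrays must be large
   enough to hold the sum of the source degrees. */
size_t csr_expand(const int64_t *offsets, const int64_t *targets,
                  const int64_t *src, size_t n_src,
                  int64_t *out_src, int64_t *out_dst)
{
    assert(offsets && targets && out_src && out_dst);
    size_t n = 0;
    for (size_t i = 0; i < n_src; i++) {
        int64_t v = src[i];
        /* One contiguous slice per source: cost is exactly degree(v). */
        for (int64_t e = offsets[v]; e < offsets[v + 1]; e++) {
            out_src[n] = v;
            out_dst[n] = targets[e];
            n++;
        }
    }
    return n;
}
```

A source with no outgoing edges contributes no rows, which is why an expansion behaves like an inner join between the source set and the edge list.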
OP_VAR_EXPAND — Variable-Length BFS/DFS
Expands nodes through multiple hops with configurable minimum and maximum depth. Uses BFS by default. Optionally tracks the full path for each discovered node.
| Property | Value |
|---|---|
| Opcode | OP_VAR_EXPAND (81) |
| Complexity | O(V + E) within the explored subgraph |
| Use case | Multi-hop reachability, friends-of-friends, transitive closure |
/* BFS from start_nodes, depth 1..3, tracking paths */
ray_op_t* reachable = ray_var_expand(g, start_nodes, rel,
    0,    /* direction: forward */
    1,    /* min_depth */
    3,    /* max_depth */
    true  /* track_path */
);
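The role of min_depth and max_depth can be illustrated with a small depth-bounded BFS. This is a sketch under assumptions: it uses a plain `offsets`/`targets` CSR, reports each reachable node once at its shortest hop count, and does not track paths. The function name `bfs_var_expand` is hypothetical and the behavior of OP_VAR_EXPAND itself may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Breadth-first expansion from `start` over a CSR graph with n_nodes nodes.
   depth[v] receives the hop count of v, or -1 if v is not reached within
   max_depth hops. Nodes whose depth lies in [min_depth, max_depth] are
   appended to out in BFS order. Returns the number of nodes written, or
   -1 on allocation failure. */
int64_t bfs_var_expand(const int64_t *offsets, const int64_t *targets,
                       int64_t n_nodes, int64_t start,
                       int min_depth, int max_depth,
                       int64_t *depth, int64_t *out)
{
    assert(start >= 0 && start < n_nodes && min_depth <= max_depth);
    int64_t *queue = malloc((size_t)n_nodes * sizeof *queue);
    if (!queue) return -1;
    for (int64_t v = 0; v < n_nodes; v++) depth[v] = -1;

    int64_t head = 0, tail = 0, n_out = 0;
    depth[start] = 0;
    queue[tail++] = start;
    while (head < tail) {
        int64_t u = queue[head++];
        if (depth[u] >= min_depth) out[n_out++] = u;
        if (depth[u] == max_depth) continue;      /* do not expand further */
        for (int64_t e = offsets[u]; e < offsets[u + 1]; e++) {
            int64_t w = targets[e];
            if (depth[w] < 0) {                   /* first visit = fewest hops */
                depth[w] = depth[u] + 1;
                queue[tail++] = w;
            }
        }
    }
    free(queue);
    return n_out;
}
```

With min_depth = 1 the start node itself is excluded, and nodes beyond max_depth are never enqueued, so the work stays inside the explored subgraph.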
OP_DFS — Depth-First Search
Explicit depth-first traversal from a source node, returning nodes in DFS visit order.
| Property | Value |
|---|---|
| Opcode | OP_DFS (94) |
| Complexity | O(V + E) within explored subgraph |
| Use case | Topological processing, cycle detection, graph exploration |
ray_op_t* dfs_order = ray_dfs(g, src_node, rel, 10); /* max depth 10 */
OP_RANDOM_WALK — Random Walk
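Performs random walks from source nodes, selecting a uniformly random neighbor at each step. Uniform walks are the sampling scheme used by DeepWalk, and they match node2vec when its return and in-out parameters are both 1. Used for embedding pipelines and graph sampling.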
| Property | Value |
|---|---|
| Opcode | OP_RANDOM_WALK (98) |
| Complexity | O(L) per walk, where L = walk_length |
| Use case | Graph sampling, node2vec, DeepWalk embeddings |
ray_op_t* walk = ray_random_walk(g, src_node, rel, 80); /* 80-step walk */
Shortest Path
OP_SHORTEST_PATH — BFS Shortest Path
Finds the shortest unweighted path between two nodes using bidirectional BFS.
| Property | Value |
|---|---|
| Opcode | OP_SHORTEST_PATH (82) |
| Complexity | O(V + E) |
| Use case | Hop distance, unweighted routing, social distance |
ray_op_t* path = ray_shortest_path(g, src, dst, rel, 20); /* max depth 20 */
OP_DIJKSTRA — Weighted Shortest Path
Dijkstra's algorithm for weighted shortest paths. Reads edge weights from a property column on the relationship.
| Property | Value |
|---|---|
| Opcode | OP_DIJKSTRA (86) |
| Complexity | O((V + E) log V) with binary heap |
| Use case | Weighted routing, cost-optimal paths, network flow |
ray_op_t* path = ray_dijkstra(g, src, dst, rel,
    "weight",  /* edge weight column name */
    255        /* max depth */
);
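The O((V + E) log V) bound comes from the binary heap: every edge relaxation may push one heap entry, and every push or pop costs O(log V). The sketch below shows that structure on a plain CSR with a parallel `weights` array. It computes single-source distances rather than one src-to-dst path, and all names (`heap_item_t`, `dijkstra`) are illustrative assumptions, not Rayforce code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { double dist; int64_t node; } heap_item_t;

/* Min-heap insert: sift the new item up from the last slot. */
static void heap_push(heap_item_t *h, size_t *n, heap_item_t x)
{
    size_t i = (*n)++;
    while (i > 0 && h[(i - 1) / 2].dist > x.dist) {
        h[i] = h[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h[i] = x;
}

/* Min-heap removal: move the last item to the root and sift it down. */
static heap_item_t heap_pop(heap_item_t *h, size_t *n)
{
    heap_item_t top = h[0], last = h[--(*n)];
    size_t i = 0;
    for (;;) {
        size_t c = 2 * i + 1;
        if (c >= *n) break;
        if (c + 1 < *n && h[c + 1].dist < h[c].dist) c++;
        if (h[c].dist >= last.dist) break;
        h[i] = h[c];
        i = c;
    }
    h[i] = last;
    return top;
}

/* Fills dist[] with shortest distances from src; unreachable nodes keep
   the caller-chosen value `inf`. weights[e] is the non-negative weight of
   edge e. Returns 0 on success, -1 on allocation failure. */
int dijkstra(const int64_t *offsets, const int64_t *targets,
             const double *weights, int64_t n_nodes, int64_t src,
             double inf, double *dist)
{
    assert(src >= 0 && src < n_nodes);
    size_t cap = (size_t)offsets[n_nodes] + 1;  /* <= one push per edge, plus src */
    heap_item_t *heap = malloc(cap * sizeof *heap);
    if (!heap) return -1;
    for (int64_t v = 0; v < n_nodes; v++) dist[v] = inf;

    size_t n = 0;
    dist[src] = 0.0;
    heap_push(heap, &n, (heap_item_t){0.0, src});
    while (n > 0) {
        heap_item_t it = heap_pop(heap, &n);
        if (it.dist > dist[it.node]) continue;  /* stale entry, already improved */
        for (int64_t e = offsets[it.node]; e < offsets[it.node + 1]; e++) {
            double nd = it.dist + weights[e];
            if (nd < dist[targets[e]]) {
                dist[targets[e]] = nd;
                heap_push(heap, &n, (heap_item_t){nd, targets[e]});
            }
        }
    }
    free(heap);
    return 0;
}
```

The sketch uses lazy deletion (stale heap entries are skipped on pop) instead of a decrease-key operation, which keeps the heap code short at the cost of a slightly larger heap.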
OP_ASTAR — A* Shortest Path
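A* search with a coordinate-based heuristic (Haversine distance). Requires node property columns for latitude and longitude. A* returns an optimal path only when the heuristic never overestimates the remaining cost, so the weight column should hold distances in the same unit as the Haversine estimate.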
| Property | Value |
|---|---|
| Opcode | OP_ASTAR (95) |
| Complexity | O((V + E) log V) worst case; typically expands fewer nodes than Dijkstra when the heuristic is informative |
| Use case | Geospatial routing, map navigation, spatial networks |
ray_op_t* path = ray_astar(g, src, dst, rel,
    "distance",  /* weight column */
    "lat",       /* latitude column */
    "lon",       /* longitude column */
    node_props,  /* node property table with lat/lon */
    255          /* max depth */
);
OP_K_SHORTEST — Yen's k-Shortest Paths
Finds the k shortest paths between two nodes using Yen's algorithm. Paths are produced in order of non-decreasing total weight: each successive path is the shortest loopless path that differs from all previously found paths.
| Property | Value |
|---|---|
| Opcode | OP_K_SHORTEST (96) |
| Complexity | O(kV(V + E) log V) |
| Use case | Route alternatives, network resilience, diverse path discovery |
ray_op_t* paths = ray_k_shortest(g, src, dst, rel,
    "weight",  /* edge weight column */
    5          /* k = 5 shortest paths */
);
Centrality
OP_PAGERANK — PageRank
Iterative PageRank computation. Stops after max_iter iterations or as soon as the residual falls below an internal threshold, whichever comes first.
| Property | Value |
|---|---|
| Opcode | OP_PAGERANK (84) |
| Complexity | O(I * (V + E)), where I = iterations |
| Use case | Node importance ranking, influence analysis, link analysis |
ray_op_t* pr = ray_pagerank(g, rel,
    100,  /* max iterations */
    0.85  /* damping factor */
);
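The two stopping conditions and the O(I * (V + E)) cost are easiest to see in a plain power-iteration loop. The following is a sketch under assumptions: the threshold is an explicit `tol` parameter (Rayforce's is internal), the residual is the L1 change between iterations, and rank held by dangling nodes is spread uniformly. None of these choices are confirmed by the source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Power-iteration PageRank over a CSR of out-edges.
   Stops after max_iter iterations or once the L1 change drops below tol.
   Returns the number of iterations run, or -1 on allocation failure. */
int pagerank(const int64_t *offsets, const int64_t *targets, int64_t n,
             int max_iter, double damping, double tol, double *rank)
{
    assert(n > 0 && damping >= 0.0 && damping <= 1.0);
    double *next = malloc((size_t)n * sizeof *next);
    if (!next) return -1;
    for (int64_t v = 0; v < n; v++) rank[v] = 1.0 / (double)n;

    int iter = 0;
    while (iter < max_iter) {
        /* Rank sitting on nodes without out-edges is shared by everyone. */
        double dangling = 0.0;
        for (int64_t v = 0; v < n; v++)
            if (offsets[v + 1] == offsets[v]) dangling += rank[v];
        double base = (1.0 - damping) / (double)n
                    + damping * dangling / (double)n;
        for (int64_t v = 0; v < n; v++) next[v] = base;

        /* Each node splits its damped rank evenly across its out-edges. */
        for (int64_t v = 0; v < n; v++) {
            int64_t deg = offsets[v + 1] - offsets[v];
            for (int64_t e = offsets[v]; e < offsets[v + 1]; e++)
                next[targets[e]] += damping * rank[v] / (double)deg;
        }

        double residual = 0.0;
        for (int64_t v = 0; v < n; v++) {
            double d = next[v] - rank[v];
            residual += d < 0 ? -d : d;
            rank[v] = next[v];
        }
        iter++;
        if (residual < tol) break;
    }
    free(next);
    return iter;
}
```

Each iteration touches every node and every edge once, so the total cost is the iteration count times (V + E).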
OP_DEGREE_CENT — Degree Centrality
Computes degree centrality for all nodes (normalized degree count).
| Property | Value |
|---|---|
| Opcode | OP_DEGREE_CENT (92) |
| Complexity | O(V) |
| Use case | Hub identification, connectivity analysis |
ray_op_t* dc = ray_degree_cent(g, rel);
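The O(V) cost follows from the CSR layout: a node's degree is the difference of two adjacent offsets, so no edge needs to be read. The sketch below assumes out-degree normalized by (V - 1), which is the common textbook convention; the normalization Rayforce applies is not stated here, and `degree_centrality` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Degree centrality from CSR offsets alone: out-degree / (n_nodes - 1).
   For graphs with fewer than two nodes the raw degree is returned. */
void degree_centrality(const int64_t *offsets, int64_t n_nodes, double *out)
{
    assert(offsets && out && n_nodes >= 0);
    double denom = n_nodes > 1 ? (double)(n_nodes - 1) : 1.0;
    for (int64_t v = 0; v < n_nodes; v++)
        out[v] = (double)(offsets[v + 1] - offsets[v]) / denom;
}
```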
OP_BETWEENNESS — Betweenness Centrality (Brandes)
Brandes' algorithm for betweenness centrality. Supports approximate computation via sampling to reduce cost on large graphs.
| Property | Value |
|---|---|
| Opcode | OP_BETWEENNESS (99) |
| Complexity | O(VE) exact, O(SE) sampled (S = sample_size) |
| Use case | Bridge detection, information flow bottlenecks |
ray_op_t* bc = ray_betweenness(g, rel, 0); /* 0 = exact (all nodes) */
ray_op_t* bc_approx = ray_betweenness(g, rel, 100); /* sample 100 sources */
OP_CLOSENESS — Closeness Centrality
Closeness centrality measures how close a node is to all other reachable nodes. Supports sampling for large graphs.
| Property | Value |
|---|---|
| Opcode | OP_CLOSENESS (100) |
| Complexity | O(VE) exact, O(SE) sampled |
| Use case | Network accessibility, facility placement |
ray_op_t* cc = ray_closeness(g, rel, 0); /* exact */
Community Detection
OP_LOUVAIN — Louvain Community Detection
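Modularity-based community detection using the Louvain method. Each pass moves nodes into the neighboring community that yields the largest modularity gain, then collapses every community into a single node; passes repeat until modularity stops improving or the iteration limit is reached.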
| Property | Value |
|---|---|
| Opcode | OP_LOUVAIN (87) |
| Complexity | O(V + E) per iteration, typically converges in a few passes |
| Use case | Community discovery, social clusters, network partitioning |
ray_op_t* communities = ray_louvain(g, rel, 50); /* max 50 iterations */
OP_CONNECTED_COMP — Connected Components
Finds connected components using label propagation. Each node receives a component ID.
| Property | Value |
|---|---|
| Opcode | OP_CONNECTED_COMP (85) |
| Complexity | O(V + E) |
| Use case | Graph partitioning, isolated subgraph detection, data lineage |
ray_op_t* comp = ray_connected_comp(g, rel);
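Label propagation for components can be sketched in a few lines: every node starts with its own ID as label, and the smaller label is pushed across each edge until nothing changes. This is an illustrative version over an undirected edge list with hypothetical names; it is not Rayforce's implementation, and its total cost is O(E) per round times the number of rounds needed.

```c
#include <assert.h>
#include <stdint.h>

/* Min-label propagation over an undirected edge list (u[i], v[i]).
   On return, comp[x] is the smallest node ID in x's component.
   Returns the number of rounds executed (the last one changes nothing). */
int connected_components(const int64_t *u, const int64_t *v, int64_t n_edges,
                         int64_t n_nodes, int64_t *comp)
{
    assert(n_nodes >= 0 && n_edges >= 0);
    for (int64_t x = 0; x < n_nodes; x++) comp[x] = x;
    int rounds = 0, changed = 1;
    while (changed) {
        changed = 0;
        for (int64_t i = 0; i < n_edges; i++) {
            int64_t a = comp[u[i]], b = comp[v[i]];
            if (a < b)      { comp[v[i]] = a; changed = 1; }
            else if (b < a) { comp[u[i]] = b; changed = 1; }
        }
        rounds++;
    }
    return rounds;
}
```

Isolated nodes keep their own ID, so they come out as single-node components.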
OP_CLUSTER_COEFF — Clustering Coefficients
Computes the local clustering coefficient for each node: the fraction of pairs of its neighbors that are themselves connected by an edge. Nodes with fewer than two neighbors have no such pairs.
| Property | Value |
|---|---|
| Opcode | OP_CLUSTER_COEFF (97) |
| Complexity | O(V * d^2) where d = average degree |
| Use case | Network density, small-world analysis, triadic closure |
ray_op_t* cc = ray_cluster_coeff(g, rel);
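The O(V * d^2) bound can be read off a direct implementation: for each node, every neighbor's adjacency list is intersected with the node's own list. The sketch assumes an undirected graph stored as a symmetric CSR with sorted, duplicate-free adjacency lists and no self-loops, and assigns 0 to nodes with fewer than two neighbors; both are assumptions for illustration, as are the function names.

```c
#include <assert.h>
#include <stdint.h>

/* Counts the common elements of two ascending-sorted ID ranges. */
static int64_t count_common(const int64_t *a, int64_t na,
                            const int64_t *b, int64_t nb)
{
    int64_t i = 0, j = 0, c = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) i++;
        else if (a[i] > b[j]) j++;
        else { c++; i++; j++; }
    }
    return c;
}

/* Local clustering coefficient of every node. */
void cluster_coeff(const int64_t *offsets, const int64_t *targets,
                   int64_t n_nodes, double *coeff)
{
    assert(offsets && targets && coeff);
    for (int64_t v = 0; v < n_nodes; v++) {
        int64_t d = offsets[v + 1] - offsets[v];
        if (d < 2) { coeff[v] = 0.0; continue; }
        /* closed = ordered neighbor pairs (a, b) joined by an edge,
           so every connected unordered pair is counted twice. */
        int64_t closed = 0;
        for (int64_t e = offsets[v]; e < offsets[v + 1]; e++) {
            int64_t a = targets[e];
            closed += count_common(targets + offsets[v], d,
                                   targets + offsets[a],
                                   offsets[a + 1] - offsets[a]);
        }
        /* d * (d - 1) is the number of ordered neighbor pairs. */
        coeff[v] = (double)closed / (double)(d * (d - 1));
    }
}
```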
Worst-Case Optimal Joins
OP_WCO_JOIN — Leapfrog TrieJoin
Worst-case optimal join using Leapfrog TrieJoin (LFTJ). Finds triangles, k-cliques, and arbitrary pattern matches in the graph without materializing intermediate cross-products. Requires sorted adjacency lists in the CSR.
| Property | Value |
|---|---|
| Opcode | OP_WCO_JOIN (83) |
| Complexity | O(E^{3/2}) for triangle listing (worst-case optimal) |
| Use case | Triangle counting, k-clique enumeration, pattern matching, motif detection |
/* Find all triangles: nodes (a,b,c) where a->b, b->c, a->c */
ray_rel_t* rels[] = {rel, rel, rel};
ray_op_t* triangles = ray_wco_join(g,
    rels,  /* 3 relationships (can be same or different) */
    3,     /* n_rels */
    3      /* n_vars (a, b, c) */
);
Set sort_targets = true when building relationships for use with OP_WCO_JOIN, since LFTJ requires sorted adjacency lists.
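Sorted adjacency lists matter because the join works by intersecting them. The sketch below counts the triangle pattern a->b, b->c, a->c by intersecting adj(a) and adj(b) for every edge a->b with a linear merge. It illustrates only the intersection step that LFTJ builds on; a full Leapfrog TrieJoin uses seek operations and handles arbitrary patterns. The function name and CSR arrays are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Counts matches (a, b, c) with edges a->b, b->c and a->c in a CSR whose
   adjacency lists are sorted ascending. For each edge a->b, the valid
   values of c are exactly the intersection of adj(a) and adj(b). */
int64_t count_triangles(const int64_t *offsets, const int64_t *targets,
                        int64_t n_nodes)
{
    assert(offsets && targets);
    int64_t count = 0;
    for (int64_t a = 0; a < n_nodes; a++) {
        for (int64_t e = offsets[a]; e < offsets[a + 1]; e++) {
            int64_t b = targets[e];
            int64_t i = offsets[a], j = offsets[b];
            /* Merge-style intersection; only valid on sorted lists. */
            while (i < offsets[a + 1] && j < offsets[b + 1]) {
                if (targets[i] < targets[j]) i++;
                else if (targets[i] > targets[j]) j++;
                else { count++; i++; j++; }
            }
        }
    }
    return count;
}
```

No intermediate table of a->b->c paths is ever built: a candidate c is emitted only when it satisfies both remaining edge constraints at once.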
Spanning Trees
OP_MST — Minimum Spanning Tree (Kruskal)
Computes the minimum spanning forest using Kruskal's algorithm with union-find. Returns the MST edges as a table.
| Property | Value |
|---|---|
| Opcode | OP_MST (101) |
| Complexity | O(E log E) |
| Use case | Network backbone, minimal wiring, clustering via MST cuts |
ray_op_t* mst = ray_mst(g, rel, "weight");
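Kruskal's algorithm with union-find fits in a short routine: sort the edges by weight, then keep an edge only if its endpoints are still in different trees. The version below is a standalone sketch; `edge_t`, `kruskal`, and the edge-list input are illustrative assumptions rather than Rayforce structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { int64_t u, v; double w; } edge_t;

/* qsort comparator: ascending by weight. */
static int cmp_weight(const void *a, const void *b)
{
    double x = ((const edge_t *)a)->w, y = ((const edge_t *)b)->w;
    return (x > y) - (x < y);
}

/* Union-find root lookup with path halving. */
static int64_t find_root(int64_t *parent, int64_t x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Sorts edges in place by weight and writes the chosen forest edges to
   out (capacity n_nodes - 1). Returns the number of edges kept, or -1 on
   allocation failure. */
int64_t kruskal(edge_t *edges, int64_t n_edges, int64_t n_nodes, edge_t *out)
{
    assert(n_edges >= 0 && n_nodes >= 0);
    int64_t *parent = malloc((size_t)(n_nodes > 0 ? n_nodes : 1) * sizeof *parent);
    if (!parent) return -1;
    for (int64_t i = 0; i < n_nodes; i++) parent[i] = i;
    qsort(edges, (size_t)n_edges, sizeof *edges, cmp_weight);

    int64_t kept = 0;
    for (int64_t i = 0; i < n_edges; i++) {
        int64_t ru = find_root(parent, edges[i].u);
        int64_t rv = find_root(parent, edges[i].v);
        if (ru == rv) continue;          /* edge would close a cycle */
        parent[ru] = rv;                 /* merge the two trees */
        out[kept++] = edges[i];
    }
    free(parent);
    return kept;
}
```

On a disconnected graph the result is a spanning forest with one tree per component, so fewer than n_nodes - 1 edges are returned. The sort dominates the running time, which gives the O(E log E) bound.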
Vector Similarity
OP_COSINE_SIM — Cosine Similarity
Computes cosine similarity between a query vector and each row of an embedding column.
| Property | Value |
|---|---|
| Opcode | OP_COSINE_SIM (88) |
| Complexity | O(N * D) where N = rows, D = dimension |
| Use case | Semantic search, recommendation, duplicate detection |
float query[128] = { /* ... */ };
ray_op_t* sim = ray_cosine_sim(g, emb_col, query, 128);
OP_EUCLIDEAN_DIST — Euclidean Distance
Computes Euclidean (L2) distance between a query vector and each row of an embedding column.
| Property | Value |
|---|---|
| Opcode | OP_EUCLIDEAN_DIST (89) |
| Complexity | O(N * D) |
| Use case | Spatial queries, clustering, anomaly detection |
ray_op_t* dist = ray_euclidean_dist(g, emb_col, query, 128);
OP_KNN — Brute-Force K Nearest Neighbors
Finds the K nearest neighbors by exhaustive comparison. Returns the top-K rows sorted by distance.
| Property | Value |
|---|---|
| Opcode | OP_KNN (90) |
| Complexity | O(N * D + N log K) |
| Use case | Exact nearest neighbor search on small to medium datasets |
ray_op_t* neighbors = ray_knn(g, emb_col, query, 128, 10); /* top-10 */
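The O(N * D + N log K) bound corresponds to one distance computation per row plus a bounded heap of the K best candidates. The sketch below shows that pattern. It ranks by squared L2 distance, which orders rows the same way as L2 and avoids a square root; the metric OP_KNN uses is not stated here, and `knn_hit_t` and `knn_bruteforce` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float dist; size_t row; } knn_hit_t;

/* Restores the max-heap property below index i (largest distance at h[0]). */
static void sift_down(knn_hit_t *h, size_t n, size_t i)
{
    for (;;) {
        size_t c = 2 * i + 1;
        if (c >= n) return;
        if (c + 1 < n && h[c + 1].dist > h[c].dist) c++;
        if (h[i].dist >= h[c].dist) return;
        knn_hit_t t = h[i]; h[i] = h[c]; h[c] = t;
        i = c;
    }
}

/* emb holds n rows of dim floats, row-major. Writes up to k hits to out,
   nearest first, ranked by squared L2 distance. Returns hits written. */
size_t knn_bruteforce(const float *emb, size_t n, size_t dim,
                      const float *query, size_t k, knn_hit_t *out)
{
    assert(emb && query && out);
    size_t m = 0;
    if (k == 0) return 0;
    for (size_t r = 0; r < n; r++) {
        float d = 0.0f;
        for (size_t j = 0; j < dim; j++) {
            float diff = emb[r * dim + j] - query[j];
            d += diff * diff;
        }
        if (m < k) {                      /* heap not full: append, sift up */
            size_t i = m++;
            out[i] = (knn_hit_t){d, r};
            while (i > 0 && out[(i - 1) / 2].dist < out[i].dist) {
                knn_hit_t t = out[i];
                out[i] = out[(i - 1) / 2];
                out[(i - 1) / 2] = t;
                i = (i - 1) / 2;
            }
        } else if (d < out[0].dist) {     /* beats the current worst hit */
            out[0] = (knn_hit_t){d, r};
            sift_down(out, m, 0);
        }
    }
    /* Heap sort in place: repeatedly move the largest hit to the end. */
    for (size_t end = m; end > 1; end--) {
        knn_hit_t t = out[0]; out[0] = out[end - 1]; out[end - 1] = t;
        sift_down(out, end - 1, 0);
    }
    return m;
}
```

Keeping only K candidates in the heap means memory stays O(K) regardless of how many rows are scanned.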
OP_HNSW_KNN — HNSW Approximate KNN
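Approximate K nearest neighbors using a pre-built HNSW (Hierarchical Navigable Small World) index. Query cost grows roughly logarithmically with the number of rows instead of linearly, so it is much faster than brute force on large datasets, in exchange for results that may miss some true nearest neighbors.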
| Property | Value |
|---|---|
| Opcode | OP_HNSW_KNN (91) |
| Complexity | O(D * log N) approximate |
| Use case | Large-scale semantic search, real-time recommendation, RAG retrieval |
ray_op_t* neighbors = ray_hnsw_knn(g, hnsw_idx,
    query, 128,  /* query vector + dimension */
    10,          /* k */
    200          /* ef_search (beam width, higher = more accurate) */
);
Ordering
OP_TOPSORT — Topological Sort (Kahn's)
Produces a topological ordering of a directed acyclic graph using Kahn's algorithm. Returns an error if cycles are detected.
| Property | Value |
|---|---|
| Opcode | OP_TOPSORT (93) |
| Complexity | O(V + E) |
| Use case | Task scheduling, dependency resolution, build systems |
ray_op_t* order = ray_topsort(g, rel);
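Kahn's algorithm also explains how cycles are detected: nodes on a cycle never reach in-degree zero, so they are never emitted. The sketch below works on a plain CSR of out-edges and signals a cycle through its return value. How Rayforce reports the error is not shown in this document, so the return convention and the name `topsort` are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Kahn's algorithm over a CSR of out-edges. Writes a topological order to
   order[] and returns the number of nodes emitted. A result smaller than
   n_nodes means the graph contains a cycle; -1 means allocation failure. */
int64_t topsort(const int64_t *offsets, const int64_t *targets,
                int64_t n_nodes, int64_t *order)
{
    assert(n_nodes >= 0);
    int64_t *indeg = calloc((size_t)n_nodes + 1, sizeof *indeg);
    if (!indeg) return -1;
    for (int64_t e = 0; e < offsets[n_nodes]; e++) indeg[targets[e]]++;

    /* order[] doubles as the work queue: [head, tail) holds ready nodes. */
    int64_t head = 0, tail = 0;
    for (int64_t v = 0; v < n_nodes; v++)
        if (indeg[v] == 0) order[tail++] = v;
    while (head < tail) {
        int64_t u = order[head++];
        for (int64_t e = offsets[u]; e < offsets[u + 1]; e++)
            if (--indeg[targets[e]] == 0) order[tail++] = targets[e];
    }
    free(indeg);
    return tail;
}
```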
SIP Optimization
Sideways Information Passing (SIP) is an optimizer pass that propagates selection bitmaps (RAY_SEL) backward through OP_EXPAND chains. When a filter is applied after a graph expansion, SIP pushes the filter condition back to the source side of the expansion, allowing the executor to skip entire source nodes whose neighbors would all be filtered out.
This optimization is automatic. The optimizer detects EXPAND chains and propagates sip_sel bitmaps into the graph operation's extended node. During execution, the EXPAND opcode checks each source node against the SIP bitmap and skips it entirely if no neighbors can pass the downstream filter.
/* Without SIP: expand all 1M source nodes, then filter */
/* With SIP: skip source nodes that can't produce passing results */
ray_op_t* nbrs = ray_expand(g, src_nodes, rel, 0);
ray_op_t* pred = ray_lt(g, nbrs, ray_const_i64(g, 1000));
ray_op_t* result = ray_filter(g, nbrs, pred);
/* Optimizer automatically injects SIP bitmap on the EXPAND node */
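The skipping behavior can be illustrated outside the engine. In the sketch below, `pass[]` marks nodes that satisfy the downstream filter, a reverse CSR turns that into a per-source `sip[]` mask, and the expansion checks the mask before reading any adjacency list. Byte arrays stand in for RAY_SEL bitmaps, and every name here is hypothetical; how the optimizer actually derives sip_sel is not described in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marks every source that has at least one neighbor with pass[] set.
   rev_offsets/rev_sources form the reverse CSR (in-neighbors per node). */
void sip_build(const int64_t *rev_offsets, const int64_t *rev_sources,
               const uint8_t *pass, int64_t n_nodes, uint8_t *sip)
{
    assert(pass && sip);
    for (int64_t v = 0; v < n_nodes; v++) sip[v] = 0;
    for (int64_t t = 0; t < n_nodes; t++) {
        if (!pass[t]) continue;
        for (int64_t e = rev_offsets[t]; e < rev_offsets[t + 1]; e++)
            sip[rev_sources[e]] = 1;
    }
}

/* Forward expansion that consults sip[] before touching a source's
   adjacency list. Returns the number of (src, dst) pairs emitted and
   stores the number of skipped sources in *n_skipped. */
size_t expand_with_sip(const int64_t *offsets, const int64_t *targets,
                       const int64_t *src, size_t n_src,
                       const uint8_t *pass, const uint8_t *sip,
                       int64_t *out_src, int64_t *out_dst, size_t *n_skipped)
{
    size_t n = 0;
    *n_skipped = 0;
    for (size_t i = 0; i < n_src; i++) {
        int64_t s = src[i];
        if (!sip[s]) { (*n_skipped)++; continue; }  /* no neighbor can pass */
        for (int64_t e = offsets[s]; e < offsets[s + 1]; e++) {
            if (!pass[targets[e]]) continue;
            out_src[n] = s;
            out_dst[n] = targets[e];
            n++;
        }
    }
    return n;
}
```

The benefit grows with filter selectivity: when few nodes pass, most sources are rejected with a single mask lookup instead of a scan of their neighbors.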
Factorized Execution
Multi-hop graph expansions can produce enormous intermediate results (the cross-product of all paths). Rayforce avoids materializing these cross-products using factorized vectors (ray_fvec_t) and factorized tables (ray_ftable_t).
/* Factorized vector: represents a column without materializing all rows */
typedef struct ray_fvec {
    ray_t*  vec;          /* underlying ray_t vector */
    int64_t cur_idx;      /* >= 0: single value at index */
                          /* -1: full vector active */
    int64_t cardinality;  /* how many rows this represents */
} ray_fvec_t;

/* Factorized table: accumulation buffer for WCO joins */
typedef struct ray_ftable {
    ray_fvec_t* columns;  /* array of factorized vectors */
    uint16_t    n_cols;
    int64_t     n_tuples; /* factorized tuple count */
    ray_t*      semijoin; /* RAY_SEL bitmap of qualifying keys */
} ray_ftable_t;
When factorized = 1 is set on a graph op's extended node, the executor emits factorized output instead of flat vectors. The factorized representation keeps each expansion level as a separate vector with a cardinality multiplier, deferring materialization until the final result is needed (via ray_ftable_materialize).
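The saving from deferred materialization can be shown with a simplified stand-in for a factorized tuple. In the sketch, a column is either pinned to one value (cur_idx >= 0) or fully active (cur_idx == -1), mirroring the comments on ray_fvec_t; the flat row count is the product of the active value counts. The `fcol_t` type and both functions are hypothetical and much simpler than ray_fvec_t and ray_ftable_materialize, whose exact semantics are not documented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a factorized column (not ray_fvec_t):
   cur_idx >= 0 pins the column to one value; cur_idx == -1 means all
   len values are active. */
typedef struct {
    const int64_t *vals;
    int64_t len;
    int64_t cur_idx;
} fcol_t;

/* Number of flat rows the factorized tuple stands for. */
int64_t fact_row_count(const fcol_t *cols, size_t n_cols)
{
    int64_t rows = 1;
    for (size_t c = 0; c < n_cols; c++)
        rows *= (cols[c].cur_idx >= 0) ? 1 : cols[c].len;
    return rows;
}

/* Expands the tuple into flat rows (row-major, n_cols values per row).
   out must hold fact_row_count(cols, n_cols) * n_cols values. */
void fact_materialize(const fcol_t *cols, size_t n_cols, int64_t *out)
{
    assert(cols && out && n_cols > 0);
    int64_t rows = fact_row_count(cols, n_cols);
    for (int64_t r = 0; r < rows; r++) {
        int64_t rem = r;
        for (size_t c = n_cols; c-- > 0; ) {   /* last column varies fastest */
            int64_t *cell = &out[r * (int64_t)n_cols + (int64_t)c];
            if (cols[c].cur_idx >= 0) {
                *cell = cols[c].vals[cols[c].cur_idx];
            } else {
                *cell = cols[c].vals[rem % cols[c].len];
                rem /= cols[c].len;
            }
        }
    }
}
```

For example, one pinned node with 2 neighbors along one relationship and 3 along another is stored as 1 + 2 + 3 values, yet stands for 6 flat rows (18 values) once materialized. Operators that only need counts or per-column filters can work on the compact form and never pay for the expansion.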
Algorithm Summary
| Category | Opcode | Algorithm | Complexity |
|---|---|---|---|
| Traversal | OP_EXPAND | 1-hop CSR lookup | O(d) |
| Traversal | OP_VAR_EXPAND | BFS/DFS variable-length | O(V+E) |
| Traversal | OP_DFS | Depth-first search | O(V+E) |
| Traversal | OP_RANDOM_WALK | Random walk | O(L) |
| Shortest path | OP_SHORTEST_PATH | BFS | O(V+E) |
| Shortest path | OP_DIJKSTRA | Dijkstra (binary heap) | O((V+E) log V) |
| Shortest path | OP_ASTAR | A* with Haversine | O((V+E) log V) |
| Shortest path | OP_K_SHORTEST | Yen's algorithm | O(kV(V+E) log V) |
| Centrality | OP_PAGERANK | Iterative PageRank | O(I(V+E)) |
| Centrality | OP_DEGREE_CENT | Degree centrality | O(V) |
| Centrality | OP_BETWEENNESS | Brandes | O(VE) |
| Centrality | OP_CLOSENESS | Closeness centrality | O(VE) |
| Community | OP_LOUVAIN | Louvain modularity | O(V+E) per iteration |
| Community | OP_CONNECTED_COMP | Label propagation | O(V+E) |
| Community | OP_CLUSTER_COEFF | Local clustering | O(V*d^2) |
| Optimal join | OP_WCO_JOIN | Leapfrog TrieJoin | O(E^{3/2}) |
| Spanning | OP_MST | Kruskal | O(E log E) |
| Similarity | OP_COSINE_SIM | Cosine similarity | O(ND) |
| Similarity | OP_EUCLIDEAN_DIST | L2 distance | O(ND) |
| Similarity | OP_KNN | Brute-force KNN | O(ND + N log K) |
| Similarity | OP_HNSW_KNN | HNSW approximate KNN | O(D log N) |
| Ordering | OP_TOPSORT | Kahn's topological sort | O(V+E) |