Rayforce Rayforce ← Back to home
GitHub

Graph Algorithms

22 graph opcodes covering traversal, shortest paths, centrality, community detection, similarity search, optimal joins, and spanning trees — all integrated into the DAG pipeline.

All graph algorithms operate on ray_rel_t relationships built from CSR storage. See Graph Storage for how to build and persist relationships.

Traversal

OP_EXPAND — 1-Hop Neighbor Expansion

Expands each input node to its direct neighbors in the CSR. The fundamental building block for all graph queries.

PropertyValue
OpcodeOP_EXPAND (80)
ComplexityO(d) per node, where d = average degree
Use caseNeighbor lookup, 1-hop reachability, join-like expansions
/* C API */
ray_op_t* nbrs = ray_expand(g, src_nodes, rel, 0);  /* direction: 0=fwd */

OP_VAR_EXPAND — Variable-Length BFS/DFS

Expands nodes through multiple hops with configurable minimum and maximum depth. Uses BFS by default. Optionally tracks the full path for each discovered node.

PropertyValue
OpcodeOP_VAR_EXPAND (81)
ComplexityO(V + E) within the explored subgraph
Use caseMulti-hop reachability, friends-of-friends, transitive closure
/* BFS from start_nodes, depth 1..3, tracking paths */
ray_op_t* reachable = ray_var_expand(g, start_nodes, rel,
    0,    /* direction: forward */
    1,    /* min_depth */
    3,    /* max_depth */
    true  /* track_path */
);

OP_DFS — Depth-First Search

Explicit depth-first traversal from a source node, returning nodes in DFS visit order.

PropertyValue
OpcodeOP_DFS (94)
ComplexityO(V + E) within explored subgraph
Use caseTopological processing, cycle detection, graph exploration
ray_op_t* dfs_order = ray_dfs(g, src_node, rel, 10); /* max depth 10 */

OP_RANDOM_WALK — Random Walk

Performs random walks from source nodes, selecting a uniformly random neighbor at each step. Used for node2vec-style embeddings and sampling.

PropertyValue
OpcodeOP_RANDOM_WALK (98)
ComplexityO(L) per walk, where L = walk_length
Use caseGraph sampling, node2vec, DeepWalk embeddings
ray_op_t* walk = ray_random_walk(g, src_node, rel, 80); /* 80-step walk */

Shortest Path

OP_SHORTEST_PATH — BFS Shortest Path

Finds the shortest unweighted path between two nodes using bidirectional BFS.

PropertyValue
OpcodeOP_SHORTEST_PATH (82)
ComplexityO(V + E)
Use caseHop distance, unweighted routing, social distance
ray_op_t* path = ray_shortest_path(g, src, dst, rel, 20); /* max depth 20 */

OP_DIJKSTRA — Weighted Shortest Path

Dijkstra's algorithm for weighted shortest paths. Reads edge weights from a property column on the relationship.

PropertyValue
OpcodeOP_DIJKSTRA (86)
ComplexityO((V + E) log V) with binary heap
Use caseWeighted routing, cost-optimal paths, network flow
ray_op_t* path = ray_dijkstra(g, src, dst, rel,
    "weight",  /* edge weight column name */
    255        /* max depth */
);

OP_ASTAR — A* Shortest Path

A* search with a coordinate-based heuristic (Haversine distance). Requires node property columns for latitude and longitude.

PropertyValue
OpcodeOP_ASTAR (95)
ComplexityO((V + E) log V), typically faster than Dijkstra with a good heuristic
Use caseGeospatial routing, map navigation, spatial networks
ray_op_t* path = ray_astar(g, src, dst, rel,
    "distance",   /* weight column */
    "lat",         /* latitude column */
    "lon",         /* longitude column */
    node_props,    /* node property table with lat/lon */
    255            /* max depth */
);

OP_K_SHORTEST — Yen's k-Shortest Paths

Finds the k shortest paths between two nodes using Yen's algorithm. Each successive path is the shortest path that differs from all previously found paths.

PropertyValue
OpcodeOP_K_SHORTEST (96)
ComplexityO(kV(V + E) log V)
Use caseRoute alternatives, network resilience, diverse path discovery
ray_op_t* paths = ray_k_shortest(g, src, dst, rel,
    "weight",  /* edge weight column */
    5          /* k = 5 shortest paths */
);

Centrality

OP_PAGERANK — PageRank

Iterative PageRank computation. Converges after max_iter iterations or when residuals fall below an internal threshold.

PropertyValue
OpcodeOP_PAGERANK (84)
ComplexityO(I * (V + E)), where I = iterations
Use caseNode importance ranking, influence analysis, link analysis
ray_op_t* pr = ray_pagerank(g, rel,
    100,   /* max iterations */
    0.85   /* damping factor */
);

OP_DEGREE_CENT — Degree Centrality

Computes degree centrality for all nodes (normalized degree count).

PropertyValue
OpcodeOP_DEGREE_CENT (92)
ComplexityO(V)
Use caseHub identification, connectivity analysis
ray_op_t* dc = ray_degree_cent(g, rel);

OP_BETWEENNESS — Betweenness Centrality (Brandes)

Brandes' algorithm for betweenness centrality. Supports approximate computation via sampling to reduce cost on large graphs.

PropertyValue
OpcodeOP_BETWEENNESS (99)
ComplexityO(VE) exact, O(SE) sampled (S = sample_size)
Use caseBridge detection, information flow bottlenecks
ray_op_t* bc = ray_betweenness(g, rel, 0);   /* 0 = exact (all nodes) */
ray_op_t* bc_approx = ray_betweenness(g, rel, 100); /* sample 100 sources */

OP_CLOSENESS — Closeness Centrality

Closeness centrality measures how close a node is to all other reachable nodes. Supports sampling for large graphs.

PropertyValue
OpcodeOP_CLOSENESS (100)
ComplexityO(VE) exact, O(SE) sampled
Use caseNetwork accessibility, facility placement
ray_op_t* cc = ray_closeness(g, rel, 0); /* exact */

Community Detection

OP_LOUVAIN — Louvain Community Detection

Modularity-based community detection using the Louvain method. Iteratively merges communities to maximize modularity.

PropertyValue
OpcodeOP_LOUVAIN (87)
ComplexityO(V + E) per iteration, typically converges in a few passes
Use caseCommunity discovery, social clusters, network partitioning
ray_op_t* communities = ray_louvain(g, rel, 50); /* max 50 iterations */

OP_CONNECTED_COMP — Connected Components

Finds connected components using label propagation. Each node receives a component ID.

PropertyValue
OpcodeOP_CONNECTED_COMP (85)
ComplexityO(V + E)
Use caseGraph partitioning, isolated subgraph detection, data lineage
ray_op_t* comp = ray_connected_comp(g, rel);

OP_CLUSTER_COEFF — Clustering Coefficients

Computes the local clustering coefficient for each node: the fraction of its neighbors that are also connected to each other.

PropertyValue
OpcodeOP_CLUSTER_COEFF (97)
ComplexityO(V * d^2) where d = average degree
Use caseNetwork density, small-world analysis, triadic closure
ray_op_t* cc = ray_cluster_coeff(g, rel);

Worst-Case Optimal Joins

OP_WCO_JOIN — Leapfrog TrieJoin

Worst-case optimal join using Leapfrog TrieJoin (LFTJ). Finds triangles, k-cliques, and arbitrary pattern matches in the graph without materializing intermediate cross-products. Requires sorted adjacency lists in the CSR.

PropertyValue
OpcodeOP_WCO_JOIN (83)
ComplexityO(E^{3/2}) for triangle listing (worst-case optimal)
Use caseTriangle counting, k-clique enumeration, pattern matching, motif detection
/* Find all triangles: nodes (a,b,c) where a->b, b->c, a->c */
ray_rel_t* rels[] = {rel, rel, rel};
ray_op_t* triangles = ray_wco_join(g,
    rels,   /* 3 relationships (can be same or different) */
    3,      /* n_rels */
    3       /* n_vars (a, b, c) */
);
Sorted CSR required. LFTJ relies on binary search within sorted adjacency lists. Always build relationships with sort_targets = true when using OP_WCO_JOIN.

Spanning Trees

OP_MST — Minimum Spanning Tree (Kruskal)

Computes the minimum spanning forest using Kruskal's algorithm with union-find. Returns the MST edges as a table.

PropertyValue
OpcodeOP_MST (101)
ComplexityO(E log E)
Use caseNetwork backbone, minimal wiring, clustering via MST cuts
ray_op_t* mst = ray_mst(g, rel, "weight");

Vector Similarity

OP_COSINE_SIM — Cosine Similarity

Computes cosine similarity between a query vector and each row of an embedding column.

PropertyValue
OpcodeOP_COSINE_SIM (88)
ComplexityO(N * D) where N = rows, D = dimension
Use caseSemantic search, recommendation, duplicate detection
float query[128] = { /* ... */ };
ray_op_t* sim = ray_cosine_sim(g, emb_col, query, 128);

OP_EUCLIDEAN_DIST — Euclidean Distance

Computes Euclidean (L2) distance between a query vector and each row of an embedding column.

PropertyValue
OpcodeOP_EUCLIDEAN_DIST (89)
ComplexityO(N * D)
Use caseSpatial queries, clustering, anomaly detection
ray_op_t* dist = ray_euclidean_dist(g, emb_col, query, 128);

OP_KNN — Brute-Force K Nearest Neighbors

Finds the K nearest neighbors by exhaustive comparison. Returns the top-K rows sorted by distance.

PropertyValue
OpcodeOP_KNN (90)
ComplexityO(N * D + N log K)
Use caseExact nearest neighbor search on small to medium datasets
ray_op_t* neighbors = ray_knn(g, emb_col, query, 128, 10); /* top-10 */

OP_HNSW_KNN — HNSW Approximate KNN

Approximate K nearest neighbors using a pre-built HNSW (Hierarchical Navigable Small World) index. Orders of magnitude faster than brute-force for large datasets.

PropertyValue
OpcodeOP_HNSW_KNN (91)
ComplexityO(D * log N) approximate
Use caseLarge-scale semantic search, real-time recommendation, RAG retrieval
ray_op_t* neighbors = ray_hnsw_knn(g, hnsw_idx,
    query, 128,   /* query vector + dimension */
    10,            /* k */
    200            /* ef_search (beam width, higher = more accurate) */
);

Ordering

OP_TOPSORT — Topological Sort (Kahn's)

Produces a topological ordering of a directed acyclic graph using Kahn's algorithm. Returns an error if cycles are detected.

PropertyValue
OpcodeOP_TOPSORT (93)
ComplexityO(V + E)
Use caseTask scheduling, dependency resolution, build systems
ray_op_t* order = ray_topsort(g, rel);

SIP Optimization

Sideways Information Passing (SIP) is an optimizer pass that propagates selection bitmaps (RAY_SEL) backward through OP_EXPAND chains. When a filter is applied after a graph expansion, SIP pushes the filter condition back to the source side of the expansion, allowing the executor to skip entire source nodes whose neighbors would all be filtered out.

This optimization is automatic. The optimizer detects EXPAND chains and propagates sip_sel bitmaps into the graph operation's extended node. During execution, the EXPAND opcode checks each source node against the SIP bitmap and skips it entirely if no neighbors can pass the downstream filter.

/* Without SIP: expand all 1M source nodes, then filter */
/* With SIP: skip source nodes that can't produce passing results */

ray_op_t* nbrs   = ray_expand(g, src_nodes, rel, 0);
ray_op_t* pred   = ray_lt(g, nbrs, ray_const_i64(g, 1000));
ray_op_t* result = ray_filter(g, nbrs, pred);
/* Optimizer automatically injects SIP bitmap on the EXPAND node */

Factorized Execution

Multi-hop graph expansions can produce enormous intermediate results (the cross-product of all paths). Rayforce avoids materializing these cross-products using factorized vectors (ray_fvec_t) and factorized tables (ray_ftable_t).

/* Factorized vector: represents a column without materializing all rows */
typedef struct ray_fvec {
    ray_t*    vec;            /* underlying ray_t vector           */
    int64_t  cur_idx;        /* >= 0: single value at index       */
                             /* -1: full vector active            */
    int64_t  cardinality;    /* how many rows this represents     */
} ray_fvec_t;

/* Factorized table: accumulation buffer for WCO joins */
typedef struct ray_ftable {
    ray_fvec_t*  columns;     /* array of factorized vectors       */
    uint16_t    n_cols;
    int64_t     n_tuples;    /* factorized tuple count            */
    ray_t*       semijoin;    /* RAY_SEL bitmap of qualifying keys */
} ray_ftable_t;

When factorized = 1 is set on a graph op's extended node, the executor emits factorized output instead of flat vectors. The factorized representation keeps each expansion level as a separate vector with a cardinality multiplier, deferring materialization until the final result is needed (via ray_ftable_materialize).

Algorithm Summary

CategoryOpcodeAlgorithmComplexity
TraversalOP_EXPAND1-hop CSR lookupO(d)
TraversalOP_VAR_EXPANDBFS/DFS variable-lengthO(V+E)
TraversalOP_DFSDepth-first searchO(V+E)
TraversalOP_RANDOM_WALKRandom walkO(L)
Shortest pathOP_SHORTEST_PATHBFSO(V+E)
Shortest pathOP_DIJKSTRADijkstra (binary heap)O((V+E) log V)
Shortest pathOP_ASTARA* with HaversineO((V+E) log V)
Shortest pathOP_K_SHORTESTYen's algorithmO(kV(V+E) log V)
CentralityOP_PAGERANKIterative PageRankO(I(V+E))
CentralityOP_DEGREE_CENTDegree centralityO(V)
CentralityOP_BETWEENNESSBrandesO(VE)
CentralityOP_CLOSENESSCloseness centralityO(VE)
CommunityOP_LOUVAINLouvain modularityO(V+E)
CommunityOP_CONNECTED_COMPLabel propagationO(V+E)
CommunityOP_CLUSTER_COEFFLocal clusteringO(V*d^2)
Optimal joinOP_WCO_JOINLeapfrog TrieJoinO(E^{3/2})
SpanningOP_MSTKruskalO(E log E)
SimilarityOP_COSINE_SIMCosine similarityO(ND)
SimilarityOP_EUCLIDEAN_DISTL2 distanceO(ND)
SimilarityOP_KNNBrute-force KNNO(ND + N log K)
SimilarityOP_HNSW_KNNHNSW approximate KNNO(D log N)
OrderingOP_TOPSORTKahn's topological sortO(V+E)