Datalog: Rules & Queries
Declare WHAT you want, not HOW to compute it. Rayforce's Datalog layer compiles rules and queries down to the same DAG-based vectorized executor that powers select and update.
What is Datalog?
Datalog is a rule-based declarative query language. You define facts and rules, then ask questions. The engine figures out all possible answers automatically by applying rules until no new facts can be derived.
Unlike SQL, Datalog handles recursive queries naturally — transitive closure, reachability, and graph traversal are first-class operations, not awkward CTEs bolted on as an afterthought.
EAV Triple Storage
Datalog in Rayforce uses Entity-Attribute-Value (EAV) triples as its storage model. Every fact is a triple (entity, attribute, value) stored in a columnar datoms table.
Creating a datoms database
Use (datoms) to create an empty EAV database, then (assert-fact db entity attribute value) to add triples:
;; Create an empty EAV database
(set db (datoms))
;; Assert facts: entity 1 has name, dept, salary
(set db (assert-fact db 1 'name 'Alice))
(set db (assert-fact db 1 'dept 'Engineering))
(set db (assert-fact db 1 'salary 80000))
Each call to assert-fact returns a new database with the triple added. The underlying storage is a three-column table [e, a, v] backed by Rayforce's columnar vectors.
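To make the "returns a new database" behavior concrete, here is a minimal Python model of a three-column EAV table. It is only a sketch of the semantics described above, not Rayforce's implementation: the `Datoms` class and `assert_fact` function are hypothetical names, and plain tuples stand in for columnar vectors.

```python
from typing import NamedTuple, Tuple, Union

Value = Union[int, str]


class Datoms(NamedTuple):
    """Hypothetical model of the [e, a, v] table: three parallel columns."""
    e: Tuple[int, ...] = ()
    a: Tuple[str, ...] = ()
    v: Tuple[Value, ...] = ()


def assert_fact(db: Datoms, entity: int, attr: str, value: Value) -> Datoms:
    """Return a new table with one triple appended; the input is not mutated."""
    return Datoms(db.e + (entity,), db.a + (attr,), db.v + (value,))


db0 = Datoms()
db1 = assert_fact(db0, 1, "name", "Alice")
db2 = assert_fact(db1, 1, "dept", "Engineering")
db3 = assert_fact(db2, 1, "salary", 80000)

# The original value is untouched, which is why the snippets rebind db each time.
print(len(db0.e), len(db3.e))
print(list(zip(db3.e, db3.a, db3.v)))
```

Because each call produces a fresh value, forgetting to rebind the result (the `(set db ...)` in the snippets above) would silently discard the new triple in this model.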
Scanning by attribute
Use (scan-eav db attribute) to query all entities with a given attribute, or (scan-eav db entity attribute) to get a specific value:
;; All salaries: returns a table of [entity, value]
(scan-eav db 'salary)
;; Specific entity's salary: returns the scalar value
;; (entity 3 is asserted in the Complete Example; the snippet above only adds entity 1)
(scan-eav db 3 'salary) ; => 90000
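The two call shapes can be modeled in a few lines of Python. This is a sketch under assumptions: `scan_eav` here is a hypothetical stand-in operating on a list of triples, and returning `None` for a missing value is a choice made for the sketch, since the text above does not say what Rayforce returns in that case.

```python
from typing import List, Tuple, Union

Value = Union[int, str]
Triple = Tuple[int, str, Value]

# Salary facts taken from the Complete Example in this document.
db: List[Triple] = [
    (1, "salary", 80000),
    (2, "salary", 60000),
    (3, "salary", 90000),
    (1, "name", "Alice"),
]


def scan_eav(db: List[Triple], *args):
    """(attr) -> rows of (entity, value); (entity, attr) -> a single value."""
    if len(args) == 1:
        (attr,) = args
        return [(e, v) for e, a, v in db if a == attr]
    entity, attr = args
    for e, a, v in db:
        if e == entity and a == attr:
            return v
    return None  # assumption: behavior for a missing fact is not documented


all_salaries = scan_eav(db, "salary")
one_salary = scan_eav(db, 3, "salary")
print(all_salaries)
print(one_salary)
```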
Rules
Rules define derived relations. A rule has a head (the relation being defined) and a body (the conditions that must hold):
(rule (head ?vars...)
;; body clauses — all must be satisfied
(?e :attr ?v)
(?e :other-attr ?w))
Variables start with ?. When the same variable appears in multiple clauses, it acts as a join condition — the values must be equal.
Simple rule example
;; Define "employee" as entities with both name and dept
(rule (employee ?e ?n ?d)
(?e :name ?n)
(?e :dept ?d))
This rule says: "an employee is any entity ?e that has both a :name attribute (bound to ?n) and a :dept attribute (bound to ?d)." The shared variable ?e across the two clauses produces a join on entity ID.
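The join that the shared variable produces can be sketched in Python as a hash join keyed on the entity ID. This models the semantics only; the `employee` function and the extra entity 4 ("Dana", with no department) are hypothetical additions used to show that entities missing either attribute drop out.

```python
from collections import defaultdict

db = [
    (1, "name", "Alice"), (1, "dept", "Engineering"),
    (2, "name", "Bob"), (2, "dept", "Sales"),
    (3, "name", "Charlie"), (3, "dept", "Engineering"),
    (4, "name", "Dana"),  # hypothetical: has a name but no dept
]


def employee(db):
    """Model of (employee ?e ?n ?d): join :name and :dept triples on ?e."""
    # Build side: index the :name triples by entity.
    names_by_entity = defaultdict(list)
    for e, a, v in db:
        if a == "name":
            names_by_entity[e].append(v)
    # Probe side: each :dept triple looks up matching names for the same entity.
    rows = []
    for e, a, v in db:
        if a == "dept":
            for n in names_by_entity.get(e, []):
                rows.append((e, n, v))
    return rows


employees = employee(db)
print(employees)
```

Entity 4 does not appear in the result because the second clause finds no `dept` triple to pair with its name.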
OR semantics with multiple clauses
Define multiple rules with the same head to express disjunction (OR). Each rule contributes its own set of results, and the sets are unioned and deduplicated:
;; Two ways to be "reachable"
(rule (reachable ?x ?y) (?x :edge ?y)) ;; base: direct edge
(rule (reachable ?x ?z) (?x :edge ?y) (reachable ?y ?z)) ;; recursive
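For a non-recursive illustration of how same-head rules combine, the Python sketch below evaluates two hypothetical rules for a made-up `linked` relation (an edge in either direction) and merges them with union-all followed by distinct. The relation name and the edge data are assumptions for the example, not part of Rayforce.

```python
# Hypothetical edge facts; (1, 2) and (2, 1) are both present on purpose.
edges = [(1, 2), (2, 1), (2, 3)]

# Rule A: linked(x, y) if x has an edge to y.
rule_a = [(x, y) for x, y in edges]
# Rule B: linked(x, y) if y has an edge to x.
rule_b = [(y, x) for x, y in edges]

# Union-all keeps every row from both rules, duplicates included.
union_all = rule_a + rule_b
# Distinct removes duplicates while keeping first-seen order.
linked = list(dict.fromkeys(union_all))

print(len(union_all), len(linked))
print(linked)
```

Two of the six rows are derived by both rules, so the distinct step is what keeps the relation a set.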
Queries
Queries retrieve data from the datoms database using pattern matching. The syntax is:
(query db (find ?vars...) (where clauses...))
The find clause specifies which variables to return. The where clause contains triple patterns and rule invocations that constrain the results.
Simple pattern query
;; Find all entities and their names
(query db (find ?e ?n) (where (?e :name ?n)))
Join query
When multiple patterns share a variable, Rayforce compiles them into a join:
;; Find name + department (join on entity ?e)
(query db (find ?n ?d) (where (?e :name ?n) (?e :dept ?d)))
Query with rules
Rule invocations can appear in the where clause just like triple patterns:
;; Use the "employee" rule defined earlier
(query db (find ?n ?d) (where (employee ?e ?n ?d)))
How queries compile to the DAG
Under the hood, each query compiles to Rayforce's DAG execution pipeline:
- Each triple pattern (?e :attr ?v) becomes a ray_scan + ray_filter on the datoms table
- Shared variables across patterns become ray_join operations
- The find clause becomes a final projection selecting the requested columns
- The optimizer applies predicate pushdown, filter reorder, and fusion, the same passes used for select
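The pipeline in the list above (scan and filter per pattern, join on shared variables, final projection) can be modeled with a tiny Python evaluator. This is a sketch of the logical plan only: `scan_filter`, `join`, and `query` are hypothetical names, the join is a nested loop standing in for a hash join, and no optimizer passes are modeled.

```python
def is_var(term):
    """Variables are strings starting with '?', as in the query syntax."""
    return isinstance(term, str) and term.startswith("?")


def scan_filter(db, pattern):
    """Scan the triples, keep those matching the pattern, emit variable bindings."""
    pe, attr, pv = pattern
    rows = []
    for e, a, v in db:
        if a != attr:
            continue
        binding, matched = {}, True
        for term, actual in ((pe, e), (pv, v)):
            if is_var(term):
                if binding.setdefault(term, actual) != actual:
                    matched = False
            elif term != actual:
                matched = False
        if matched:
            rows.append(binding)
    return rows


def join(left, right):
    """Join two binding lists on every variable they share."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


def query(db, find, where):
    rows = [{}]  # one empty binding: the identity for join
    for pattern in where:
        rows = join(rows, scan_filter(db, pattern))
    return [tuple(row[var] for var in find) for row in rows]  # projection


db = [
    (1, "name", "Alice"), (1, "dept", "Engineering"), (1, "salary", 80000),
    (2, "name", "Bob"), (2, "dept", "Sales"), (2, "salary", 60000),
    (3, "name", "Charlie"), (3, "dept", "Engineering"), (3, "salary", 90000),
]

result = query(db, ["?n", "?d"], [("?e", "name", "?n"), ("?e", "dept", "?d")])
print(result)
```

The variable `?e` is bound by both patterns but is absent from `find`, so it drives the join and is then dropped by the projection.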
Recursive Rules & Fixpoint
The real power of Datalog is recursive rules. Define a rule that references itself, and the engine automatically computes the transitive closure by iterating until no new facts are produced (the "fixpoint").
Transitive closure example
;; Build a graph
(set gdb (datoms))
(set gdb (assert-fact gdb 1 'edge 2))
(set gdb (assert-fact gdb 2 'edge 3))
(set gdb (assert-fact gdb 3 'edge 4))
(set gdb (assert-fact gdb 4 'edge 5))
;; Base case: direct edge
(rule (reachable ?x ?y) (?x :edge ?y))
;; Recursive case: edge + reachability
(rule (reachable ?x ?z) (?x :edge ?y) (reachable ?y ?z))
;; Query: all reachable pairs
(query gdb (find ?x ?y) (where (reachable ?x ?y)))
This produces all 10 reachable pairs from the 4-edge chain: direct edges (1→2, 2→3, 3→4, 4→5) plus transitive ones (1→3, 1→4, 1→5, 2→4, 2→5, 3→5).
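The count of 10 can be checked with a straightforward (naive) fixpoint in Python: apply both rules repeatedly over the full relation until nothing new appears. This is a model of the rule semantics, not of how Rayforce evaluates them, and `naive_closure` is a hypothetical name.

```python
edges = {(1, 2), (2, 3), (3, 4), (4, 5)}


def naive_closure(edges):
    """Apply the base and recursive rules until the relation stops growing."""
    reachable = set(edges)  # base rule: every direct edge is reachable
    while True:
        # Recursive rule: edge(x, y) and reachable(y, z) give reachable(x, z).
        derived = {(x, z) for (x, y) in edges for (y2, z) in reachable if y == y2}
        grown = reachable | derived
        if grown == reachable:  # fixpoint: no new facts
            return reachable
        reachable = grown


pairs = sorted(naive_closure(edges))
print(len(pairs))
print(pairs)
```

For a chain of 5 nodes, every ordered pair (i, j) with i < j is reachable, giving 5 × 4 / 2 = 10 pairs.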
Semi-naive evaluation
Rayforce uses semi-naive evaluation for fixpoint computation, which is significantly faster than naive re-evaluation:
- Base pass: Apply all non-recursive rule clauses to produce initial facts
- Delta iteration: In each round, only use newly derived facts (the "delta") as input to rule bodies
- Antijoin: Remove facts already known from the delta to keep only truly new derivations
- Terminate: When the delta is empty, the fixpoint has been reached
Each iteration compiles to a fresh DAG and calls ray_execute. The antijoin step uses ray_antijoin to efficiently filter out previously known facts.
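The four steps above can be sketched in Python for the `reachable` rules. This is a model under assumptions: `semi_naive_closure` is a hypothetical name, set difference stands in for the antijoin, and nothing here reflects how Rayforce builds or executes its DAGs.

```python
edges = {(1, 2), (2, 3), (3, 4), (4, 5)}


def semi_naive_closure(edges):
    """Semi-naive fixpoint for reachable; returns the relation and per-round deltas."""
    known = set(edges)   # base pass: facts from the non-recursive rule
    delta = set(edges)   # the first delta is everything the base pass produced
    history = []
    while delta:         # terminate when a round derives nothing new
        history.append(sorted(delta))
        # Delta iteration: the recursive body atom reads only the last delta.
        candidates = {(x, z) for (x, y) in edges for (y2, z) in delta if y == y2}
        # Antijoin: drop candidates that are already known.
        delta = candidates - known
        known |= delta
    return known, history


known, history = semi_naive_closure(edges)
for round_no, facts in enumerate(history):
    print(round_no, facts)
print(len(known))
```

Each round joins the edges against only the facts discovered in the previous round, so work already done is not repeated. In this model the per-round deltas come out in the same order as the rows of the example output later in this document, which is an observation rather than a documented guarantee.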
Pull Queries
Pull queries provide entity-centric retrieval — given an entity ID, return all (or selected) attributes as a dictionary:
;; Pull all attributes for entity 1
(pull db 1)
;; => ['name 150 'dept 152 'salary 80000]
;; Pull only specific attributes
(pull db 2 [name salary])
;; => ['name 154 'salary 60000]
Pull queries scan the EAV index for the given entity ID and collect all matching attribute-value pairs. When an attribute list is provided, only those attributes are returned.
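As a rough model of the behavior just described, the Python sketch below collects attribute-value pairs for one entity, optionally restricted to an attribute list. The `pull` function here is hypothetical; it returns a Python dict with plain string values, whereas the outputs shown above print a flat list with numeric IDs for symbol values. It also assumes one value per attribute, since a dict would keep only the last value otherwise.

```python
db = [
    (1, "name", "Alice"), (1, "dept", "Engineering"), (1, "salary", 80000),
    (2, "name", "Bob"), (2, "dept", "Sales"), (2, "salary", 60000),
]


def pull(db, entity, attrs=None):
    """Collect attribute -> value for one entity; attrs=None means all attributes."""
    wanted = None if attrs is None else set(attrs)
    return {
        a: v
        for e, a, v in db
        if e == entity and (wanted is None or a in wanted)
    }


entity_1 = pull(db, 1)
entity_2 = pull(db, 2, ["name", "salary"])
print(entity_1)
print(entity_2)
```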
How Datalog Maps to Rayforce
Every Datalog concept compiles down to existing Rayforce DAG operations. The Datalog layer is purely a compilation frontend — the engine does all the heavy lifting.
| Datalog Concept | Rayforce Operation | Description |
|---|---|---|
| Triple pattern (?e :attr ?v) | ray_scan + ray_filter | Indexed column scan filtered by attribute |
| Shared variable (join) | ray_join | Hash join on shared variable column |
| OR rules (same head) | union-all + distinct | Combine results and deduplicate |
| Negation (not ...) | ray_antijoin | Keep rows not present in another result |
| Fixpoint (recursion) | Loop with ray_execute | Iterate until delta is empty |
| (find ?a ?n) | Projection | Select output columns from the result |
| (pull db entity) | ray_scan on EAV index | Entity attribute scan and collection |
| Rule invocation | Subgraph expansion | Inline the rule's compiled DAG as a subplan |
Because Datalog compiles to the same DAG as select/update, queries benefit from all optimizer passes: predicate pushdown, filter reorder, fusion, and morsel-parallel execution with SIMD.
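The negation row is the only mapping in the table without an example elsewhere on this page, so here is a small Python sketch of what an antijoin computes: rows from one result whose key has no match in another. The `antijoin` helper, the "named entities without a department" question, and entity 4 ("Dana") are all hypothetical; the exact surface syntax of `(not ...)` in a where clause is not shown in this document and is not assumed here.

```python
db = [
    (1, "name", "Alice"), (1, "dept", "Engineering"),
    (2, "name", "Bob"), (2, "dept", "Sales"),
    (3, "name", "Charlie"), (3, "dept", "Engineering"),
    (4, "name", "Dana"),  # hypothetical: no dept fact
]


def antijoin(left, right, key=lambda row: row[0]):
    """Keep rows of left whose key does not appear among the keys of right."""
    right_keys = {key(r) for r in right}
    return [l for l in left if key(l) not in right_keys]


named = [(e, v) for e, a, v in db if a == "name"]
with_dept = [(e, v) for e, a, v in db if a == "dept"]

# Entities that have a name but no dept triple.
no_dept = antijoin(named, with_dept)
print(no_dept)
```

The same operation is what the semi-naive loop needs when it removes already-known facts from a round's candidates.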
Complete Example
The following example demonstrates the full Datalog workflow: creating an EAV database, asserting facts, defining rules, running queries, computing transitive closure, and using pull queries. This is the content of examples/rfl/datalog.rfl.
Source
; Datalog Example
; Demonstrates: datoms, assert-fact, rules, queries,
; fixpoint (transitive closure), and pull
; 1. Create EAV database with employee data
(set db (datoms))
(set db (assert-fact db 1 'name 'Alice))
(set db (assert-fact db 1 'dept 'Engineering))
(set db (assert-fact db 1 'salary 80000))
(set db (assert-fact db 2 'name 'Bob))
(set db (assert-fact db 2 'dept 'Sales))
(set db (assert-fact db 2 'salary 60000))
(set db (assert-fact db 3 'name 'Charlie))
(set db (assert-fact db 3 'dept 'Engineering))
(set db (assert-fact db 3 'salary 90000))
; 2. Simple query: find all names
(println "All names:")
(show (query db (find ?e ?n) (where (?e :name ?n))))
; 3. Join query: name + department
(println "Name + Dept:")
(show (query db (find ?n ?d) (where (?e :name ?n) (?e :dept ?d))))
; 4. Rule: define "employee" relation
(rule (employee ?e ?n ?d)
(?e :name ?n)
(?e :dept ?d))
(println "Employees (via rule):")
(show (query db (find ?n ?d) (where (employee ?e ?n ?d))))
; 5. Transitive closure: reachability in a graph
(set gdb (datoms))
(set gdb (assert-fact gdb 1 'edge 2))
(set gdb (assert-fact gdb 2 'edge 3))
(set gdb (assert-fact gdb 3 'edge 4))
(set gdb (assert-fact gdb 4 'edge 5))
(rule (reachable ?x ?y) (?x :edge ?y))
(rule (reachable ?x ?z) (?x :edge ?y) (reachable ?y ?z))
(println "Reachable (transitive closure):")
(show (query gdb (find ?x ?y) (where (reachable ?x ?y))))
; 6. Pull: entity-centric retrieval
(println "Entity 1 (all attributes):")
(println (pull db 1))
(println "Entity 2 (name + salary only):")
(println (pull db 2 [name salary]))
; 7. Scan-eav: low-level attribute lookup
(println "All salaries:")
(show (scan-eav db 'salary))
(println "Entity 3 salary:")
(println (scan-eav db 3 'salary))
Output
Running ./rayforce examples/rfl/datalog.rfl produces:
All names:
+-----+-------------------------------+
| ?e | ?n |
| i64 | i64 |
+-----+-------------------------------+
| 1 | 150 |
| 2 | 154 |
| 3 | 156 |
+-----+-------------------------------+
| 3 rows (3 shown) 2 columns (2 shown)|
+-------------------------------------+
Name + Dept:
+-----+-------------------------------+
| ?n | ?d |
| i64 | i64 |
+-----+-------------------------------+
| 150 | 152 |
| 154 | 155 |
| 156 | 152 |
+-----+-------------------------------+
| 3 rows (3 shown) 2 columns (2 shown)|
+-------------------------------------+
Employees (via rule):
+-----+-------------------------------+
| ?n | ?d |
| i64 | i64 |
+-----+-------------------------------+
| 150 | 152 |
| 154 | 155 |
| 156 | 152 |
+-----+-------------------------------+
| 3 rows (3 shown) 2 columns (2 shown)|
+-------------------------------------+
Reachable (transitive closure):
+-----+---------------------------------+
| ?x | ?y |
| i64 | i64 |
+-----+---------------------------------+
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 1 | 3 |
| 2 | 4 |
| 3 | 5 |
| 1 | 4 |
| 2 | 5 |
| 1 | 5 |
+-----+---------------------------------+
| 10 rows (10 shown) 2 columns (2 shown)|
+---------------------------------------+
Entity 1 (all attributes):
['name 150 'dept 152 'salary 80000]
Entity 2 (name + salary only):
['name 154 'salary 60000]
All salaries:
+-----+-------------------------------+
| e | v |
| i64 | i64 |
+-----+-------------------------------+
| 1 | 80000 |
| 2 | 60000 |
| 3 | 90000 |
+-----+-------------------------------+
| 3 rows (3 shown) 2 columns (2 shown)|
+-------------------------------------+
Entity 3 salary:
90000
Use (pull db entity) for human-readable attribute display.