
Datalog: Rules & Queries

Declare WHAT you want, not HOW to compute it. Rayforce's Datalog layer compiles rules and queries down to the same DAG-based vectorized executor that powers select and update.

What is Datalog?

Datalog is a rule-based declarative query language. You define facts and rules, then ask questions. The engine figures out all possible answers automatically by applying rules until no new facts can be derived.

Unlike SQL, Datalog handles recursive queries naturally — transitive closure, reachability, and graph traversal are first-class operations, not awkward CTEs bolted on as an afterthought.

Key idea: You declare logical relationships. Rayforce compiles them into the same vectorized, morsel-parallel execution pipeline used by the rest of the engine. No interpretation overhead — rules become DAG nodes.

EAV Triple Storage

Datalog in Rayforce uses Entity-Attribute-Value (EAV) triples as its storage model. Every fact is a triple (entity, attribute, value) stored in a columnar datoms table.

Creating a datoms database

Use (datoms) to create an empty EAV database, then (assert-fact db entity attribute value) to add triples:

;; Create an empty EAV database
(set db (datoms))

;; Assert facts: entity 1 has name, dept, salary
(set db (assert-fact db 1 'name 'Alice))
(set db (assert-fact db 1 'dept 'Engineering))
(set db (assert-fact db 1 'salary 80000))

Each call to assert-fact returns a new database with the triple added. The underlying storage is a three-column table [e, a, v] backed by Rayforce's columnar vectors.
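As a rough mental model only, the three-column layout and the "return a new database" behavior can be sketched in Python. `Datoms`, `datoms()` and `assert_fact()` are hypothetical stand-ins rather than Rayforce's API, and plain tuples replace Rayforce's columnar vectors:

```python
from typing import Any, NamedTuple


class Datoms(NamedTuple):
    """Toy EAV table: three parallel columns e, a, v (one row per triple)."""
    e: tuple
    a: tuple
    v: tuple


def datoms() -> Datoms:
    """Return an empty EAV database (three empty columns)."""
    return Datoms((), (), ())


def assert_fact(db: Datoms, entity: int, attribute: str, value: Any) -> Datoms:
    """Return a NEW database with one triple appended; `db` is not modified."""
    return Datoms(db.e + (entity,), db.a + (attribute,), db.v + (value,))


db = datoms()
db = assert_fact(db, 1, "name", "Alice")
db = assert_fact(db, 1, "dept", "Engineering")
db = assert_fact(db, 1, "salary", 80000)
print(db.a)  # ('name', 'dept', 'salary')
```

Copying the tuples on every assertion keeps the sketch short but costs O(n) per call; this page does not say how Rayforce shares storage between the old and the new database.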

Scanning by attribute

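Use (scan-eav db attribute) to retrieve every entity that has a given attribute, together with its value, or (scan-eav db entity attribute) to fetch a single entity's value. These examples use the three-employee database built in the Complete Example (entities 1, 2 and 3):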

;; All salaries: returns a table of [entity, value]
(scan-eav db 'salary)

;; Specific entity's salary: returns the scalar value
(scan-eav db 3 'salary)  ; => 90000
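Both call shapes of scan-eav amount to an attribute filter over the columns, returning either (entity, value) rows or a single value. A Python sketch over plain lists; the function name `scan_eav` and the None-for-missing behavior are assumptions of this sketch, not documented Rayforce behavior:

```python
def scan_eav(db, *args):
    """Toy model of scan-eav over an (e_col, a_col, v_col) triple of lists.

    scan_eav(db, attr)         -> list of (entity, value) rows with that attribute
    scan_eav(db, entity, attr) -> that entity's value, or None in this sketch
    """
    e_col, a_col, v_col = db
    if len(args) == 1:
        (attr,) = args
        # Selection vector: row positions whose attribute column matches.
        sel = [i for i, a in enumerate(a_col) if a == attr]
        return [(e_col[i], v_col[i]) for i in sel]
    entity, attr = args
    for e, a, v in zip(e_col, a_col, v_col):
        if e == entity and a == attr:
            return v  # first match; assumes one value per (entity, attribute)
    return None


db = (
    [1, 1, 2, 2, 3, 3],
    ["name", "salary", "name", "salary", "name", "salary"],
    ["Alice", 80000, "Bob", 60000, "Charlie", 90000],
)
print(scan_eav(db, "salary"))     # [(1, 80000), (2, 60000), (3, 90000)]
print(scan_eav(db, 3, "salary"))  # 90000
```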

Rules

Rules define derived relations. A rule has a head (the relation being defined) and a body (the conditions that must hold):

(rule (head ?vars...)
  ;; body clauses — all must be satisfied
  (?e :attr ?v)
  (?e :other-attr ?w))

Variables start with ?. When the same variable appears in multiple clauses, it acts as a join condition — the values must be equal.
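The join-condition rule for shared variables can be illustrated with a tuple-at-a-time matcher in Python. This is an explanatory sketch only: it loops over every triple for every partial binding, whereas Rayforce compiles shared variables into hash joins. Attribute names are plain strings here instead of :keywords, and `match`/`solve` are hypothetical helpers:

```python
def is_var(term):
    """Variables are strings that start with '?', as in the rule syntax."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; return None on conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if term in new and new[term] != value:
                return None  # shared variable already bound to a different value
            new[term] = value
        elif term != value:
            return None  # constant (e.g. the attribute) does not match
    return new


def solve(clauses, triples):
    """Return every binding that satisfies all clauses at once."""
    bindings = [{}]
    for clause in clauses:
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match(clause, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings


triples = [(1, "name", "Alice"), (1, "dept", "Engineering"), (2, "name", "Bob")]
print(solve([("?e", "name", "?n"), ("?e", "dept", "?d")], triples))
# [{'?e': 1, '?n': 'Alice', '?d': 'Engineering'}]
```

Entity 2 has a name but no dept triple, so the shared ?e eliminates it, which is exactly the join semantics described above.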

Simple rule example

;; Define "employee" as entities with both name and dept
(rule (employee ?e ?n ?d)
  (?e :name ?n)
  (?e :dept ?d))

This rule says: "an employee is any entity ?e that has both a :name attribute (bound to ?n) and a :dept attribute (bound to ?d)." The shared variable ?e across the two clauses produces a join on entity ID.

OR semantics with multiple rules

Define multiple rules with the same head to express disjunction (OR). Each rule contributes its own results, and Rayforce combines them with a union and removes duplicates:

;; Two ways to be "reachable"
(rule (reachable ?x ?y) (?x :edge ?y))              ;; base: direct edge
(rule (reachable ?x ?z) (?x :edge ?y) (reachable ?y ?z))  ;; recursive
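Combining the results of same-head rules is a union followed by duplicate removal (listed as union-all + distinct in the mapping table). The recursive pair above also needs the fixpoint loop, so this Python sketch uses a hypothetical non-recursive pair instead: one rule returns each edge as stored and the other returns it reversed. Rows are plain tuples and `union_distinct` is an illustrative helper:

```python
def union_distinct(*rule_results):
    """Union-all the rows from each rule, then keep the first copy of each row."""
    seen = set()
    out = []
    for rows in rule_results:
        for row in rows:
            if row not in seen:
                seen.add(row)
                out.append(row)
    return out


edges = [(1, 2), (2, 1), (2, 3)]
forward = [(x, y) for x, y in edges]   # rule 1 body: (?x :edge ?y)
backward = [(y, x) for x, y in edges]  # rule 2 body: (?y :edge ?x)
print(union_distinct(forward, backward))  # [(1, 2), (2, 1), (2, 3), (3, 2)]
```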

Queries

Queries retrieve data from the datoms database using pattern matching. The syntax is:

(query db (find ?vars...) (where clauses...))

The find clause specifies which variables to return. The where clause contains triple patterns and rule invocations that constrain the results.

Simple pattern query

;; Find all entities and their names
(query db (find ?e ?n) (where (?e :name ?n)))

Join query

When multiple patterns share a variable, Rayforce compiles them into a join:

;; Find name + department (join on entity ?e)
(query db (find ?n ?d) (where (?e :name ?n) (?e :dept ?d)))
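The mapping table in How Datalog Maps to Rayforce describes this as a hash join on the shared variable's column. A minimal Python sketch of that idea, with each pattern already reduced to (?e, value) rows; which side gets hashed is this sketch's choice, not a documented Rayforce detail:

```python
def hash_join(left, right, left_key, right_key):
    """Inner hash join: build a hash table on `right`, then probe it with `left`."""
    buckets = {}
    for row in right:  # build phase
        buckets.setdefault(row[right_key], []).append(row)
    out = []
    for lrow in left:  # probe phase
        for rrow in buckets.get(lrow[left_key], []):
            out.append(lrow + rrow)
    return out


names = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]              # (?e :name ?n)
depts = [(1, "Engineering"), (2, "Sales"), (3, "Engineering")]  # (?e :dept ?d)

joined = hash_join(names, depts, 0, 0)      # rows (?e, ?n, ?e, ?d)
result = [(n, d) for _, n, _, d in joined]  # (find ?n ?d) projection
print(result)
# [('Alice', 'Engineering'), ('Bob', 'Sales'), ('Charlie', 'Engineering')]
```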

Query with rules

Rule invocations can appear in the where clause just like triple patterns:

;; Use the "employee" rule defined earlier
(query db (find ?n ?d) (where (employee ?e ?n ?d)))
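The mapping table describes rule invocation as subgraph expansion: the rule's body is inlined into the query. The Python sketch below shows the same idea at the pattern level for a single non-recursive rule: head variables are replaced by the caller's arguments, and body-only variables get fresh names so they cannot collide with the query's own variables. `expand` and the `rules` dictionary are hypothetical; recursive rules go through the fixpoint loop instead:

```python
import itertools

_fresh = itertools.count(1)  # source of unique suffixes for renamed variables


def expand(call, rules):
    """Inline one invocation of a non-recursive rule as its body's patterns."""
    name, *args = call
    head_vars, body = rules[name]
    subst = dict(zip(head_vars, args))
    suffix = next(_fresh)

    def rename(term):
        if isinstance(term, str) and term.startswith("?"):
            # Head variables take the caller's terms; body-only ones are renamed.
            return subst.get(term, f"{term}_{suffix}")
        return term  # constants such as attribute names pass through

    return [tuple(rename(t) for t in pattern) for pattern in body]


rules = {
    "employee": (("?e", "?n", "?d"), [("?e", "name", "?n"), ("?e", "dept", "?d")]),
    "colleague": (("?a", "?b"), [("?a", "dept", "?d"), ("?b", "dept", "?d")]),
}
print(expand(("employee", "?x", "?name", "?dept"), rules))
# [('?x', 'name', '?name'), ('?x', 'dept', '?dept')]
```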

How queries compile to the DAG

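Under the hood, each query compiles to Rayforce's DAG execution pipeline: every triple pattern becomes a filtered scan of the datoms table, every variable shared between patterns becomes a hash join, rule invocations are inlined as subplans, and the find clause becomes a final projection. The mapping table in How Datalog Maps to Rayforce lists the operator used for each construct.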
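A toy Python version of that compilation step, assuming every pattern has a variable in both the entity and the value position: each pattern becomes a scan-and-filter node, each additional pattern adds a join node, and the find clause becomes a final projection. The node names are invented for illustration and are not Rayforce's DAG operators; the executor evaluates the tree bottom-up and joins on whatever variables the two sides share:

```python
def compile_query(find, where):
    """Turn (find ...) (where ...) into a nested plan tree of tuples."""
    plan = None
    for e_var, attr, v_var in where:
        node = ("scan_filter", attr, e_var, v_var)
        plan = node if plan is None else ("join", plan, node)
    return ("project", plan, tuple(find))


def execute(plan, triples):
    """Evaluate a plan tree bottom-up; rows are dicts keyed by variable name."""
    op = plan[0]
    if op == "scan_filter":
        _, attr, e_var, v_var = plan
        return [{e_var: e, v_var: v} for e, a, v in triples if a == attr]
    if op == "join":
        left = execute(plan[1], triples)
        right = execute(plan[2], triples)
        if not left or not right:
            return []
        shared = sorted(left[0].keys() & right[0].keys())  # join variables
        index = {}
        for r in right:
            index.setdefault(tuple(r[k] for k in shared), []).append(r)
        return [{**l, **r} for l in left
                for r in index.get(tuple(l[k] for k in shared), [])]
    if op == "project":
        _, child, cols = plan
        return [tuple(row[c] for c in cols) for row in execute(child, triples)]
    raise ValueError(f"unknown node {op!r}")


triples = [(1, "name", "Alice"), (1, "dept", "Engineering"),
           (2, "name", "Bob"), (2, "dept", "Sales")]
plan = compile_query(["?n", "?d"], [("?e", "name", "?n"), ("?e", "dept", "?d")])
print(plan)
print(execute(plan, triples))  # [('Alice', 'Engineering'), ('Bob', 'Sales')]
```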

Recursive Rules & Fixpoint

Much of Datalog's power comes from recursive rules. When a rule references itself, the engine evaluates it by iterating until a round produces no new facts (the "fixpoint"); for a reachability rule like the one below, the result is the transitive closure.
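To make "iterate until nothing new appears" concrete, here is a Python sketch (not Rayforce internals) of the naive version of that loop for the two `reachable` rules: every round re-joins the edges with the full set of known facts. Rayforce instead uses the semi-naive refinement, which avoids this repeated work:

```python
def reachable_naive(edges):
    """Naive fixpoint for the two `reachable` rules over a set of (x, y) edges."""
    edges = set(edges)
    reach = set()
    while True:
        # Re-apply BOTH rules to ALL known facts, not just the new ones.
        new = set(edges)  # base rule: (?x :edge ?y)
        new |= {(x, z) for (x, y) in edges for (y2, z) in reach if y == y2}
        if new <= reach:  # nothing new was derived: fixpoint reached
            return reach
        reach |= new


chain = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(len(reachable_naive(chain)))  # 10
```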

Transitive closure example

;; Build a graph
(set gdb (datoms))
(set gdb (assert-fact gdb 1 'edge 2))
(set gdb (assert-fact gdb 2 'edge 3))
(set gdb (assert-fact gdb 3 'edge 4))
(set gdb (assert-fact gdb 4 'edge 5))

;; Base case: direct edge
(rule (reachable ?x ?y) (?x :edge ?y))

;; Recursive case: edge + reachability
(rule (reachable ?x ?z) (?x :edge ?y) (reachable ?y ?z))

;; Query: all reachable pairs
(query gdb (find ?x ?y) (where (reachable ?x ?y)))

This produces all 10 reachable pairs from the 4-edge chain: direct edges (1→2, 2→3, 3→4, 4→5) plus transitive ones (1→3, 1→4, 1→5, 2→4, 2→5, 3→5).

Semi-naive evaluation

Rayforce computes fixpoints with semi-naive evaluation. Naive evaluation re-applies every rule to all known facts in each round, so it rederives the same facts over and over; semi-naive evaluation feeds only the previous round's new facts back into the recursive rules:

  1. Base pass: Apply the non-recursive rules to produce the initial facts, which also form the first delta
  2. Delta iteration: In each round, evaluate the recursive rule bodies against only the facts derived in the previous round (the "delta")
  3. Antijoin: Remove facts that are already known from the round's derivations, so the new delta contains only genuinely new facts
  4. Terminate: When the delta is empty, the fixpoint has been reached

Each iteration compiles to a fresh DAG and runs it with ray_execute. The antijoin step uses ray_antijoin to filter out previously known facts.
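The four steps can be sketched in Python for the `reachable` rules. This is an illustration of the technique, not Rayforce's implementation: the join is a small hash join on ?y, and the antijoin is a Python set difference where Rayforce uses ray_antijoin on columnar data:

```python
def reachable_semi_naive(edges):
    """Semi-naive fixpoint for the `reachable` rules over a set of (x, y) edges."""
    edges = set(edges)
    # 1. Base pass: the non-recursive rule yields the initial facts and delta.
    known = set(edges)
    delta = set(edges)
    while delta:  # 4. Terminate once a round produces no new facts.
        # 2. Delta iteration: join edges only with LAST round's new facts,
        #    indexing the delta by its first column (a hash join on ?y).
        by_start = {}
        for y, z in delta:
            by_start.setdefault(y, []).append(z)
        derived = {(x, z) for (x, y) in edges for z in by_start.get(y, [])}
        # 3. Antijoin: keep only facts not already known.
        delta = derived - known
        known |= delta
    return known


print(sorted(reachable_semi_naive([(1, 2), (2, 3), (3, 4), (4, 5)])))
print(sorted(reachable_semi_naive([(1, 2), (2, 1)])))  # cycle still terminates
```

On a cyclic graph the antijoin is what guarantees termination: once every reachable pair is known, a round's derivations are all old facts and the delta becomes empty.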

Pull Queries

Pull queries provide entity-centric retrieval: given an entity ID, they return all (or selected) attributes of that entity as a dictionary. The examples below use the employee database from the Complete Example:

;; Pull all attributes for entity 1
(pull db 1)
;; => ['name 150 'dept 152 'salary 80000]
;; (150 and 152 are the intern IDs of 'Alice and 'Engineering)

;; Pull only specific attributes
(pull db 2 [name salary])
;; => ['name 154 'salary 60000]

Pull queries scan the EAV index for the given entity ID and collect all matching attribute-value pairs. When an attribute list is provided, only those attributes are returned.
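A Python sketch of that collection step, with `pull` as a hypothetical function over a list of triples. It scans linearly where Rayforce scans its EAV index, and it keeps readable strings where Rayforce prints symbol values as intern IDs:

```python
def pull(triples, entity, attrs=None):
    """Collect one entity's attribute -> value pairs, optionally only `attrs`."""
    wanted = None if attrs is None else set(attrs)
    out = {}
    for e, a, v in triples:  # full scan here; an engine would use an index
        if e == entity and (wanted is None or a in wanted):
            out[a] = v  # with repeated attributes, the last value wins here
    return out


triples = [
    (1, "name", "Alice"), (1, "dept", "Engineering"), (1, "salary", 80000),
    (2, "name", "Bob"), (2, "dept", "Sales"), (2, "salary", 60000),
]
print(pull(triples, 1))  # {'name': 'Alice', 'dept': 'Engineering', 'salary': 80000}
print(pull(triples, 2, ["name", "salary"]))  # {'name': 'Bob', 'salary': 60000}
```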

How Datalog Maps to Rayforce

Every Datalog concept compiles down to existing Rayforce DAG operations. The Datalog layer is purely a compilation frontend — the engine does all the heavy lifting.

Datalog concept | Rayforce operation | Description
--- | --- | ---
Triple pattern (?e :attr ?v) | ray_scan + ray_filter | Indexed column scan filtered by attribute
Shared variable (join) | ray_join | Hash join on shared variable column
OR rules (same head) | union-all + distinct | Combine results and deduplicate
Negation (not ...) | ray_antijoin | Keep rows not present in another result
Fixpoint (recursion) | Loop with ray_execute | Iterate until delta is empty
(find ?a ?n) | Projection | Select output columns from the result
(pull db entity) | ray_scan on EAV index | Entity attribute scan and collection
Rule invocation | Subgraph expansion | Inline the rule's compiled DAG as a subplan

Performance: Because Datalog compiles to the same DAG as select and update, Datalog queries go through the same optimizer passes (predicate pushdown, filter reordering, operator fusion) and run with the same morsel-parallel, SIMD execution.
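The Negation row relies on an antijoin, the same operation the semi-naive loop uses to discard known facts. A Python sketch over plain row tuples, for example to keep the employees who are not in Sales. The exact (not ...) clause syntax is not shown on this page, so the sketch works on already-computed rows, and `antijoin` is a hypothetical helper rather than the ray_antijoin signature:

```python
def antijoin(left, right, left_key, right_key):
    """Keep the rows of `left` whose key has no match in `right`."""
    # Hash the right side's key tuples once, then probe with each left row.
    right_keys = {tuple(row[i] for i in right_key) for row in right}
    return [row for row in left
            if tuple(row[i] for i in left_key) not in right_keys]


employees = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]  # (?e, ?n)
in_sales = [(2,)]                                        # entities in Sales
print(antijoin(employees, in_sales, (0,), (0,)))  # [(1, 'Alice'), (3, 'Charlie')]
```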

Complete Example

The following example demonstrates the full Datalog workflow: creating an EAV database, asserting facts, defining rules, running queries, computing transitive closure, and using pull queries. This is the content of examples/rfl/datalog.rfl.

Source

; Datalog Example
; Demonstrates: datoms, assert-fact, rules, queries,
; fixpoint (transitive closure), and pull

; 1. Create EAV database with employee data
(set db (datoms))
(set db (assert-fact db 1 'name 'Alice))
(set db (assert-fact db 1 'dept 'Engineering))
(set db (assert-fact db 1 'salary 80000))
(set db (assert-fact db 2 'name 'Bob))
(set db (assert-fact db 2 'dept 'Sales))
(set db (assert-fact db 2 'salary 60000))
(set db (assert-fact db 3 'name 'Charlie))
(set db (assert-fact db 3 'dept 'Engineering))
(set db (assert-fact db 3 'salary 90000))

; 2. Simple query: find all names
(println "All names:")
(show (query db (find ?e ?n) (where (?e :name ?n))))

; 3. Join query: name + department
(println "Name + Dept:")
(show (query db (find ?n ?d) (where (?e :name ?n) (?e :dept ?d))))

; 4. Rule: define "employee" relation
(rule (employee ?e ?n ?d)
  (?e :name ?n)
  (?e :dept ?d))

(println "Employees (via rule):")
(show (query db (find ?n ?d) (where (employee ?e ?n ?d))))

; 5. Transitive closure: reachability in a graph
(set gdb (datoms))
(set gdb (assert-fact gdb 1 'edge 2))
(set gdb (assert-fact gdb 2 'edge 3))
(set gdb (assert-fact gdb 3 'edge 4))
(set gdb (assert-fact gdb 4 'edge 5))

(rule (reachable ?x ?y) (?x :edge ?y))
(rule (reachable ?x ?z) (?x :edge ?y) (reachable ?y ?z))

(println "Reachable (transitive closure):")
(show (query gdb (find ?x ?y) (where (reachable ?x ?y))))

; 6. Pull: entity-centric retrieval
(println "Entity 1 (all attributes):")
(println (pull db 1))

(println "Entity 2 (name + salary only):")
(println (pull db 2 [name salary]))

; 7. Scan-eav: low-level attribute lookup
(println "All salaries:")
(show (scan-eav db 'salary))

(println "Entity 3 salary:")
(println (scan-eav db 3 'salary))

Output

Running ./rayforce examples/rfl/datalog.rfl produces:

All names:
+-----+-------------------------------+
| ?e  |              ?n               |
| i64 |              i64              |
+-----+-------------------------------+
| 1   | 150                           |
| 2   | 154                           |
| 3   | 156                           |
+-----+-------------------------------+
| 3 rows (3 shown) 2 columns (2 shown)|
+-------------------------------------+
Name + Dept:
+-----+-------------------------------+
| ?n  |              ?d               |
| i64 |              i64              |
+-----+-------------------------------+
| 150 | 152                           |
| 154 | 155                           |
| 156 | 152                           |
+-----+-------------------------------+
| 3 rows (3 shown) 2 columns (2 shown)|
+-------------------------------------+
Employees (via rule):
+-----+-------------------------------+
| ?n  |              ?d               |
| i64 |              i64              |
+-----+-------------------------------+
| 150 | 152                           |
| 154 | 155                           |
| 156 | 152                           |
+-----+-------------------------------+
| 3 rows (3 shown) 2 columns (2 shown)|
+-------------------------------------+
Reachable (transitive closure):
+-----+---------------------------------+
| ?x  |               ?y                |
| i64 |               i64               |
+-----+---------------------------------+
| 1   | 2                               |
| 2   | 3                               |
| 3   | 4                               |
| 4   | 5                               |
| 1   | 3                               |
| 2   | 4                               |
| 3   | 5                               |
| 1   | 4                               |
| 2   | 5                               |
| 1   | 5                               |
+-----+---------------------------------+
| 10 rows (10 shown) 2 columns (2 shown)|
+---------------------------------------+
Entity 1 (all attributes):
['name 150 'dept 152 'salary 80000]
Entity 2 (name + salary only):
['name 154 'salary 60000]
All salaries:
+-----+-------------------------------+
|  e  |               v               |
| i64 |              i64              |
+-----+-------------------------------+
| 1   | 80000                         |
| 2   | 60000                         |
| 3   | 90000                         |
+-----+-------------------------------+
| 3 rows (3 shown) 2 columns (2 shown)|
+-------------------------------------+
Entity 3 salary:
90000
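Note: In the EAV value column, symbol values are stored as their intern table IDs (integers). Because the value column then holds only integers, a value from one triple can be joined directly against the entity ID of another triple, as in the transitive closure example. (pull db entity) pairs each value with its attribute name, which makes results easier to read, but symbol values still print as intern IDs, as the output above shows.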
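A minimal Python sketch of the interning idea (not Rayforce's symbol table): every distinct symbol gets one integer ID, so equal symbols compare as equal integers and the value column stays a single integer type that can be joined against entity IDs. This toy table numbers symbols from 0, so its IDs will not match the 150, 152, ... printed by Rayforce; the repeated ID for "Engineering" mirrors why entities 1 and 3 both show dept 152 in the output above:

```python
class InternTable:
    """Assign each distinct symbol a stable integer ID, and map IDs back."""

    def __init__(self):
        self._ids = {}
        self._names = []

    def intern(self, symbol):
        if symbol not in self._ids:
            self._ids[symbol] = len(self._names)  # next unused ID
            self._names.append(symbol)
        return self._ids[symbol]

    def name(self, symbol_id):
        return self._names[symbol_id]


syms = InternTable()
depts = [(1, "Engineering"), (2, "Sales"), (3, "Engineering")]
v_col = [syms.intern(d) for _, d in depts]  # value column is all integers
print(v_col)                                # [0, 1, 0]
print([syms.name(i) for i in v_col])        # ['Engineering', 'Sales', 'Engineering']
```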