Rayforce Rayforce ← Back to home
GitHub

Joins

Equi-joins, left outer joins, as-of joins for time-series alignment, and window joins with aggregation. All join types compile to the DAG and execute via radix-partitioned hash join.

Inner Join

The inner-join function performs an equi-join on one or more key columns. Only rows with matching keys in both tables appear in the result.

Signature

(inner-join [keys] left right)
Join two tables on shared key columns. Returns a table with all columns from both sides (key columns appear once).

Examples

(set orders (table [id sym qty]
    (list [1 2 3 4] [AAPL GOOG MSFT AAPL] [100 200 50 150])))

(set prices (table [sym price]
    (list [AAPL GOOG TSLA] [150 280 245])))

; Join on sym — MSFT and TSLA have no match, so they are excluded
(inner-join [sym] orders prices)
; id sym  qty price
; -- ---  --- -----
;  1 AAPL 100   150
;  2 GOOG 200   280
;  4 AAPL 150   150

Join on multiple keys:

(set x (table [a b c]
    (list (take (list "aa" "bb" "cc") 10)
          (take [I J K] 10)
          (til 10))))

(set y (table [a b d]
    (list (take (list "aa" "bb") 10)
          (take [I J K] 10)
          (til 10))))

; Join on both [a b]
(inner-join [a b] x y)

Left Join

The left-join function keeps all rows from the left table. Unmatched rows have null in the right-side columns.

Signature

(left-join [keys] left right)
Left outer join. All left rows are preserved; unmatched right columns are null.

Examples

(left-join [sym] orders prices)
; id sym  qty price
; -- ---  --- -----
;  1 AAPL 100   150
;  2 GOOG 200   280
;  3 MSFT  50        ← null (no MSFT in prices)
;  4 AAPL 150   150

Left join on multiple keys:

(left-join [a b] x y)

As-of Join

The asof-join function is designed for time-series data. For each row in the left table, it finds the most recent matching row in the right table where the time column is less than or equal to the left's time value. Key columns must match exactly; the last column in the key list is the temporal column.

Signature

(asof-join [keys time-col] left right)
As-of join for time-series alignment. The last key is the temporal column; preceding keys are exact-match. For each left row, returns the most recent right row where right.time <= left.time.

Examples

; Trades and quotes for time-series alignment
(set trades (table [Sym Ts Price]
    (list [AAPL AAPL MSFT]
          [10:00:01 10:00:05 10:00:04]
          [190.05 190.1 410.25])))

(set quotes (table [Sym Ts Bid Ask]
    (list [AAPL AAPL AAPL MSFT]
          [09:59:55 10:00:03 10:00:07 10:00:02]
          [189.9 190 190.05 410.1]
          [190.1 190.2 190.25 410.3])))

; As-of join: for each trade, find prevailing quote
(asof-join [Sym Ts] trades quotes)
; Sym  Ts       Price  Bid   Ask
; ---  --------  ----- ----- -----
; AAPL 10:00:01 190.05 189.9 190.1   ← quote from 09:59:55
; AAPL 10:00:05  190.1 190   190.2   ← quote from 10:00:03
; MSFT 10:00:04 410.25 410.1 410.3   ← quote from 10:00:02

Large-scale Usage

; 10 million row as-of join
(set n 10000000)
(set tsym (take (concat (take 'AAPL 99) (take 'MSFT 1)) n))
(set ttime (+ 09:00:00 (as 'TIME (/ (* (til n) 3) 10))))
(set trades (table [Sym Ts Price] (list tsym ttime (+ 10 (til n)))))
; ... build quotes similarly ...
(asof-join [Sym Ts] trades quotes)

Window Join

The window-join function extends the as-of join with a time window. For each left row, it finds all right rows within a time interval and applies aggregation functions to the matched rows.

Signature

(window-join [keys time-col] intervals left right {col: (agg expr) ...})
Window join with per-row time intervals. intervals is a list of two vectors [lo hi] defining the time window for each left row. Aggregation functions are applied to all right rows falling within each window.

Examples

(set trades (table [Sym Time Price]
    (list [x x x]
          [12:00:01 12:00:04 12:00:06]
          [89.17 70.5 80.54])))

(set quotes (table [Sym Time Size]
    (list [x x x x x x x x x x]
          [12:00:00 12:00:01 12:00:02 12:00:03 12:00:04
           12:00:05 12:00:06 12:00:07 12:00:08 12:00:09]
          [928 528 648 914 918 626 577 817 620 698])))

; Build per-row intervals: [lo, hi] for each trade
(set intervals (map-left + [-1000 1000] (at trades 'Time)))

; Window join: min bid and max ask within each window
(window-join [Sym Time] intervals trades quotes
    {size_min: (min Size) size_max: (max Size)})

Large-scale Window Join

(set n 100000)
(set trades (table [Sym Ts Price]
    (list tsym ttime (+ 10 (til n)))))
(set quotes (table [Sym Ts Bid Ask]
    (list bsym btime bid ask)))

; Build intervals from the trades timestamp
(set intervals (map-left + [-1000 1000] (at trades 'Ts)))

(window-join [Sym Ts] intervals trades quotes
    {bid: (min Bid) ask: (max Ask)})

How Joins Compile to the DAG

All join operations compile to the Rayforce execution DAG. The optimizer and executor handle the details:

  1. DAG constructioninner-join and left-join emit OP_JOIN nodes with join type flags. asof-join emits OP_ASOF_JOIN. window-join emits OP_WINDOW_JOIN.
  2. Optimizer — Predicate pushdown moves filters before the join when possible. Type inference propagates column types through join boundaries. SIP (Sideways Information Passing) can prune the build side using selection bitmaps.
  3. Execution — Equi-joins use a radix-partitioned hash join: the build side is partitioned by hash, then each morsel from the probe side looks up matches in the corresponding partition. As-of and window joins use sorted merge with binary search on the temporal column.
Performance note. For large joins, ensure key columns use efficient types. Symbol columns (RAY_SYM) are dictionary-encoded integers and join fastest. String columns (RAY_STR) work but require hash comparison of variable-length data.

Quick Reference

Function Syntax Description
inner-join (inner-join [keys] left right) Equi-join; only matching rows
left-join (left-join [keys] left right) Left outer join; all left rows preserved
asof-join (asof-join [keys time-col] left right) Time-series alignment; most recent match
window-join (window-join [keys tc] intervals left right {aggs}) Window aggregation over time intervals