Hello,
Would it be possible for you to provide some more information? Are
all the tables on disk? What size are they? What memory
constraints do you have?
You suggested that splaying the tables might help. When you load a
splayed table, the table is mapped to memory, rather than being
loaded into RAM. However, upon joining, the mapped data will be
pulled into RAM. I understand your concern is the memory cost of
loading all the tables at once for the join.
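As a rough illustration of the mapping behaviour (the path `:hdb/tab/ here is hypothetical and the figures will vary on your system), you can compare \w output before and after touching a splayed table:

\w                      / note current heap usage
tab:get `:hdb/tab/      / maps the splayed table; heap barely moves
select from tab         / materialising the columns pulls them into RAM
\w                      / heap/used should now reflect the table size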
It may be suitable to join one table at a time, then upsert into
your final splayed table on disk. I've outlined how you might
approach this below.
Turn on immediate garbage collection:
\g 1
For instance, create an empty table t with the same schema as your
end-goal table, and splay it to disk:
t:([]a:`$();b:`int$();c:`int$();d:`int$();e:`int$())
`:hdb/ujtab/ set .Q.en[`:hdb] t
Create a few sample tables to be joined:
ta:([]a:`q`w`e;b:10 11 12i;c:1 2 3i)
tb:([]a:`r`t`y;d:18 16 15i;e:112 221 332i)
Join each table (uj) and upsert into your splayed table on disk
(enumerating with .Q.en, assuming you have a sym column):
{`:hdb/ujtab/ upsert .Q.en[`:hdb] t uj value x} each `ta`tb
With immediate garbage collection turned on, kdb+ will free memory
as each table is joined, which should keep memory usage to a minimum.
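If you would rather not change the global gc mode with \g 1, one alternative sketch (using the same example tables) is to call .Q.gc[] explicitly after each upsert, which returns free heap blocks to the OS:

{`:hdb/ujtab/ upsert .Q.en[`:hdb] t uj value x; .Q.gc[]} each `ta`tb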
Feel free to follow up with more details and I might be able to
offer a more optimised solution.
Thanks,
Matthew
AquaQ Analytics