2022.11.30 05:33 AM
Hello Fellow Qbies,
I'm attempting to run a wj across a table with a dimension of 1million rows and 30 columns.
It works, but it takes too long to run.
What are some tips and tricks to dealing with large datasets and windowjoins?
2022.11.30 07:39 AM
The documentation does have a note on attributes to apply:
t and q are simple tables to be joined
(q should be sorted `sym`time with `p# on sym)
Have you applied this?
2022.11.30 08:11 AM
What if I'm not using a sym column? I'm going datetime to datetime for the windowjoin.
Would I get speed improvements by using a sym column?
2022.11.30 08:28 AM
So you are only joining on time windows? no other column?
You could apply the sorted attribute.
https://code.kx.com/q/ref/set-attribute/#sorted
The 'datetime' datatype is deprecated. 'timestamp' should be used instead.
https://code.kx.com/q/basics/datatypes/#temporal
If you can share a minimal reproducible example in q that will help us answer your questions.
2022.11.30 09:50 AM - edited 2022.11.30 10:14 AM
I have 800,000 rows of data in a table with the following meta:
c | t f a ---------------| ----- col1 | n col2 | s col3 | d mmm3 | f col4 | f col5 | s col6 | i col7 | i col8 | j col9 | f col10 | j dt | p s col11 | f col12 | f col13 | j col14 | j col15 | f col16 | f col17 | f col18 | f col19 | f col20 | f col21 | f col22 | f col23 | j col24 | j
Im attempting to get a rolling 5minute min/max using wj. (i recognize the window isn't such for 5minutes, im just writing up an example)
f:`dt;
w:(-1000000; 0)+\:data[`dt];
data: wj[w;f;data;(data;(max;`mmm3);(min;`mmm3))];
The wj works just fine, but takes insanely long to complete.
2022.11.30 01:04 PM
I've tested it down to only 1 column, the timestamp, being wj to.
Is there a way to open threading or something? It's like 12,000 entries per timestamp being read. How long should that take?
2022.12.01 08:29 AM
On my laptop this took 765 milliseconds - what runtime are you seeing?
data:([] dt:`s#.z.p+0D00:00:01*til 800000;mmm3:800000?1000.0)
f:`dt;
w:(-1000000; 0)+\:data[`dt];
\ts result: wj[w;f;data;(data;(max;`mmm3);(min;`mmm3))]
2022.12.01 09:40 AM
It took me significantly longer. But, I'm also dealing with 30 columns -- would that matter, even though I'm just using mmm3 for the wj?
In testing, when I make the data table that im searching (data;(max:`mmm3)) smaller, things speed up. For example, I ran a 1 minute xbar on that table and the windowjoin now takes a couple seconds.
I dont understand what I could be doing wrong. At full scale, 800,000x30cols, it took like 30-40 minutes to complete a 5 minute lookback.
2022.12.02 02:38 AM - edited 2022.12.02 02:39 AM
I added extra columns and it did not slow down the operation for me.
Is your data in memory or on disk?
data:([] dt:`s#.z.p+0D00:00:01*til 800000;mmm3:800000?1000.0)
data:data,'flip (`$"col",/:string til 30)!30#enlist til 800000
f:`dt;
w:(-1000000; 0)+\:data[`dt];
result: wj[w;f;data;(data;(max;`mmm3);(min;`mmm3))]
2022.12.02 05:48 AM
Im working in memory.
I load the file in its 800,000 x 7. Then run a bunch of updates to make the remaining 23 columns (30 or so total).
And then run the wj. And pass back the resulting table.
Im trying to simulate what it would be doing in a tp. Like as more data comes in the wj is going to continuously run slower until it hits max file for the day, 800k rows. So I figured loading the whole file in and doing it all at once would be a decent enough way to test what it'd be doing.
EMEA
Tel: +44 (0)28 3025 2242
AMERICAS
Tel: +1 (212) 447 6700
APAC
Tel: +61 (0)2 9236 5700
KX. All Rights Reserved.
KX and kdb+ are registered trademarks of KX Systems, Inc., a subsidiary of FD Technologies plc.