improving native lj performance

hijibijbij73
New Contributor
I need to update a handful of rows in a large table, where both tables share a unique identifier as key and are sorted.

Below, t is the large table and u is the small table.

t:update `s#id from ([] id:til 10000; foo:`a; bar:`b; baz:`c; quux:`d)
u:`id xkey update `s#id from ([] id:5000 5050; foo:`e1`e2; bar:`f1`f2; foobar:`g1`g2)

I first tried left-join; the performance is about 117ms (for 1000 iterations) on my machine:

\ts:1000 t lj u / 117ms, 394k

Then I noticed that update does this much faster:

\ts:1000 update foo:`e1`e2,bar:`f1`f2,foobar:`g1`g2 from t where id in (5000;5050) / 15ms,198k

And it's fairly straightforward to write a generic function that does the update using functional queries:

.my.lj:{[t;u;k]
    ![t;enlist(in;k;u[k]);0b;enlist each k _ flip u]  / k is the key column
    }

\ts:1000 .my.lj[t;0!u;`id] / 17ms, 198k

It seems that, at least in this restricted case, it's fairly straightforward to get a 10x improvement over the native left-join, on what I think is a fairly common use case for lj.

Am I missing something? Is there a better native way of doing this than lj?

Thanks!
Hiji


3 REPLIES

quintanar401
New Contributor
I guess lj is designed for cases where size(t)~size(u). If you try your example with u at, say, 9000 rows, your function will be much slower. This kind of trade-off shows up elsewhere too: for example, the optimal sym/time vs time/sym order in a where clause (with `g# on sym) depends on the number of distinct syms.
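To see the crossover, you can re-time both approaches with a larger u (a quick sketch; u2 is my own name and the timings will vary by machine):

```q
/ grow u to 9000 rows and re-time; as u approaches the size of t,
/ the per-row update approach loses its advantage over lj
u2:`id xkey update `s#id from ([] id:til 9000; foo:`e; bar:`f; foobar:`g)
\ts:100 t lj u2
\ts:100 .my.lj[t;0!u2;`id]
```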

On Thursday, 28 December 2017 at 11:07:49 UTC+3, hijibi...@gmail.com wrote:


Flying
New Contributor III
The difference in speed comes from the fact that lj has to perform a look-up in u for every single row of t, whereas .my.lj only does that for 2 rows out of the large t.

So I think Andrew pointed out quite correctly that your own "lj" works much better than the default implementation.

Also, lj is doing a bit more work here to give the correct output. .my.lj depends on t and u being in the same order; really, the lj-equivalent update should be something like:

update foo:(5000 5050!`e1`e2)id,bar:(5000 5050!`f1`f2)id,foobar:(5000 5050!`g1`g2)id from t where id in (5000;5050)
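This correction can be made generic in the same functional style as .my.lj. Here's a sketch (the name .my.lj2 and the helper logic are my own, so treat it as an assumption rather than tested library code): it builds a key-to-value dictionary per column, so the result no longer depends on row order, and ids in u that are absent from t are simply never selected instead of causing a 'length error.

```q
/ order-independent variant: each updated column is a dictionary
/ (keys of u -> column values) applied to the key column of t
.my.lj2:{[t;u;k]
  cs:cols[u] except k;                              / payload columns of u
  ![t;enlist(in;k;u k);0b;cs!{(x!z;y)}[u k;k] each u cs]
  }
```

With this, `.my.lj2[reverse t;0!u;`id]` should put e1/f1/g1 on id 5000 and e2/f2/g2 on id 5050 regardless of the sort order of t.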

If you reverse the order of t, the results are incorrect:

q)select from .my.lj[t;0!u;`id] where id in 5000 5050
id   foo bar baz quux foobar
----------------------------
5000 e1  f1  c   d    g1
5050 e2  f2  c   d    g2
q)select from .my.lj[reverse t;0!u;`id] where id in 5000 5050
id   foo bar baz quux foobar
----------------------------
5050 e1  f1  c   d    g1
5000 e2  f2  c   d    g2

Also, if you have a record in u which isn't in t, it fails:

q).my.lj[t;0!u,1!enlist `id`foo`bar`foobar!(20000;`e3;`f3;`g3);`id]                                                                                       
{[t;u;k]  ![t;enlist(in;`id;u[k]);0b;enlist each k _ flip u]}
'length

I guess the bottom line is that there will always be certain datasets that can be special-cased to beat the native generic q operators, but it has to be worth the dev effort to think through and handle all the cases.