2023.02.20 02:38 PM - edited 2023.03.02 02:29 PM
// Set default compression and delete HDB
.z.zd:17 2 6;    / 2^17-byte logical blocks, algorithm 2 (gzip), level 6
system"rm -r /home/alivingston/HDB/*";
dir:`:/home/alivingston/HDB;
// define parallelised .Q.dpft
func:{[d;p;f;t]
 i:iasc t f;                                  / sort index on the parted field
 tab:.Q.en[d;`. t];                           / enumerate syms against the HDB root
 / write each column in parallel: apply `p# to the parted field, :: to the rest
 .[{[d;t;i;c;a]@[d;c;:;a t[c]i]}[d:.Q.par[d;p;t];tab;i;;]]peach flip(c;)(::;`p#)f=c:cols t;
 @[d;`.d;:;f,c where not f=c];                / write the .d file, parted field first
 t
 };
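A side note on the trickiest line: flip(c;)(::;`p#)f=c:cols t builds the (column;attribute) pairs that peach iterates over. A small sketch with hypothetical column names:
c:`sym`price`size                 / hypothetical columns
f:`sym                            / the parted field
flip(c;)(::;`p#)f=c               / => (`sym;`p#);(`price;::);(`size;::)
/ . then applies the writer lambda to each (column;attrFn) pair in parallel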
// Create table
n:10000000;
trade:([]timestamp:.z.p+til n;sym:n?`2;a:n?0;b:n?1f;c:string n?`3;d:n?0b;e:n?0;f:n?1f;g:string n?`3;h:n?0b);
\ts func[dir;.z.d;`sym;`trade]
\ts .Q.dpft[dir;.z.d;`sym;`trade]
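For context on the tables below: I'm assuming each row corresponds to the number of secondary threads q was started with, e.g. for the 8-thread rows:
/ $ q -s 8
\s    / shows the secondary-thread count available to peach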
threads| time  space
-------| -----------
0      | 0.992 1
2      | 1.52  1.17
4      | 1.8   1.32
8      | 2.61  1.66

threads| time  space
-------| -----------
0      | 0.981 1
2      | 1.56  1.08
4      | 1.84  1.2
8      | 2.63  1.49
The parallelised .Q.dpft func ran 56% faster with 2 threads using 8% more RAM, and 163% faster with 8 threads using 50% more RAM.
2023.02.21 09:05 AM
Thanks for posting! Looking forward to seeing if the community has any feedback on your approach.
2023.03.07 08:35 PM
Tacking on some further improvements Alex and I discussed:
funcMem:{[d;p;f;t]
 i:iasc t f;                                  / sort index on the parted field
 c:cols t;
 / slice the index so one chunk of the whole table holds about as many
 / cells as a single column - the peak .Q.dpft itself holds in memory
 is:(ceiling count[i]%count c)cut i;
 tab:.Q.en[d;`. t];                           / enumerate syms against the HDB root
 / per chunk: append each column slice in parallel, `p# on the parted field
 {[d;tab;c;f;i].[{[d;t;i;c;a]@[d;c;,;a t[c]i]}[d;tab;i;;]]peach flip(c;)(::;`p#)f=c}[d:.Q.par[d;p;t];tab;c;f;]each is;
 @[d;`.d;:;f,c where not f=c];                / write the .d file, parted field first
 t
 };
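Usage mirrors .Q.dpft, so against the HDB and trade table from the first post it would be:
funcMem[dir;.z.d;`sym;`trade]    / writes today's partition in chunks, returns `trade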
This reduces the memory drawback; in theory it is more memory efficient than the standard .Q.dpft. The code slices the sort index into chunks, sized so that one in-memory chunk of the whole table contains about the same number of entries as a single column of the table (which is the maximum amount of data .Q.dpft holds in memory, since it writes column by column).
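A toy illustration of the chunking, with hypothetical sizes (ten rows, three columns):
i:til 10                          / stand-in sort index
c:`a`b`c                          / three columns
(ceiling count[i]%count c)cut i   / => (0 1 2 3;4 5 6 7;8 9)
/ each chunk: at most 4 rows x 3 columns = 12 cells, ~ one 10-entry column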
This retains the speed benefit of parallelisation shown above, without the memory drawback we saw from simply adding peach.
As for my claim of "more memory efficient than standard .Q.dpft": the chunks are sized to match the number of elements in a single column. Because .Q.dpft writes column by column, its peak memory usage is set by the column with the largest datatype (in bytes). A chunk in this method holds only part of that large column at any one time, alongside parts of the smaller columns, so peak usage is at most that of .Q.dpft, with equality only when every column has the same-sized datatype.
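To put rough numbers on that, a worked sketch under assumed column widths (one 8-byte float column and one 1-byte boolean column, n rows):
n:10000000
dpftPeak:n*8                      / .Q.dpft: one full float column in memory at peak
funcMemPeak:(ceiling n%2)*8+1     / funcMem: ceiling[n%2] rows of both columns per chunk
funcMemPeak<=dpftPeak             / 1b - 45MB vs 80MB under these assumptions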
Preliminary tests showed the speed improvement was maintained, with no memory drawback. However, these tests were not standardised or run in a proper benchmarking framework, so I would love to see rigorous results at some point, whether generated by myself or someone else who is curious.
2023.08.04 06:57 PM
This is cool. I have also used and built a similar override of .Q.dpft; the use case is valid up to kdb+ 3.6.
If you use kdb+ 4.0 with secondary threads, .Q.dpft is internally optimised to save data to disk faster (probably some C code), something I did not find in the KX release notes.
In my tests, the built-in 4.0 version beats or matches the peach override.
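For anyone wanting to combine the two observations, a hedged sketch of a version-aware wrapper (dpftWrite is a hypothetical name; func is Alex's override from the first post):
/ prefer the built-in writer on kdb+ 4.0+ when secondary threads are available
dpftWrite:{[d;p;f;t]$[(.z.K>=4.0)and 0<system"s";.Q.dpft;func][d;p;f;t]}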
Many Thanks
Sujoy