xasc gives wsfull - is there any workaround?

Nick_Mospan
New Contributor III
Hi All,

I'm running the 32-bit version:

KDB+ 3.3 2015.08.23 Copyright (C) 1993-2015 Kx Systems
l32/ 4()core 7893MB 

I'm using TorQ WDB and TickerLogReplay, and both crash with wsfull when sorting my data. I'm sorting a quote table of about 70M records.

The problem can be reproduced with the following. I don't think kdb does it differently for on-disk sorting, otherwise I wouldn't be getting wsfull.

q)\ts t:([]a:100000000?1.0;b:100000000?1.0)
1283 2147484192
q)`a xasc t
wsfull

Is there a workaround? How can I sort a table without loading the whole column into memory?

Thanks
Nick 
5 REPLIES

Nick_Mospan
New Contributor III
I've been looking into this a bit further and I don't quite understand why the kdb sort algorithm consumes multiples of the memory originally taken by a column.
E.g. my sym column in an enumerated splayed table takes 260MB.

splayedtable: `:/path/to/table

\ts asc value splayedtable`sym
1759 2147483936

\ts iasc value splayedtable`sym
728 1610744064

So it takes 2.1GB of memory to sort a 260MB vector of ints? A bit less (1.6GB) to produce the sort indexes, but still way too much.
The `time column takes twice as many bytes and can't be sorted at all.

\ts asc splayedtable`time
wsfull
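
A rough worked example of where the 2147484192 bytes in the first post's \ts output may come from. This is a sketch, assuming (as I understand the kdb+ memory model) that each vector is allocated in a block rounded up to a power of two; the names n, raw and blk are mine, for illustration only.

```q
n:100000000
raw:8*n                                / payload bytes of one float column
blk:`long$2 xexp ceiling 2 xlog raw    / next power of two at or above raw
(raw;blk;2*blk)                        / 800000000 1073741824 2147483648
```

Under that assumption the two-column table alone occupies about 2GB, which is close to the reported figure. A sort that also needs an index vector and reordered copies of the columns would then run past what a 32-bit process can address.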


Is there an in-place sort algorithm available?

Thanks
Nick

KDB doesn't have any other implicit sorting mechanism. You need to define your own function for that.

For on-disk sorting, take a look at the following. This will give you some ideas to start:

http://code.kx.com/wiki/Reference/xasc#Sorting_data_on_disk
http://code.kx.com/wiki/JB:KdbplusForMortals/splayed_tables#1.2.5.3_Sorting_by_a_Column_on_Disk
https://groups.google.com/forum/#!msg/personal-kdbplus/kho3unfJ9uc/7UdOBGKKigkJ

Please check the note on the page about the issues that can occur during on-disk sorting.

Hi Rahul,

Thanks for your reply. I can't see any pointers in the links. The code there is based on xasc or iasc, which are exactly the calls that fail to run. The last link confirms that there's a problem with memory consumption; perhaps it's just a trade-off for speed.


So what are my options?
- Implement custom sort in q and/or k.
- Break the table into a few parts with distinct keys, sort each part and then merge.
- Use Kona to sort the table. I've never used Kona before and am not sure whether it has a built-in sort.
- Implement a custom table sorter in C/C++ using structures from the C interface provided by KX. This will require some reverse engineering, I guess.
- Buy a licence from KX.
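
The second option (break into parts with distinct keys, sort each part, then join) could look roughly like the sketch below. It assumes float keys in [0,1), as in the reproduction above, so equal-width buckets are roughly even; skewed keys would need different boundaries. The names nb, bk, dst and sortbucket, and the destination directory sorted_t, are hypothetical. Because bucket i holds only keys smaller than those in bucket i+1, appending the sorted buckets in order gives a fully sorted table with no merge step.

```q
n:1000000                        / small n so the sketch runs anywhere
t:([]a:n?1.0;b:n?1.0)            / same shape as the table in the question
nb:8                             / number of key-range buckets (tuning knob)
bk:(nb-1)&floor nb*t`a           / bucket id 0..nb-1 for keys in [0,1)
dst:`:sorted_t/                  / hypothetical splayed destination directory

/ sort one bucket by a, then write it (bucket 0) or append it (later buckets)
sortbucket:{[t;bk;dst;i]
  j:where bk=i;                  / row numbers that fall in bucket i
  s:t j iasc t[`a]j;             / those rows, ordered by a
  $[i=0;dst set s;.[dst;();,;s]]}

sortbucket[t;bk;dst] each til nb;

r:get dst                        / map the result back for inspection
all 0<=1_deltas r`a              / sanity check that a is ascending
```

Here t still sits in memory, so this only shows the pattern. For a splayed table on disk the same idea would be applied to the mapped key column, with each other column reordered and appended one bucket at a time, so that only about 1/nb of the rows are being sorted at once. Whether that stays under the 32-bit limit for 70M rows is something to measure rather than assume.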

What would you choose and why?

Thanks!