cancel
Showing results for 
Search instead for 
Did you mean: 

Finding the oldest record in a large partitioned table

jwbuitenhuis
New Contributor
Hi all - enjoying this mailgroup a lot - thanks for all the useful input! Two questions:

1) Given the below, is there a faster solution to find the oldest record?
I'm assuming a large partitioned table and a maximum timeframe of 50 years.

2) Is there a prettier way to find out if there's a record for a given date?
(I'm still finding out about grepping the directory structure on disk, i.e. do we have 
a non-empty file /tmp/db/1995.01.01/t/date)

---------------------

create:{[size;date]
sym: `$":/tmp/db/",(string date),"/t/";
t:([] date:x+00:30*til size; price:size?10);
sym set t
}

/ 10000 directories - 110MB - uncomment to generate:
/ create[100] each .z.D - til 10000

/ is there a prettier way?
check:{0 < first exec cnt from select cnt: count date from t where date=x}

/ x is the search window which will be repeatedly halved
find:{
/ generate halfway marks for search
steps: reverse {2*x}\[x>;1];
/ if date exists, turn left, else return right
{$[check x;x-y;x+y]}/[.z.D;steps]}

system("l /tmp/db")

show "binary search"
\t find[100*365]
/ 2 ms on my Mac

show "select min date from t"
\t select min date from t
/ 480 ms on my Mac

-------------
Thanks,
Will

3 REPLIES 3

thomas_smyth
New Contributor
Hi,

The counts of partitions can be retrieved usign the .Q namespace, of which there is an example here: http://code.kx.com/q/ref/dotq/#qpn-partition-counts

For a table t, .Q.cn will get the partition counts, which will then be visible in .Q.pn.

The counts for each date can be viewed with .Q.pv!.Q.pn`t.

You should be able get the oldest partition containing values with: .Q.pv first where 0<.Q.pn`t.

The dictionary: (.Q.pv!0<.Q.pn`t) should allow you to find out if a partition is populated for table t.

Regards,
Thomas
AquaQ Analytics

Hi Thomas,

Thanks for pointing that out, that seems like a great option. In the environment I need this I won't be able to run these functions - consequences of specific large scale systems.

The above works well, however, is there a cleaner way to phrase this query? The exec.. select ... approach comes from 'Q for Mortals' to work around a limitation in partitioned tables.

check:{0 < first exec cnt from select cnt: count date from t where date=x}

Thanks,
Will

sujoy
New Contributor II
Since you are querying HDB's.... 

Try "select cnt:count i=0 from t where date=x"   -> This uses the same concept of .Q.cn and .Q.pn internally and maps to find the partitions in a second. Please note, this gives exceptional performance, but only for HDB's.


Regards
Sujoy