cancel
Showing results for 
Search instead for 
Did you mean: 

Question about the performance difference of two queries on HDB

carfield1
New Contributor
HI all

I have 2 very simple query which do essentially same work:

\t select avg x by sym, minute from t where date in dates,sym in syms
\t t1:select from t where date in dates; select avg x by sym, minute from t1 where sym in syms

The table is not really huge, there are about 20K syms with data every minutes within market hours, date, minute, sym, x are the only columns, the HDB is date partition and there is parted attribute at sym column.

syms is about 5K syms and data is across 20 days

The 2nd one run 5 times faster than the 1st one pretty consistently, I think the reason is that first one will do aggregation date by date so that 1st take more IO, but taking 5 times more is a bit surprising to me, will anyone have comment about the performance difference

Thanks
7 REPLIES 7

effbiae
New Contributor
does changing the order of the where conditions help?

\t select avg x by sym, minute from t where sym in syms,date in dates

It thought it will loop through all date partitions? which is over few years....

To: personal-kdbplus@googlegroups.com
Are you running kdb+ with slaves?

From: Carfield Yim
Sent: Friday, January 22, 2016 16:44
To: personal-kdbplus@googlegroups.com
Reply To: personal-kdbplus@googlegroups.com
Subject: Re: [personal kdb+] Question about the performance difference of two queries on HDB

It thought it will loop through all date partitions? which is over few years....

Sorry, missed this reply, no, it is just a normal KDB HDB process

sohagan
New Contributor
Hey Carfield,

Could you answer David's question? I am intrigued by your question. Unless of course you have figured it out?
Threads may very well be having an effect.

If you haven't figured it out, are all the sym files in each partition parted? The meta will be just showing you the latest..so we should check that too. 

Something like below would work
group date!{[x;y]exec attr sym from select sym from x where date=y}[t]each date

Cheers,
Sean

Hi Sean, thanks, there is no threading involved, it is just a normal single threaded HDB process

sohagan
New Contributor
Thanks Carfied.

So we can tick threading off the list. 
Lets continue to narrow down your issue.

Are there any differences in the attributes of sym across partitions?
What does count of 'group date!{[x;y]exec attr sym from select sym from x where date=y}[t]each date' give you?

What version of kdb+ are you running? (.z.k;.z.K)

Could you perhaps copy and paste your output from running the query multiple times?

\ts:10 select avg x by sym, minute from t where date in dates,sym in syms
???
\ts:10 t1:select from t where date in dates; select avg x by sym, minute from t1 where sym in syms
???