cancel
Showing results for 
Search instead for 
Did you mean: 

What is the role played by key columns in a keyed table [ query join/performance ] ?

Anand_Kulkarni_
New Contributor II

I would like to understand what is the underlying optimization or performance tweak that kdb+/q is able to extract when we apply key on one/more columns in the table. Key serves the purpose of helping join 2 or more tables via row to row match using the keys. Internally the keyed table is stored as a map [ key value pair ]. The keyed columns are the key and the rest of unkeyed columns make the payload/data tuple. since it is a map querying the map for a particular key must be faster than an unkeyed table. The keys are basically nodes on a BST ( binary search tree ) which should look up in O(log(n)) time given n is size of the table. An unkeyed table on the other hand is an aggregate of rows [ an array basically ]. Hence searching it or  querying should be O(n). But when we run a simple test with some data there is no difference in speed between querying unkeyed vs keyed table at all. Why is this ? In Fact the keyed table takes slightly more   time to  lookup a particular record. I do understand that the table may not be sorted in memory based on the keys but still with a dictionary type keyed table should a faster search be not possible ?

2 ACCEPTED SOLUTIONS

rocuinneagain
Contributor III
Contributor III

You can see the speed improvement if you index rather than query using qsql:

q)select from t1 where Name=`Anand
Name  Age Salary
----------------
Anand 10  1000
q)select from t2 where Name=`Anand
Name | Age Salary
-----| ----------
Anand| 10  1000
q)t2`Anand
Age   | 10
Salary| 1000
q)\ts do[100000;select from t1 where Name=`Anand]
164 1808
q)\ts do[100000;select from t2 where Name=`Anand]
174 1808
q)\ts do[100000;t2`Anand]
71 960

 

View solution in original post

darrenwsun
New Contributor III

With qsql, the full column is searched before the result is presented. With key lookup, the search stops when it finds the first match. This is where the performance gain from key lookup comes from. 

The keys are basically nodes on a BST ( binary search tree ) which should look up in O(log(n)) time given n is size of the table

This isn't true. KDB's keyed table, or more generally a plain dictionary, does not use hashing techniques like Java's HashMap. Lookup is done by searching through the table/list linearly. If one wishes less time for the lookup, grouped (conceptually the same as index) or parted (conceptually a space-optimized index that assumes data are sorted) attributes help.

 

View solution in original post

7 REPLIES 7

rocuinneagain
Contributor III
Contributor III

You can see the speed improvement if you index rather than query using qsql:

q)select from t1 where Name=`Anand
Name  Age Salary
----------------
Anand 10  1000
q)select from t2 where Name=`Anand
Name | Age Salary
-----| ----------
Anand| 10  1000
q)t2`Anand
Age   | 10
Salary| 1000
q)\ts do[100000;select from t1 where Name=`Anand]
164 1808
q)\ts do[100000;select from t2 where Name=`Anand]
174 1808
q)\ts do[100000;t2`Anand]
71 960

 

It's amazing that lookup via key works faster but same lookup isn't used as a hint when querying the table via qsql ? Sounds surprising behaviour actually and doesn't look sensible.

For optimisations in qsql queries you should look to attributes.

Here applying the unique attribute to the Name column:

q)t3:update `u#Name from t1
q)\ts do[100000;select from t3 where Name=`Anand]
64 1808

 

 

darrenwsun
New Contributor III

With qsql, the full column is searched before the result is presented. With key lookup, the search stops when it finds the first match. This is where the performance gain from key lookup comes from. 

The keys are basically nodes on a BST ( binary search tree ) which should look up in O(log(n)) time given n is size of the table

This isn't true. KDB's keyed table, or more generally a plain dictionary, does not use hashing techniques like Java's HashMap. Lookup is done by searching through the table/list linearly. If one wishes less time for the lookup, grouped (conceptually the same as index) or parted (conceptually a space-optimized index that assumes data are sorted) attributes help.

 

Regarding why q-sql doesn't do key lookup internally if the search is on the key, short answer is that they are not equivalent. Consider a keyed table where there are duplicate keys (yes this is allowed in KDB, there is no such thing as "primary key constraint" in this world), q-sql may return a table of multiple rows, while the key lookup form returns a dictionary representing the first entry (you know why - the search stops at first matched entry)

PCarroll
Community Manager Community Manager
Community Manager

Just wanted to codify some of the above:

q)tab:([]sym:-50000?`6;px:50000?10)
q)ktab:`sym xkey tab 
// wanted to show the last record to convince that if kdb 
// has to search through entire list, no speed gain is evident
q)-1#tab
sym    px
---------
obafmn 6 
q)-1#ktab
sym   | px
------| --
obafmn| 6 
q)\ts do[100000;select from tab where sym=`obafmn]
2034 66240
q)\ts do[100000;select from ktab where sym=`obafmn]
2051 66240

// below no speed gain, but memory usage is reduced
q)\ts do[100000;ktab`obafmn]
2079 960
// however this does mean any other *valid* request would be faster
q)rand key ktab
sym| kfpfmd
q)\ts do[100000;ktab`kfpfmd]
1169 960

// showing point above with grouped attribute applied to keyed column
q)gktab:`sym xkey update `g#sym from tab
q)\ts do[100000;select from gktab where sym=`obafmn]
89 1808

// final point I wanted to make showing duplicate keys 
q)`a`a!(1;1)
a| 1
a| 1

Thanks @darrenwsun and @rocuinneagain for contributing here.

darrenwsun
New Contributor III

Many thanks! I typed the responses during commute, would love to accompany the explanation with examples but doing that on mobile isn't too easy 🙂