Key Value Store

jlucid
Contributor

Has anyone ever tried to implement a dedicated key-value store in kdb+, something like LevelDB?

I have a situation where users wish to perform a lookup by an alphanumeric string, but I don't know in advance which date partition contains the associated record. Clearly I need to avoid an exhaustive search across all date partitions. A lookup from string to date would narrow the search.

 

I've tried using a keyed table, stored as a flat file, but it doesn't scale in terms of memory. I could hold the past month's worth in memory, and that would satisfy 90% of the queries, but I need something more general with constant lookup time. I'd also like to avoid having to introduce another technology.

 

 

3 REPLIES

rocuinneagain
Valued Contributor

A splayed table on disk with an attribute on a column would be worth testing, as it can be memory-mapped rather than having to be held entirely in memory

https://code.kx.com/q/ref/set-attribute/#unique

 

Using 1: to write an Anymap file also creates a mappable object worth exploring

https://code.kx.com/q/releases/ChangesIn3.6/#anymap 

 

If a single splay/anymap would be too large, a fixed-size int-partitioned DB, keyed on a fixed-range hash of the alphanumeric string, could be used
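As a rough illustration of the hash-to-int-partition idea, here is a minimal sketch. The names nparts, bucket and strs are hypothetical, and taking the first 8 bytes of md5 is just one possible way to get a fixed-range hash, not a recommendation.

```q
/ number of int partitions (hypothetical; pick to suit data volume)
nparts:256

/ map a string to a partition number in 0..nparts-1
/ first 8 bytes of the md5 digest become a long; mod keeps it in range
bucket:{[s] (0x0 sv 8#md5 s) mod nparts}

/ example keys (made up for illustration)
strs:("abc123";"xyz789";"foo42")

/ partition number for each key
b:bucket each strs

/ keys grouped by the partition they would be written to
strs group b
```

A lookup for a key k would then only need to read the single int partition bucket k, rather than scanning every partition.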

Thanks for the ideas Rian. Yes, the single anymap file would be too large, but I could try distributing the keys across a set of int partitions, grouping them in some way, perhaps using a hash. That would reduce the search space. I could then split the partitions again if they get too big.

 

Another idea was to have a Bloom or Cuckoo filter associated with each date partition, and use it to determine whether a string is definitely not present in a partition so that the search can be skipped. But it's not a native feature, and I can't find any examples of people doing that.
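For reference, a Bloom filter can be sketched in a few lines of q. This is only a toy under stated assumptions: the names m, k, bfIdx, bfNew, bfAdd and bfHas are all hypothetical, the k hash functions are simulated by salting the md5 input, and the sizes are arbitrary rather than tuned for any false-positive rate.

```q
/ m bits in the filter, k hash functions (both arbitrary here)
m:1024
k:3

/ k bit positions for string s, derived by appending a salt 0..k-1 before md5
bfIdx:{[s] {[s;i] (0x0 sv 8#md5 s,string i) mod m}[s] each til k}

/ empty filter: boolean vector of m zeros
bfNew:{m#0b}

/ add a string: set its k bits
bfAdd:{[f;s] @[f;bfIdx s;:;1b]}

/ test a string: 0b means definitely absent, 1b means possibly present
bfHas:{[f;s] all f bfIdx s}

/ build a filter for one (made-up) partition's keys
f:bfAdd/[bfNew[];("abc123";"xyz789")]

bfHas[f;"abc123"]   / 1b, an added key is never reported absent
```

One such vector per date partition could be saved as a flat file and checked before querying that partition. A 0b result lets the partition be skipped, while a 1b result still requires the real search.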

 

For the LevelDB option, I see that I can compile the C++ library into a shared library and then load that into my q process. With that approach I would only be writing a wrapper library for the main "Get" and "Put" methods, so it should be relatively quick to test against and use as a benchmark.

 

Conor_Mahony
New Contributor

This may not be for you, but it's worth noting that a really simple way to improve performance is to persist a guid hash of the string together with the original string, for example hashguid:{0x0 sv md5 x}. This doesn't help with regex-type queries, of course. If the topic is of interest, see https://dataintellect.com/blog/methods-for-storing-text-data-on-disk-in-kdb/
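A minimal in-memory sketch of how that might look for a string-to-date lookup. The table t, its columns str, dt and gid, and the sample values are all made up for illustration, and the choice of the unique attribute is an assumption.

```q
/ hash a string to a guid (16 md5 bytes reinterpreted as a guid)
hashguid:{0x0 sv md5 x}

/ hypothetical lookup table: original string and the date it lives in
t:([] str:("abc123";"xyz789";"foo42"); dt:2024.01.01 2024.01.02 2024.01.03)

/ persist the guid alongside the original string
t:update gid:hashguid each str from t

/ unique attribute on the guid column so equality lookups avoid a linear scan
t:update `u#gid from t

/ find the date for a given string by comparing guids instead of strings
select dt from t where gid=hashguid "xyz789"
```

The guid column is a fixed-width 16-byte vector, so it is much cheaper to compare and index than a nested string column. The original string is still kept for display or for pattern matching.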