cancel
Showing results for 
Search instead for 
Did you mean: 

HDB - map sym to int for partitioning

erichards
New Contributor II

Our access patterns are heavily based upon accessing one sym for a long history, rather than accessing everything for a given date.

As such it would be massively more efficient for us to partition by sym rather than by date. Or even to maintain two HDBs with both partition types.

While KDB doesn't natively support partitioning by sym, it sounds like this is possible if we map this to an int ourselves.

  • Is this a well-trodden path?
  • What would be the best way to maintain this sym-to-int mapping? Could we re-use the indices in the sym file?
  • How could we ensure this mapping isn't corrupted? Perhaps in a similar way to how .Q.en locks the sym file during writes?
  • Is there an upper limit to the integer that can be used for partitioning?

Many thanks

1 ACCEPTED SOLUTION

davidcrossey
Moderator Moderator
Moderator

Hi erichards,

You could partition by int in this case - note, a sym is just a mapping to an integer value anyway, so you could use that for your partitioning. A blog from Aqua-Q here that discusses how this is applicable to IoT, and how you may manage the mapping as an example: kdb+ IoT Database Structure | AquaQ

In terms of upper limit, I presume that would be the max integer value, but my best advice would be to mock up a database with some int partitions to see how performant it is and any limitations that come with it for your use-case.

One consideration to keep in mind is that depending on your data read/write access patterns, some partitions could grow exponentially large, in which case you may want to add another layer of abstraction to your write down - e.g. int = year+sym for better query distribution later.

Hope this helps.

View solution in original post

2 REPLIES 2

davidcrossey
Moderator Moderator
Moderator

Hi erichards,

You could partition by int in this case - note, a sym is just a mapping to an integer value anyway, so you could use that for your partitioning. A blog from Aqua-Q here that discusses how this is applicable to IoT, and how you may manage the mapping as an example: kdb+ IoT Database Structure | AquaQ

In terms of upper limit, I presume that would be the max integer value, but my best advice would be to mock up a database with some int partitions to see how performant it is and any limitations that come with it for your use-case.

One consideration to keep in mind is that depending on your data read/write access patterns, some partitions could grow exponentially large, in which case you may want to add another layer of abstraction to your write down - e.g. int = year+sym for better query distribution later.

Hope this helps.

Thank you David, it is re-assuring to know that we are not following an anti-pattern or something too bespoke.

 

With a small demo we have found a sym-partitioned db to be much more performant for our access patterns.

 

I had not considered the partition being a combination of sym + time, that might come in useful.