cancel
Showing results for 
Search instead for 
Did you mean: 

Partitioning Tables Intraday by Custom Fields?

LeahS
Moderator Moderator
Moderator

Hello all,

Sharing a query from a client and would appreciate some feedback from the community:

Is it possible to partition data based on a time/date field other than the date in the TP log file?

Is this a feasible request and a feature that can be easily incorporated?

Looking forward to your input,

Leah 

 

 

 

4 ACCEPTED SOLUTIONS

davidcrossey
Moderator Moderator
Moderator

Hi Leah,

A kdb logfile typically has the form of (func;tabName;tabData), which can be as simple as f:{upsert[x;y]}. There isn't anything special in terms of replaying the data into the kdb process, but more so to do with saving down the data.

Assuming you mean to save to another date, you could simply pull out a date from the data (after replay) and use it as a parameter for your save down logic.

If you mean at a higher level on partitioning the data differently across the entire database, data can be partitioned by any column of types date, month, year and int (1, 2), so depending on the requirements you could map values to a specific int based partition schema (3).

Hope this helps!

Kind regards,

David

References:

  1. 14. Introduction to Kdb+ - Q for Mortals (kx.com)
  2. Partitioned tables | Knowledge Base | kdb+ and q documentation - Kdb+ and q documentation (kx.com)
  3. Multi-partitioned kdb+ databases: an equity options case study | White Papers | kdb+ and q documenta...

View solution in original post

rocuinneagain
Contributor II
Contributor II

Hi Leah,

 

This blog post will likely be of interest: https://kx.com/blog/partitioning-data-in-kdb/ 

It covers the basics of looking at hourly partitions and also fixed size partitions.

 

Regards,

Rian

View solution in original post

The Platform codebase is designed to write date partitioned HDB's only.

The Intraday database - KX Platform exists to so that 24hrs of data does not need to be kept in memory, but it will again store to the HDB on a date partition only.

View solution in original post

That would require for deep level changes to the codebase.

For example when a HDB is written to a new:

  1. A new date folder is created.
  2. Tables are written.
  3. The HDB process is reloaded

If you reloaded the HDB before all tables were written you would have errors.

q)\mkdir HDB
q)\cd HDB
q)`:2021.01.01/tab1/ set ([] a:1 2 3)
`:2021.01.01/tab1/
q)`:2021.01.01/tab2/ set ([] b:1 2 3)
`:2021.01.01/tab2/
q)\l .
q)select from tab1
date       a
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
q)select from tab2
date       b
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
q)`:2021.01.02/tab1/ set ([] a:4 5 6)
`:2021.01.02/tab1/
q)\l .
q)select from tab1
date       a
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
2021.01.02 4
2021.01.02 5
2021.01.02 6
q)select from tab2
'./2021.01.02/tab2/b. OS reports: No such file or directory
  [0]  select from tab2
       ^
q))

.Q.bv - would be one possible helpful extension which can in memory fill tables missing from partitions (using `  as argument to fill using first partition as a template)

q)\l HDB
q)select from tab1
date       a
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
2021.01.02 4
2021.01.02 5
2021.01.02 6
q)select from tab2 //Table missing from latest partition is not found
'tab2
  [0]  select from tab2
                   ^
q).Q.bv` //Using ` first partition used as prototype and table is found
q)select from tab2
date       b
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3

(.Q.chk is unsuitable as it uses most recent partition as template to fills tables missing from partitions)

 

This is only one item that would be needed to implement what you asked.

Currently EOD is a single action for all tables, all code and processes involved would need updates to operate table by table. Meaning changes in RDB/IDB/HDB and others.

 

View solution in original post

10 REPLIES 10

davidcrossey
Moderator Moderator
Moderator

Hi Leah,

A kdb logfile typically has the form of (func;tabName;tabData), which can be as simple as f:{upsert[x;y]}. There isn't anything special in terms of replaying the data into the kdb process, but more so to do with saving down the data.

Assuming you mean to save to another date, you could simply pull out a date from the data (after replay) and use it as a parameter for your save down logic.

If you mean at a higher level on partitioning the data differently across the entire database, data can be partitioned by any column of types date, month, year and int (1, 2), so depending on the requirements you could map values to a specific int based partition schema (3).

Hope this helps!

Kind regards,

David

References:

  1. 14. Introduction to Kdb+ - Q for Mortals (kx.com)
  2. Partitioned tables | Knowledge Base | kdb+ and q documentation - Kdb+ and q documentation (kx.com)
  3. Multi-partitioned kdb+ databases: an equity options case study | White Papers | kdb+ and q documenta...

This is great feedback! Thank you very much David. 

I will confirm the client's intention here but either way, you've provided me with the knowledge I was looking for. 

Appreciate the referenced documentation too - great help!

Kind regards,

Leah 

 

rocuinneagain
Contributor II
Contributor II

Hi Leah,

 

This blog post will likely be of interest: https://kx.com/blog/partitioning-data-in-kdb/ 

It covers the basics of looking at hourly partitions and also fixed size partitions.

 

Regards,

Rian

Definitely of interest, for me and the client! Great information here!

Thanks very much Rian. 

Hi all,

The client has since provided feedback and has stated that their question was specifically aimed at DS_LOG_RECOVERY contained within KX Control. 

They have already implemented their own version of intraday partitioning. However, they are wondering if DS_LOG_RECOVERY supports this? 

In the instance parameters, they observed the following description under "date":

"Date to save to in HDB, if 1970.01.01 then it takes the date from the TP logfiles"

Does this suggest that the DS_LOG_RECOVERY process supports or does not support intraday partitioning? 

Any feedback here would be great.

Thanks,

Leah 

The Platform codebase is designed to write date partitioned HDB's only.

The Intraday database - KX Platform exists to so that 24hrs of data does not need to be kept in memory, but it will again store to the HDB on a date partition only.

Appreciate the quick response, thanks very much Rian. 

Hi all,

A follow up question here: is it possible to disable querying for a SINGLE table in the query router (disable instances with GW and QR)?

DS_LOG_REPLAY template uses these function to stop queries during writedown, but is it possible to do this on table by table basis to reduce downtime.

  disableServers[];
  // disable local services with QR
  .ds.serv.disableLocal[.ds.serv.rdbClasses, .ds.serv.ihdbClasses, .ds.serv.hdbClasses];

Thanks,

Leah

That would require for deep level changes to the codebase.

For example when a HDB is written to a new:

  1. A new date folder is created.
  2. Tables are written.
  3. The HDB process is reloaded

If you reloaded the HDB before all tables were written you would have errors.

q)\mkdir HDB
q)\cd HDB
q)`:2021.01.01/tab1/ set ([] a:1 2 3)
`:2021.01.01/tab1/
q)`:2021.01.01/tab2/ set ([] b:1 2 3)
`:2021.01.01/tab2/
q)\l .
q)select from tab1
date       a
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
q)select from tab2
date       b
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
q)`:2021.01.02/tab1/ set ([] a:4 5 6)
`:2021.01.02/tab1/
q)\l .
q)select from tab1
date       a
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
2021.01.02 4
2021.01.02 5
2021.01.02 6
q)select from tab2
'./2021.01.02/tab2/b. OS reports: No such file or directory
  [0]  select from tab2
       ^
q))

.Q.bv - would be one possible helpful extension which can in memory fill tables missing from partitions (using `  as argument to fill using first partition as a template)

q)\l HDB
q)select from tab1
date       a
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
2021.01.02 4
2021.01.02 5
2021.01.02 6
q)select from tab2 //Table missing from latest partition is not found
'tab2
  [0]  select from tab2
                   ^
q).Q.bv` //Using ` first partition used as prototype and table is found
q)select from tab2
date       b
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3

(.Q.chk is unsuitable as it uses most recent partition as template to fills tables missing from partitions)

 

This is only one item that would be needed to implement what you asked.

Currently EOD is a single action for all tables, all code and processes involved would need updates to operate table by table. Meaning changes in RDB/IDB/HDB and others.

 

Greatly appreciate your detailed response here.

Thank you very much Rian.