2021.11.04 08:57 AM
Hello all,
Sharing a query from a client and would appreciate some feedback from the community:
Is it possible to partition data based on a time/date field other than the date in the TP log file?
Is this a feasible request and a feature that can be easily incorporated?
Looking forward to your input,
Leah
2021.11.04 06:07 PM
Hi Leah,
A kdb logfile typically has the form of (func;tabName;tabData), which can be as simple as f:{upsert[x;y]}. There isn't anything special in terms of replaying the data into the kdb process, but more so to do with saving down the data.
Assuming you mean to save to another date, you could simply pull out a date from the data (after replay) and use it as a parameter for your save down logic.
If you mean at a higher level on partitioning the data differently across the entire database, data can be partitioned by any column of types date, month, year and int (1, 2), so depending on the requirements you could map values to a specific int based partition schema (3).
Hope this helps!
Kind regards,
David
References:
2021.11.05 02:50 AM
Hi Leah,
This blog post will likely be of interest: https://kx.com/blog/partitioning-data-in-kdb/
It covers the basics of looking at hourly partitions and also fixed size partitions.
Regards,
Rian
2021.11.10 07:32 AM
The Platform codebase is designed to write date partitioned HDB's only.
The Intraday database - KX Platform exists to so that 24hrs of data does not need to be kept in memory, but it will again store to the HDB on a date partition only.
2021.11.12 01:52 AM
That would require for deep level changes to the codebase.
For example when a HDB is written to a new:
If you reloaded the HDB before all tables were written you would have errors.
q)\mkdir HDB
q)\cd HDB
q)`:2021.01.01/tab1/ set ([] a:1 2 3)
`:2021.01.01/tab1/
q)`:2021.01.01/tab2/ set ([] b:1 2 3)
`:2021.01.01/tab2/
q)\l .
q)select from tab1
date a
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
q)select from tab2
date b
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
q)`:2021.01.02/tab1/ set ([] a:4 5 6)
`:2021.01.02/tab1/
q)\l .
q)select from tab1
date a
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
2021.01.02 4
2021.01.02 5
2021.01.02 6
q)select from tab2
'./2021.01.02/tab2/b. OS reports: No such file or directory
[0] select from tab2
^
q))
.Q.bv - would be one possible helpful extension which can in memory fill tables missing from partitions (using ` as argument to fill using first partition as a template)
q)\l HDB
q)select from tab1
date a
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
2021.01.02 4
2021.01.02 5
2021.01.02 6
q)select from tab2 //Table missing from latest partition is not found
'tab2
[0] select from tab2
^
q).Q.bv` //Using ` first partition used as prototype and table is found
q)select from tab2
date b
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
(.Q.chk is unsuitable as it uses most recent partition as template to fills tables missing from partitions)
This is only one item that would be needed to implement what you asked.
Currently EOD is a single action for all tables, all code and processes involved would need updates to operate table by table. Meaning changes in RDB/IDB/HDB and others.
2021.11.04 06:07 PM
Hi Leah,
A kdb logfile typically has the form of (func;tabName;tabData), which can be as simple as f:{upsert[x;y]}. There isn't anything special in terms of replaying the data into the kdb process, but more so to do with saving down the data.
Assuming you mean to save to another date, you could simply pull out a date from the data (after replay) and use it as a parameter for your save down logic.
If you mean at a higher level on partitioning the data differently across the entire database, data can be partitioned by any column of types date, month, year and int (1, 2), so depending on the requirements you could map values to a specific int based partition schema (3).
Hope this helps!
Kind regards,
David
References:
2021.11.05 02:12 AM
This is great feedback! Thank you very much David.
I will confirm the client's intention here but either way, you've provided me with the knowledge I was looking for.
Appreciate the referenced documentation too - great help!
Kind regards,
Leah
2021.11.05 02:50 AM
Hi Leah,
This blog post will likely be of interest: https://kx.com/blog/partitioning-data-in-kdb/
It covers the basics of looking at hourly partitions and also fixed size partitions.
Regards,
Rian
2021.11.05 02:58 AM
Definitely of interest, for me and the client! Great information here!
Thanks very much Rian.
2021.11.10 07:28 AM
Hi all,
The client has since provided feedback and has stated that their question was specifically aimed at DS_LOG_RECOVERY contained within KX Control.
They have already implemented their own version of intraday partitioning. However, they are wondering if DS_LOG_RECOVERY supports this?
In the instance parameters, they observed the following description under "date":
"Date to save to in HDB, if 1970.01.01 then it takes the date from the TP logfiles"
Does this suggest that the DS_LOG_RECOVERY process supports or does not support intraday partitioning?
Any feedback here would be great.
Thanks,
Leah
2021.11.10 07:32 AM
The Platform codebase is designed to write date partitioned HDB's only.
The Intraday database - KX Platform exists to so that 24hrs of data does not need to be kept in memory, but it will again store to the HDB on a date partition only.
2021.11.10 07:49 AM
Appreciate the quick response, thanks very much Rian.
2021.11.11 10:54 AM
Hi all,
A follow up question here: is it possible to disable querying for a SINGLE table in the query router (disable instances with GW and QR)?
DS_LOG_REPLAY template uses these function to stop queries during writedown, but is it possible to do this on table by table basis to reduce downtime.
Thanks,
Leah
2021.11.12 01:52 AM
That would require for deep level changes to the codebase.
For example when a HDB is written to a new:
If you reloaded the HDB before all tables were written you would have errors.
q)\mkdir HDB
q)\cd HDB
q)`:2021.01.01/tab1/ set ([] a:1 2 3)
`:2021.01.01/tab1/
q)`:2021.01.01/tab2/ set ([] b:1 2 3)
`:2021.01.01/tab2/
q)\l .
q)select from tab1
date a
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
q)select from tab2
date b
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
q)`:2021.01.02/tab1/ set ([] a:4 5 6)
`:2021.01.02/tab1/
q)\l .
q)select from tab1
date a
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
2021.01.02 4
2021.01.02 5
2021.01.02 6
q)select from tab2
'./2021.01.02/tab2/b. OS reports: No such file or directory
[0] select from tab2
^
q))
.Q.bv - would be one possible helpful extension which can in memory fill tables missing from partitions (using ` as argument to fill using first partition as a template)
q)\l HDB
q)select from tab1
date a
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
2021.01.02 4
2021.01.02 5
2021.01.02 6
q)select from tab2 //Table missing from latest partition is not found
'tab2
[0] select from tab2
^
q).Q.bv` //Using ` first partition used as prototype and table is found
q)select from tab2
date b
------------
2021.01.01 1
2021.01.01 2
2021.01.01 3
(.Q.chk is unsuitable as it uses most recent partition as template to fills tables missing from partitions)
This is only one item that would be needed to implement what you asked.
Currently EOD is a single action for all tables, all code and processes involved would need updates to operate table by table. Meaning changes in RDB/IDB/HDB and others.
2021.11.12 02:31 AM
Greatly appreciate your detailed response here.
Thank you very much Rian.
EMEA
Tel: +44 (0)28 3025 2242
AMERICAS
Tel: +1 (212) 447 6700
APAC
Tel: +61 (0)2 9236 5700
KX. All Rights Reserved.
KX and kdb+ are registered trademarks of KX Systems, Inc., a subsidiary of FD Technologies plc.