I understand that for partitioned table, the partition domain can be day, month etc, or an integer value. Now, I have a tick data database and it is huge. I would like to partition the table by something smaller than a day. For example, by hour. Is that possible to do that in KDB?
The only thing that I can think of is convert the datetime to nanosecond, create a column that is (nanosecond / 1e9*3600), and whenever I need a query, I will match this column first. Is that the "correct" way to achieve what I wanna do, or is there some other better method?
Yes that's possible, just make sure you don't partition by 'hour' otherwise you'll end up with 24 partitions with multiple days in each
You want to partition by date+hour (i.e. another level of granularity on top of date) which you have correctly pointed out in your post.
update dth:`long$t-t mod 60*60*1e9 from tab // would create a long column from the date and hour which you could use as the partition domain, but will have redundant 0s in the domain's value. It does however have certain advantages
update dth:`long$t div 60*60*1e9 from tab // would be a nicer domain, in that each subsequent hour is an increment of 1, and it just looks smaller/easier to work with
This 'hourly' method is entirely feasible for storing the data, but there are some caveats when you come to query the data - like you said the partition column must be matched first for efficient querying and complexity ...
where date=date is easy to understand
where int=randomNumber is slightly more complex
See here for some more info http://code.kx.com/wiki/JB:KdbplusForMortals/partitioned_tables#1.3.7_Examples_of_Partition_Domains
If everything is placed behind an api, then I guess you should have no problems - you have control of how your clients will query, so you can always ensure the partition column is filtered on first and matched/used correctly
Below is a working example of how it can work
// test tab, 24 rows across 2 days, between 9 and 11am, every 10 mins