cancel
Showing results for 
Search instead for 
Did you mean: 

Splayed table

badaouisaad
New Contributor
Hey Guys,

I have created a partitioned/splayed table. For each country/each date, I've created a file that contains all the tickdata so if I want to extract the tickdata for a particular country i do: 

1) I load the tickdata for a specific country  (`:db/Tickdata/Country)
2) select from tickdata where date=xxxx

Now the issue is that I flagged few stocks within each country and I would like to extract all stocks that I flagged across all countries.

I would like to do that by reading all Tickdata at the same time (i.e. `:db/Tickdata)  without having to specify the country

How can i do that? and whats the best way to proceed ?

Many Thanks for your help

S
3 REPLIES 3

pressjonny0
New Contributor
Hi Saad

Just to be clear here, you have the data date partitioned (separate directory) within each country? 

With kdb+ you can only partition by one field.  In your case each country is more like a separate database.  So if you want to query across all countries, you either need to load each database individually and query it, or you need to run multiple processes, each accessing each database, and query across them and join the results. 

Depending on the dataset, it might be better to restructure the data.  You can partition either by country OR date (if you partition by country then the country name needs to be enumerated to an int and mapped in each query).  The choice probably depends on: 

- how the majority of your queries are run- do you more often access the data for a subset of countries, or for a subset of dates.  If most of the queries are of the form “for country X query across all dates” then country would be the better choice. 
- how many of each do you have in the dataset.  For example if you only have a handful of countries and lots of dates then partitioning by date might be the better choice as date filters will cut out more of the data more quickly.
- how the data arrives and whether you would need to sort it to maintain attributes. 

You can also apply a p attribute on whichever one you don’t chose as the partitioning type to give faster lookups.  Depending on what your data is you might also be able to just store the data in the usual format of partitioned by date, parted by stock symbol, and if there is a 1:1 mapping between symbol and country you could either add an additional country column also with a p attribute (i.e. a small bit of denormalisation) or manage it in the queries. 

Thanks

Jonny


I think your best bet will be to partition by date, and `p# by country. But if you must retain the directory structure, I suggest you read the following: http://code.kx.com/wiki/Reference/pardottxt

Yogesh Garg

Thank you very much guys for the reply. I think i will explore "`p# by country" . I may also save the data in different ports and extract the flagged stocks without having to load all databases at the same time. Need to do some reading first and see how i can restructure the data.

Many Thanks!