2021.09.13 08:27 PM
Team,
I've got an interest in data processing for higher dimensional objects so have been putting in a bit of thought about how best to represent sometimes quite nested data structures in KDB/Q.
Using the Apply function has become part of my routine but as I've used it, I've come to realize that maybe the way it works at present could be a limited case of a more general model. I think these are the nubs of my thought bubble:
The reason this is an issue for me is that in my nested structures I can have dictionaries, lists or even tables at various depths. The Apply function mostly works well, but I've found situations where it fails, for instance where one of the layers branches off to a list of strings.
What I'm thinking is, if the underlying object is equivalent, shouldn't we regard the distinction between lists of dictionaries and tables as just a matter of the approach you choose to manipulate the object rather than a property of the object itself? In that case, wouldn't a function such as Apply be better if it was agnostic about such things? Basically, you provide a set of keys and it should just operate on those keys, indifferent to whether it is traversing between dictionaries, lists or tables?
I've had a crack at a generic 'Apply' that kind of does this by incrementally traversing a set of keys using 'over' and flipping the structure at that level when needed. I'd be happy to share my efforts (I can't currently start Q since I rebuilt my computer on the weekend and for some reason, I'm not getting a new license when I submit my email for the Q install).
However, before I disappear too far down this rabbit hole - are there arguments why it might make more sense to keep this separation between lists and dictionaries or tables and dictionaries of lists?
2021.09.14 05:10 AM - edited 2021.09.23 04:49 AM
1. I'd suggest thinking about it the other way round: dictionaries are more like special paired lists.
q)dict:`a`b!1 2
q)lists:{(key x;value x)} dict
q)dict
a| 1
b| 2
q)lists
a b
1 2
q)dict `a
1
q)lists[0]?`a
0
q)lists[1] lists[0]?`a / The same as: dict `a
1
https://code.kx.com/q/ref/find/
2. Yes, a list of conforming dictionaries is promoted to a table
q)(`a`b!1 2;`a`b!1 2)
a b
---
1 2
1 2
Importantly in memory the way it is actually stored is 'flipped' so it is a dictionary of lists. (no longer a list of dictionaries)
q).Q.s1 (`a`b!1 2;`a`b!1 2)
"+`a`b!(1 1;2 2)"
This way the keys/column-names only need to be stored once for the whole table and not for each row.
The columns then are vectors which is more efficient and performant.
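To make the "same object, two ways of indexing" point concrete, here is a minimal sketch. The names rows and colsDict are made up for illustration and are not from the posts above; it only shows that one table can be indexed by row or by column.

```q
/ a list of conforming dictionaries (one per row)
rows:(`a`b!1 2;`a`b!3 4)
/ the same data written as a dictionary of column vectors
colsDict:`a`b!(1 3;2 4)

type rows              / 98h: q has already promoted the list to a table
rows~flip colsDict     / 1b: the table is the flipped column dictionary
rows[1]                / `a`b!3 4   (row index gives a dictionary)
rows[`a]               / 1 3        (column index gives a vector)
rows[1;`b]             / 4          (row first, then column)
```

So the table answers to both a row position and a column name, even though it is stored column-wise.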
There are more details on indexing at depth here :
https://code.kx.com/q4m3/3_Lists/#38-iterated-indexing-and-indexing-at-depth
The "querying unstructured data" section of this blog may be of interest:
https://kx.com/blog/kdb-q-insights-parsing-json-files/
The code in it focuses on tables but can be adapted to lists/dictionaries as well:
q)asLists:sample cols sample
q)asLists[0;;`expiry]
17682D19:58:45.000000000
`
`long$()
,""
`long$()
0N
,""
q)@[`asLists;0;{(enlist[`]!enlist (::))(,)/:x}]
`asLists
q)asLists[0;;`expiry]
17682D19:58:45.000000000
::
::
::
::
::
::
q)fill:{n:count i:where (::)~/:y;@[y;i;:;n#x]}
q)fill[0Wn]asLists[0;;`expiry]
17682D19:58:45.000000000 0W 0W 0W 0W 0W 0W
2021.12.28 03:52 AM
Hey Rian,
I've just put in a reply to this below. I'd be keen to hear your thoughts. Sorry for the delay - if I'm honest, I really didn't have a good grasp on the nub of my issue. I think I'm there now.
Simon
2021.09.14 05:36 PM
Hi Simon,
I agree with Rian in terms of the generalisation of k data types. I.e.
Your point about apply (@;.) - in both cases, dictionaries and lists, it works by indexing.
Dictionaries require the key value to index and apply the function.
Lists require the position (integer index) to index and apply the function.
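A small sketch of that point, using made-up values (d, l and nested are illustrative names only): the call shape of @ and . is the same for both, and only the kind of index differs.

```q
d:`a`b`c!10 20 30
l:10 20 30

@[d;`b;neg]             / `a`b`c!10 -20 30   (dictionary: index by key)
@[l;1;neg]              / 10 -20 30          (list: index by position)

/ at depth, keys and positions can be mixed in one index list
nested:`x`y!(1 2 3;`p`q!4 5)
.[nested;(`x;2);neg]    / x becomes 1 2 -3
.[nested;(`y;`q);neg]   / y becomes `p`q!4 -5
```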
2021.12.28 03:50 AM
Hey Rian/Sam,
I finally got around to investigating this more fully.
The issue I have comes down to the below example of a nested data structure called dsEg here.
dsEg: (`doctype`html)!(enlist "html";`text`body!(enlist"test";enlist ([]a: `d`f`g;b: 23 43 777)));
My problem is that I don't know how to use apply (@;.) to get to the columns on that nested table ([]a: `d`f`g;b: 23 43 777).
I feel like it should be
cols .[dsEg;(`html;`body;0)]
since the table is enlisted which means it's in a single element list.
However, I can't get it, or any other approach to work using Apply alone so I can't use Apply as a method to traverse any general data structure in cases like the above where the structure descends within a nested table. The best I can do is get to the layer above and apply raze. I know that sounds like a small thing but the problem comes when nesting then continues down into the table - there is no way to use apply with a list of keys to get past that application of raze at the table level.
Building a function which will allow that to work was ultimately the motivation for this whole tangent. I think I now appreciate that Apply is intended to be fully generic so I'm basically reinventing the wheel. However, to help me abandon my method, can you advise how Apply might be used with a list of keys to get to the column names or any other nested elements within that table?
Regards,
Simon
2022.01.06 07:20 AM
In your example the body is a nested generic list, so its items need to be dealt with one at a time, since the list could contain many different tables or even different datatypes.
q)cols each .[dsEg;(`html;`body)]
a b
q).[dsEg;(`html;`body);{cols each x}]
doctype| ,"html"
html | `text`body!(,"test";,`a`b)
The use of :: may be useful to you if you have not been using it
https://code.kx.com/q/ref/apply/#nulls-in-i
It allows you to skip levels
q).[dsEg;(`html;`body;::;`a)]
d f g
//Better shown on an item with multiple entries in the list
q)dsEg2:(`doctype`html)!(enlist "html";`text`body!(enlist"test";2#enlist ([]a: `d`f`g;b: 23 43 777)));
q).[dsEg2;(`html;`body;::;`a)]
d f g
d f g
.Q.s1 may also be useful to you as it can help show the underlying structure of an item better than the console at times.
https://code.kx.com/q/ref/dotq/#qs1-string-representation
q).[dsEg;(`html;`body;::;`a)]
d f g //Looks like a symbol list type 11h but is in fact a single item generic list type 0h
q){-1 .Q.s1 x;} .[dsEg;(`html;`body;::;`a)]
,`d`f`g //.Q.s1 output can be ugly but always shows exact structure
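As a further sketch against the same dsEg (my reading of how indexing at depth behaves, so worth checking in your own session): putting a position such as 0 in the index list picks one item of the enlisted list, whereas :: keeps the list level. The {x y}/ line is just an illustrative way to walk a list of keys one step at a time with over, not a replacement for Apply.

```q
dsEg:(`doctype`html)!(enlist "html";`text`body!(enlist"test";enlist ([]a:`d`f`g;b:23 43 777)))

cols .[dsEg;(`html;`body;0)]     / `a`b     (0 selects the single table)
.[dsEg;(`html;`body;0;`a)]       / `d`f`g   (plain symbol vector, type 11h)
.[dsEg;(`html;`body;::;`a)]      / ,`d`f`g  (:: keeps the enclosing list, type 0h)

/ stepwise traversal: index the running result with each key in turn
{x y}/[dsEg;(`html;`body;0;`a)]  / `d`f`g
```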