
Lists, dictionaries, tables and lists of dictionaries

simon_watson_sj
New Contributor III

Team,

I've got an interest in data processing for higher dimensional objects so have been putting in a bit of thought about how best to represent sometimes quite nested data structures in KDB/Q.

Using the Apply function has become part of my routine but as I've used it, I've come to realize that maybe the way it works at present could be a limited case of a more general model. I think these are the nubs of my thought bubble:

  1. lists and dictionaries are distinct and separate objects in KDB. However, even though they might be implemented as separate objects, wouldn't it make more sense to consider a list as a special case of a dictionary where the keys are numbers?
  2. we can use 'flip' to move between a dictionary of lists and a table but actually, shouldn't a table just be a special case of a dictionary of lists where all the lists are the same length? In that case, if we think back to what a function actually offers us, should we consider 'flip' as a function that primarily allows us to move from using dictionary type syntax to table type syntax even though the semantic representations are fundamentally of the same structure?

The reason this is an issue for me is that in my nested structures, I can have dictionaries, lists or even tables at various depths. The apply function mostly works well, but I've found situations where it fails, for instance where one of the layers branches off to a list of strings.

What I'm thinking is, if the underlying object is equivalent, shouldn't we regard the distinction between lists of dictionaries and tables as just a matter of the approach you choose to manipulate the object rather than a property of the object itself? In that case, wouldn't a function such as Apply be better if it was agnostic about such things? Basically, you provide a set of keys and it should just operate on those keys, indifferent to whether it is traversing between dictionaries, lists or tables?

I've had a crack at a generic 'Apply' that kind of does this by incrementally traversing a set of keys using 'over' and flipping the structure at that level when needed. I'd be happy to share my efforts (I can't currently start Q since I rebuilt my computer on the weekend and for some reason, I'm not getting a new license when I submit my email for the Q install).

However, before I disappear too far down this rabbit hole - are there arguments why it might make more sense to keep this separation between lists and dictionaries or tables and dictionaries of lists?

2 REPLIES

rocuinneagain
Contributor

1. I might suggest to think about it the other way. Dictionaries are more like special paired lists.

q)dict:`a`b!1 2
q)lists:{(key x;value x)} dict
q)dict
a| 1
b| 2
q)lists
a b
1 2
q)dict `a
1
q)lists[0]?`a
0
q)lists[1] lists[0]?`a / The same as: dict `a
1

 https://code.kx.com/q/ref/find/
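To show how close the two views are from the other direction, here is a small sketch comparing a list with a dictionary keyed by the same integer positions. The names l and d are made up for this illustration.

```q
l:10 20 30                 / plain list, indexed by position
d:0 1 2!10 20 30           / dictionary keyed by those same positions (illustrative)
show l[1]~d[1]             / 1b: a single lookup agrees
show l[0 2]~d[0 2]         / 1b: so does a lookup with a list of positions/keys
show l~value d             / 1b: the value list is the original list
show l~d                   / 0b: they are still different objects
show type each (l;d)       / 7 99h
```

Lookup behaves the same, but the list keeps its positions implicit, whereas the dictionary stores a key list, which does not have to be contiguous or ordered (e.g. 5 7!10 20).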

 

2. Yes, a list of conforming dictionaries is promoted to a table.

q)(`a`b!1 2;`a`b!1 2)
a b
---
1 2
1 2

Importantly, in memory it is actually stored 'flipped', so it is a dictionary of lists and no longer a list of dictionaries.

q).Q.s1 (`a`b!1 2;`a`b!1 2)
"+`a`b!(1 1;2 2)"

 

This way the keys/column-names only need to be stored once for the whole table and not for each row.

The columns then are vectors which is more efficient and performant. 
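A minimal sketch of that relationship, using the same values as the .Q.s1 output above (d and t are names chosen just for this example):

```q
d:`a`b!(1 1;2 2)                  / dictionary of equal-length lists (columns)
t:flip d                          / the table built from it
show t~(`a`b!1 2;`a`b!1 2)        / 1b: identical to the list of conforming dictionaries
show d~flip t                     / 1b: flipping back recovers the column dictionary
show t`a                          / 1 1, a column comes back as a vector
show t 0                          / a row comes back as the dictionary `a`b!1 2
show type each (d;t)              / 99 98h
```

So the table can be indexed either by column name or by row number, even though only the column form is stored.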

 

There are more details on indexing at depth here :

https://code.kx.com/q4m3/3_Lists/#38-iterated-indexing-and-indexing-at-depth

 

The "querying unstructured data" section of this blog may be of interest:
https://kx.com/blog/kdb-q-insights-parsing-json-files/

The code in it focuses on tables but can be adapted to lists/dictionaries as well:

q)asLists:sample cols sample
q)asLists[0;;`expiry]
17682D19:58:45.000000000
`
`long$()
,""
`long$()
0N
,""

q)@[`asLists;0;{(enlist[`]!enlist (::))(,)/:x}]
`asLists
q)asLists[0;;`expiry]
17682D19:58:45.000000000
::
::
::
::
::
::
q)fill:{n:count i:where (::)~/:y;@[y;i;:;n#x]}
q)fill[0Wn]asLists[0;;`expiry]
17682D19:58:45.000000000 0W 0W 0W 0W 0W 0W

 

 

sstantoncook
New Contributor II

Hi Simon,

 

I agree with Rian in terms of the generalisation of k data types, i.e.

  • Atom is a scalar representation of a data type.
  • List is a vector representation of an Atom.
  • Dictionary is a keyed set of lists. The key can be a List of any type. The values can be a list of any type, a list of lists or a list of Dictionaries.
  • Table is a List of commonly keyed Dictionaries. You can see this easily when you put two dictionaries in a list, or enlist one of them.
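The hierarchy above can be checked with type. This is only a quick sketch, and d is a throwaway name:

```q
d:`a`b!1 2
show type 1                / -7h: atom
show type 1 2              / 7h: simple list of the same type
show type d                / 99h: dictionary
show type enlist d         / 98h: a one-item list of dictionaries is a table
show type (d;d)            / 98h: two conforming dictionaries are a table
show type (d;`a`c!1 2)     / 0h: keys differ, so it stays a general list
```

The last line is the case to watch for in nested data: as soon as the keys stop conforming, the structure is a plain list of dictionaries rather than a table.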

Your point about apply (@;.) - in both cases, dictionaries and lists, it works by indexing.

Dictionaries require the key value to index and apply the function.

Lists require the position (integer index) to index and apply the function.
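A short sketch of both cases, plus a path that mixes keys and positions in one nested structure. The names d, l and nested are invented for the example:

```q
d:`a`b!1 2
l:10 20 30
show @[d;`a;neg]                  / `a`b!-1 2, indexed by key
show @[l;0;neg]                   / -10 20 30, indexed by position

nested:`x`y!(1 2 3;`p`q!4 5)      / dictionary holding a list and a dictionary
show .[nested;(`y;`q);neg]        / path of two keys: y becomes `p`q!4 -5
show .[nested;(`x;1);neg]         / path of key then position: x becomes 1 -2 3
```

In this sketch the same . form walks through dictionaries and lists alike, since each step of the path is just an index into whatever sits at that level. It only covers dictionary and list nesting; it does not try to reproduce the list-of-strings case from the question.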