cancel
Showing results for 
Search instead for 
Did you mean: 

Most optimum way to search a column containing a list of values

smdesai
New Contributor

Hello,
I have a table with 'parent' and 'child' columns containing an id. I have another column named 'chain' that is a list containing id's of parents all the way to the root. This means that for a given row, the chain column contains the path to the root node. 

Now what I'd like to do is find all rows whose chain column contains a particular id. I have the following:
select from table where id in/:chain

Is there a better way to achieve this? I'm looking to see if this can be improved either by redoing the query or a reorganization of the data. 

2 REPLIES 2

rocuinneagain
Contributor III
Contributor III

Can you make up a sample table with a few rows of data and share the q code?

It'll help people understand your use-case and make it easier for them to answer the question.

You can see details on how to include a code block:

https://community.kx.com/t5/Community-Blogs/Creating-q-code-block-with-syntax-highlighting-in-a-mess...

 

1. What datatype are you using for ids?

2. How many rows do you expect to be in the final table?

3. Will the table live on disk or in memory?

4. Will it receive a lot of updates or be mostly static?

5. What will the average length of a chain be?

6. You name the column 'child', so each node can only have one child? Or would  'children' make more sense? as a list?

 

One basic unoptimized example using .z.s self to get the parenting chain of each row

 

 

q)tree:([] row:til 5;parent:0N 0 0 1 1)
q)tree
row parent
----------
0
1   0
2   0
3   1
4   1
q)getChain:{[c;r] $[null p:tree[r]`parent;c,p;.z.s[c,p;p]]}
q)getChain[();4]
1 0 0N
q)update chain:getChain[()] each row from tree
row parent chain
-----------------
0          ,0N
1   0      0 0N
2   0      0 0N
3   1      1 0 0N
4   1      1 0 0N

 

 

 

These links to existing resources may also be of use:

Thank you for the reply and links to resources. I'll check those out later.

What I have is a tree where each node has an id and a node can contain children. A child can only have one parent. This data is time based (day granularity) so a tree contains changes on a daily basis. What I'd like to do is for a given day, find a subtree for a given id. That's the basic problem I'm trying to solve.

To answer the questions:

  1. The ids are symbols (the example below is simplified with integers)
  2. The entire table right now is ~23M rows and a snapshot for a given day is ~80,000 rows
  3. The table is stored on disk as a splayed table
  4. The table receives updates on a daily basis
  5. The chain as computed by the code below has length 11 for all nodes with repeating root id where the subtree ends early
  6. Yes children would've been a better choice
/ This is what the table looks after after the chain is added
id | date         parent chain
------------------------------
0                 0      0 0 0
1    2023.01.23   0      0 0 0
2    2023.01.23   1      1 0 0
3    2023.01.23   2      2 1 0
4    2023.01.23   1      1 0 0

load`table
`id xkey `table

filterTableByDateWithChain:{[d]
  t::select from table where date=d;
  t::update parent:`t$parent from t;
  update chain:flip parent scan parent from t}

t:filteredTableByDateWithChain 2023.01.23
subtree:select from t where id in/: chain

 

I've just started learning q and kdb a few weeks ago so still learning.