cancel
Showing results for 
Search instead for 
Did you mean: 

A very tricky question for Gurus

VA
New Contributor
Hi q-Gods, 

Please, help needed! I know what I need to do but I have no idea how to do it in q. I have given my absolute best and have spent 2 days trying, but nothing seem to work out. 

I have a table t with an empty RM (running minimum) column which needs to be filled with running minimum of col "px"

record id acn  px RM
0        1   1    15
1        2   1    20
2        3   1    10
3        4   1    11
4        3   0    10
5        5   1    13 
6        4   0    11
7        6   1    17  
..        ...  ..    ...

Each ID in the table can have 2 acn's, either acn=1 (order submitted) or acn=0 (order cancelled). I am looking to run a running minimum in RM column. However, whenever an ID has acn=0, the the according price is no longer registered in the system. In this case, next minimum takes its place. Allow me to demonstrate with table t.

"Record" table is just an index of each row/record and is used purely this example, I do not have have this col in my table t. 

record id acn  px RM
0        1   1    15  15
1        2   1    20  15
2        3   1    10  10
3        4   1    11   10

Record 4 says: ID=3 has acn=0, which means this orders has been cancelled and is no longer in the system. Its px is ignored from here on. This means that the next-in-order-minimum is 11. Of course to find this next in order minimum, 'mins' has to be rerun on col px up until this point (but not including record 4), but this time without, or ignoring, record 2 where ID=3 is registered. Hence (continuing from record 4)

record id acn  px RM
4        3   0    10   11
5        5   1    13   11

Again, record 6 indicates that ID=4 has acn=0 and therefore is cancelled and its price is no longer registered in the system. Yet again, "mins" has to be run up til this point (but not inclusive record 6), but this time ignoring not only records where ID=3, but also where ID=4.

record id acn  px RM
6        4   0    11   13
7        6   1    17   13

This is the result I am after: 

record id acn  px RM
0        1   1    15  15
1        2   1    20  15
2        3   1    10  10
3        4   1    11   10
4        3   0    10   11
5        5   1    13   11
6        4   0    11   13
7        6   1    17   13

Absolutely no idea how to implement this in q. I would definitely give me hard time in other programming languages that I am pretty familiar already too...

As always, if you have a minute or two your time and effort is very 'extremely' appreciated.

Regards, 
VA.
26 REPLIES 26

Mohammad_Noor
New Contributor
One solution is to write a function which maintains the state of active orders after each row. You can do this with the scan adverb.

Example using your data:

    / create the example table 
    q)show t:flip`id`acn`px!(1 2 3 4 3 5 4 6;1 1 1 1 0 1 0 1; 15 20 10 11 10 13 11 17)
        id acn px
        ---------
        1  1   15
        2  1   20
        3  1   10
        4  1   11
        3  0   10
        5  1   13
        4  0   11
        6  1   17
    
    / function to build active price state
    / currState param is dictionary with current state
    / nextEntry param is a dictionary with the changes to apply to state (ie. a row of the above table)
    / for a given 'nextEntry', function will check acn. If 0, it will remove the id from the currState and return it. otherwise will add/update the entry with the given price (px)
    / note the scan adverb (\) at the end of the function - this is important
    q)buildActivePricesState:{[currState;nextEntry] ?[0=nextEntry`acn;enlist[nextEntry`id] _ currState;currState,enlist[nextEntry`id]!enlist nextEntry[`px]] }\
    
    / Example output on your data
    / each line represents state of active prices after applying the matching row in the table 't' (added some comments on RHS of output)
    / note, initial state is given as ()!() .. and scan operator will iterate over each row in 't'
    q) buildActivePricesState[()!();t]
        (,1)!,15
        1 2!15 20
        1 2 3!15 20 10
        1 2 3 4!15 20 10 11
        1 2 4!15 20 11            / removed id=3, as the acn is 0
        1 2 4 5!15 20 11 13
        1 2 5!15 20 13            / removed id= 4, as the acn is 0
        1 2 5 6!15 20 13 17
    
   
    / Adding running minimum (RM) to your table
    / you can then use this function in a few different ways to get minimum active price -
    q) update RM:min each buildActivePricesState[()!();t] from t
    id acn px RM
    ------------
    1  1   15 15
    2  1   20 15
    3  1   10 10
    4  1   11 10
    3  0   10 11
    5  1   13 11
    4  0   11 13
    6  1   17 13

    / Another way of adding running minimum (RM) to your table
    / same as above, just anoter way to supply the 'table' argument
    q) update RM:min each buildActivePricesState[()!();([] id;acn;px)] from t
        id acn px RM
        ------------
        1  1   15 15
        2  1   20 15
        3  1   10 10
        4  1   11 10
        3  0   10 11
        5  1   13 11
        4  0   11 13
        6  1   17 13

One additional point - you do not need to bind the scan adverb (\) to the function when it is defined - it can be defined as a normal function, and you can use the adverb form later (this is recommended approach, as the original func is more easily reused):

    / notice no trailing '\'
    q)buildActivePricesState:{[currState;nextEntry] ?[0=nextEntry`acn;enlist[nextEntry`id] _ currState;currState,enlist[nextEntry`id]!enlist nextEntry[`px]] }

  / the scan operator can be bound to the function when it is used -
    q)buildActivePricesState\[()!();t]
        (,1)!,15
        1 2!15 20
        1 2 3!15 20 10
        1 2 3 4!15 20 10 11
        1 2 4!15 20 11
        1 2 4 5!15 20 11 13
        1 2 5!15 20 13
        1 2 5 6!15 20 13 17

Regards,

Mohammad Noor

Hi Mohammad, 
Thank you very much for your help. This is work of art. I will need to some time to digest it to understand the syntax entirely. Thank you once again. 

Krishna, thank you too, but your code doesn't seem to be doing what I am after. Thank you for your time as well!!! 

Hey,

Just another way 

q)update rm:{[x;y;z;a] $[z;y&x;a]}\[px[0];px;acn;prev px] from t
id acn px rm
------------
1  1   15 15
2  1   20 15
3  1   10 10
4  1   11 10
3  0   10 11
5  1   13 11
4  0   11 13
6  1   17 13

Thanks,
Krishan 

Nick
New Contributor
mohammad's idea to use scan was spot on.  i would only add that replacing inactive prices with infinity makes the logic a bit easier:

q)update  RM:min each @\[()!();id;:;?[1=acn;px;0W]] from t
id acn px RM
------------
1  1   15 15
2  1   20 15
3  1   10 10
4  1   11 10
3  0   10 11
5  1   13 11
4  0   11 13
6  1   17 13

Newbie question, any info highly appreciated - could someone explain both the scans, please? 

update rm:{[x;y;z;a] $[z;y&x;a]}\[px[0];px;acn;prev px] from t - specifically, what is px[0] in each iteration?

and 

update  RM:min each @\[()!();id;:;?[1=acn;px;0W]] from t - is this a case of functional amend or is @ being used as a indexer?

Just one sentence explaining the scans in plain English would do, 

thank you, 
jj. 

I will explain the second query and you should be able to figure out the first one.

So @\[()!();id;:;?[1=acn;px;0W]]  is using the functional form of amend with scan, in the first iteration, empty dictionary is being passed as first parameter and it simply iterates over rest of the parameters using result of previous iteration as the first parameter (dictionary of id's in this case), can be explained from below

q)select @\[()!();id;:;?[acn=1;px;0w]] from t
x
-----------------------------
(,1)!,15f
1 2!15 20f
1 2 3!15 20 10f
1 2 3 4!15 20 10 11f
1 2 3 4!15 20 0w 11
1 2 3 4 5!15 20 0w 11 13
1 2 3 4 5!15 20 0w 0w 13
1 2 3 4 5 6!15 20 0w 0w 13 17


On Friday, February 24, 2017 at 3:04:06 PM UTC, Codelocker wrote:
Newbie question, any info highly appreciated - could someone explain both the scans, please? 

update rm:{[x;y;z;a] $[z;y&x;a]}\[px[0];px;acn;prev px] from t - specifically, what is px[0] in each iteration?

and 

update  RM:min each @\[()!();id;:;?[1=acn;px;0W]] from t - is this a case of functional amend or is @ being used as a indexer?

Just one sentence explaining the scans in plain English would do, 

thank you, 
jj. 
-Ajay

Thank you very much, Ajay. 

Hi,

I just joined this community,
thinking of another way
update RM:raze mins each (0,where acn=0) _ fills ?[acn=0;0n;px] from t


Not sure if the previous replies about replacing with inf works
example:
q)t
id acn px
---------
1  1   1
2  1   20
3  1   10
4  1   11
3  0   10
5  1   13
4  0   11
6  1   17
q)update  RM:min each @\[()!();id;:;?[1=acn;px;0W]] from t
id acn px RM
------------
1  1   1  1
2  1   20 1
3  1   10 1
4  1   11 1
3  0   10 1
5  1   13 1
4  0   11 1
6  1   17 1

using my method
q)update RM:raze mins each (0,where acn=0) _ fills ?[acn=0;0n;px] from t
id acn px RM
------------
1  1   1  1
2  1   20 1
3  1   10 1
4  1   11 1
3  0   10 11
5  1   13 11
4  0   11 13
6  1   17 13


Also, what should be the behaviour if there are consecutive acc=0?

q)t
id acn px
---------
1  1   15
2  1   20
3  1   10
4  1   11
3  0   10
5  0   13
4  0   11
6  1   17
q)update  RM:min each @\[()!();id;:;?[1=acn;px;0W]] from t
id acn px RM
------------
1  1   15 15
2  1   20 15
3  1   10 10
4  1   11 10
3  0   10 11
5  0   13 11
4  0   11 15???
6  1   17 15
q)update RM:raze mins each (0,where acn=0) _ fills ?[acn=0;0n;px] from t
id acn px RM
------------
1  1   15 15
2  1   20 15
3  1   10 10
4  1   11 10
3  0   10 11
5  0   13 11
4  0   11 11
6  1   17 11

Hi 

I don’t think any solution that doesn’t carry forward the full state will work.  The example only has one state change level (i.e. it only ever has to look back 1 price).  If there is > 1 change then they don’t work, e.g. change the px at id 1 to 12:

q) t:flip`id`acn`px!(1 2 3 4 3 5 4 6;1 1 1 1 0 1 0 1; 12 20 10 11 10 13 11 17)                                                                                                                                                                   
q)show t                                                                                                                                                                                                                                         
id acn px
---------
1  1   12
2  1   20
3  1   10
4  1   11
3  0   10
5  1   13
4  0   11
6  1   17

q)buildActivePricesState[()!();t]                                                                                                                                                                                                                
(,1)!,12
1 2!12 20
1 2 3!12 20 10
1 2 3 4!12 20 10 11
1 2 4!12 20 11
1 2 4 5!12 20 11 13
1 2 5!12 20 13
1 2 5 6!12 20 13 17

// this is still correct
q)min each buildActivePricesState[()!();t]                                                                                                                                                                                                       
12 12 10 10 11 11 12 12

// this isn't
q)update RM:raze mins each (0,where acn=0) _ fills ?[acn=0;0n;px] from t                                                                                                                                                                         
id acn px RM
------------
1  1   12 12
2  1   20 12
3  1   10 10
4  1   11 10
3  0   10 11
5  1   13 11
4  0   11 13
6  1   17 13

I think there are only a couple of ways to handle these types of problem:

1. track the full state (as Mohammad’s buildActivePriceState)
2. iterate through the table and run consecutive queries over the previous n rows

These approaches are basically the same, except (1) will likely be higher memory usage (tracking the intermediate state), (2) will be higher compute time (as you are essentially building the “state to now” from fresh for every row).  There might be other approaches, one might be to use a version of over to iterate across the whole table until input=output, but I think that may be very expensive. 

If you are happy to use a global variable, and you really only care about the min value rather than the whole state at each time point, you can use a global variable to track the state to increase speed/reduce memory.  You can also do it with a local but the time performance isn’t good. 

// currState is global
q)buildActivePricesStateGlobalInner:{[nextEntry] $[0=nextEntry`acn; @[`.;`currState;_;nextEntry`id];currState,::enlist[nextEntry`id]!enlist nextEntry[`px]]; min currState}each                                                                  

// set currState to empty, build list
q)buildActivePricesStateGlobal                                                                                                                                                                                                                   
{currState::()!(); buildActivePricesStateGlobalInner x}

q)buildActivePricesStateGlobal[t]                                                                                                                                                                                                                
12 12 10 10 11 11 12 12

// timings
q)\ts buildActivePricesStateGlobal[100000#t]                                                                                                                                                                                                     
285 6318976
q)\ts min each buildActivePricesState[()!();100000#t]                                                                                                                                                                                            
364 26595136
q)(min each buildActivePricesState[()!();100000#t]) ~ buildActivePricesStateGlobal[100000#t]                                                                                                                                                     
1b


Thanks 

Jonny

Actually, can do a bit better performance wise if we shift things around a bit and use Nick's speedup: 

q)buildActivePricesStateGlobalInner

{[id;px] currState[id]:px; min currState}

q)buildActivePricesStateGlobal

{currState::()!(); x:update px:0W from x where acn=0; buildActivePricesStateGlobalInner'[x`id;x`px]}


q)\ts buildActivePricesStateGlobal[100000#t]

106 6318912

q)\ts min each buildActivePricesState[()!();100000#t]

355 26595136


q)buildActivePricesStateGlobal[100000#t] ~ min each buildActivePricesState[()!();100000#t]

1b




On Monday, February 27, 2017 at 9:09:00 AM UTC, Jonny Press wrote:
Hi 

I don’t think any solution that doesn’t carry forward the full state will work.  The example only has one state change level (i.e. it only ever has to look back 1 price).  If there is > 1 change then they don’t work, e.g. change the px at id 1 to 12:

q) t:flip`id`acn`px!(1 2 3 4 3 5 4 6;1 1 1 1 0 1 0 1; 12 20 10 11 10 13 11 17)                                                                                                                                                                   
q)show t                                                                                                                                                                                                                                         
id acn px
---------
1  1   12
2  1   20
3  1   10
4  1   11
3  0   10
5  1   13
4  0   11
6  1   17

q)buildActivePricesState[()!();t]                                                                                                                                                                                                                
(,1)!,12
1 2!12 20
1 2 3!12 20 10
1 2 3 4!12 20 10 11
1 2 4!12 20 11
1 2 4 5!12 20 11 13
1 2 5!12 20 13
1 2 5 6!12 20 13 17

// this is still correct
q)min each buildActivePricesState[()!();t]                                                                                                                                                                                                       
12 12 10 10 11 11 12 12

// this isn't
q)update RM:raze mins each (0,where acn=0) _ fills ?[acn=0;0n;px] from t                                                                                                                                                                         
id acn px RM
------------
1  1   12 12
2  1   20 12
3  1   10 10
4  1   11 10
3  0   10 11
5  1   13 11
4  0   11 13
6  1   17 13

I think there are only a couple of ways to handle these types of problem:

1. track the full state (as Mohammad’s buildActivePriceState)
2. iterate through the table and run consecutive queries over the previous n rows

These approaches are basically the same, except (1) will likely be higher memory usage (tracking the intermediate state), (2) will be higher compute time (as you are essentially building the “state to now” from fresh for every row).  There might be other approaches, one might be to use a version of over to iterate across the whole table until input=output, but I think that may be very expensive. 

If you are happy to use a global variable, and you really only care about the min value rather than the whole state at each time point, you can use a global variable to track the state to increase speed/reduce memory.  You can also do it with a local but the time performance isn’t good. 

// currState is global
q)buildActivePricesStateGlobalInner:{[nextEntry] $[0=nextEntry`acn; @[`.;`currState;_;nextEntry`id];currState,::enlist[nextEntry`id]!enlist nextEntry[`px]]; min currState}each                                                                  

// set currState to empty, build list
q)buildActivePricesStateGlobal                                                                                                                                                                                                                   
{currState::()!(); buildActivePricesStateGlobalInner x}

q)buildActivePricesStateGlobal[t]                                                                                                                                                                                                                
12 12 10 10 11 11 12 12

// timings
q)\ts buildActivePricesStateGlobal[100000#t]                                                                                                                                                                                                     
285 6318976
q)\ts min each buildActivePricesState[()!();100000#t]                                                                                                                                                                                            
364 26595136
q)(min each buildActivePricesState[()!();100000#t]) ~ buildActivePricesStateGlobal[100000#t]                                                                                                                                                     
1b


Thanks 

Jonny

VA
New Contributor
Thank you very much, this is a great improvement! But I am still running into performance problems. I have  a table that is count t = 9392779, and it is taking just too long time to compute. I do not mind waiting, but not hours and hours. Would anyone suggest a way around this? 

 

rspa9428
New Contributor
Performance depends a lot on the id column. How many distinct ids/how many rows on average for each id are there in your 9 million length table? 

Cheers 

Ryan 

VA
New Contributor
As requested, 
In this table I have 
count idList = 4451102
avg idList count = 2.11 but there are some ids that appear more than that. For example: 

desc idList
id       | idCount
---------| -------
136804121| 286
136561734| 262
142266716| 191
137702903| 186
146275665| 186
149983273| 186
135935582| 183
150624717| 161
136623003| 158
149669428| 157
134259987| 152
145771313| 151
147540408| 151
146652097| 147
139549185| 139
138536519| 136
133120715| 134
134270994| 134
138845988| 128
147137301| 127
151356267| 127
134362547| 123
136693027| 123
142536322| 122
136051457| 121
137050076| 120
136997343| 118
141489703| 117
143748919| 110
151214633| 110
133080972| 109
151515926| 108
136288633| 106
135735451| 105
148774185| 105
145408522| 104
146586761| 104
..


rspa9428
New Contributor

That is quite a large set of data you have there, and having that many unique ids doesn't make it any friendlier. I can't think of any function that is faster than what Jonny etc. have suggested (i.e. the dictionary scan), however I think with enough cores+a good machine, we can parallelise that method and get significant improvements. 


So, if you have the luxury of many cores (or many machines), you can perhaps try combining multi process peach with one of the global variable (and therefore memory efficient) functions like the ones Jonny has posted. On even a 0.5 million length example, the memory really explodes if you try and scan creating the state dictionary, so I was only able to keep the memory low by using a global variable update. I have never used the method described below before in real life (have never needed anything faster than the normal scan) so it makes some sense. 


Using Jonny's function f for parallelisation (renamed fjp here) as an example, and 16 slaves:



/ start q process with  /home/user/q/l64/q -s -16 -p 8888

/ example table t with 9 million rows, 4 millions ids, though not quite as unenvenly distributed as your ids

/ converted acn to boolean too

n:9392779;ic:4451102;idlist:neg[n]?til[ic],(n-ic)?ic;t:([]id:idlist;acn:n?01b;px:n?1000.);

/ start 16 worker processes, open handles to them, set .z.pd

slaves:16;

{system"~/kdb/rlwrap /home/user/q/l64/q -p ",string[8800+x]," &"} each til slaves;

.z.pd:`u#hopen each 8800+til slaves;

 

/ the aim is to use batches - so split table t into 16 equal parts, like with .Q.fc

/ can also try different batch sizes, but from quick testing they seem to work out pretty 

/ similar (and maybe more small batches -> UDS, but I still saw similar timings)

tabs:(slaves;0N)#t

 

/ now, each batch needs the correct state dictionary to build from

/ we can get this quickly from out list of tabs by exec’ing last px by id

/ ignore the first tabs and prepend empty dict, so each dict will be an input for

/ the the next tablist

dicts:(,)\[enlist[()!()],{exec last px by id from x}peach -1 _ tabs]

/ also remove any unnecessary entries before we begin (could move this into the , above)

dicts:{(where 0W=x)_ x}peach dicts;

 

/ Then define our function that will run on each process (multi process allows global amends)

fjp:{[tab;startd]currState::startd; update RM:{[acn;id;px] $[acn; currState[id]:px ; @[`.;`currState;_;id]];min currState}'[acn;id;px] from tab}

 

/ Finally, send it off in parallel, and wait

res:raze (fjp .) peach flip (tabs;dicts)

 

 

When I did some testing on only the first 1 millions rows of my example, running it in parallel compared to just one q process was about 30 seconds for parallel (16 slaves) vs. 206 seconds for just the ordinary function run (without peach). 

 

I finally ran it with 16 slaves on my full 9.4 million row example – and it finished eventually in about 75 minutes. So not ideal, but I guess on the plus side it’s less than infinity. 

 

Hope that helps

 

Ryan

VA
New Contributor
Hi Ryan,

Your post is incredible. I will implement it this weekend and might even start a separate thread for it if you do not mind. Thank you,. 

Regards, 
VA

rspa9428
New Contributor
Hey no worries - I've been out for a while so didn't see your email until now. If you're still interested, I realised I forgot one or two things in my previous email. Given the structure of your data (9 million rows, 4+ millions unique ids), I think that the cost of calculating "min stateDictionary" becomes significant as the dictionary becomes huge. So was testing another similar method that tries to avoid calculating the min of the dictionary, only doing it when necessary. 

For the sake of readability here is a pretty verbose version below, might be good to experiment with all the if/ifelses going on in the logic and see if it can be improved:


/ alternate method, to avoid calculating min frequently
/ function below runs on each row, basically keeping track of the global minimum gmin,
/ as well as the list of ids that have px=gmin, and the state dictionary currState. Aim 
/ is to only do the min[currState] when we absolutely have to (as the strucutre of the data
/ seems to be such that this won't happen too frequently)
/ @global - currState - current state dictionary of ids!pxs 
/ @global - gmin - (global) minimum px, lowest px in currState dict
/ @global - ml - min list, list of ids that have px=min px

f:{[tab;startd]
    currState::startd;
    gmin::$[startd~()!();0w;min currState];
    ml::where currState=gmin;
    / function runs on each row of table, returning minimum each time
    {[acn;id;px]
         / for acn=1b, add px to dict, update gmin and ml if we must
         $[acn;
             [currState[id]:px;
               $[px>gmin;
                   :gmin;
                 px<gmin;
                   [gmin::px;ml::1#id];
                   ml,:id
                ]
             ];
         / for acn=0b, remove id from dict, update gmin and ml if necessary
             [@[`.;`currState;_;id];
              if[id in ml;
                  if[not count ml::ml except id;gmin::min currState]
                ]
             ]
          ];
         gmin
    }'[tab`acn;tab`id;tab`px]
 };


/ original (and multi-process parallelisable) function, from JP
fjp:{[tab;startd]
    currState::startd; 
    update RM:{[acn;id;px] $[acn; currState[id]:px ; @[`.;`currState;_;id]];min currState}'[acn;id;px] from tab
 };


So, doing essentially the same as before, starting slaves, distributing (but this time remembering to update px:0w from start table....):


/ start q process with  q -s -16 -p 8888
/ example table t with 9 million rows, 4 millions ids, though not quite as unenvenly distributed as your ids
/ converted acn to boolean too
n:9392779;ic:4451102;idlist:neg[n]?til[ic],(n-ic)?ic;t:([]id:idlist;acn:n?01b;px:n?1000.);
/ update prices to be infinity where acn is 0b
update px:0w from `t where not acn

/ start 16 worker processes, open handles to them, set .z.pd
slaves:16;
{system"q -p ",string[8800+x]," &"} each til slaves;
.z.pd:`u#hopen each 8800+til slaves;
 
/ the aim is to use batches - so split table t into 16 equal parts, like with .Q.fc
/ can also try different batch sizes, but from quick testing they seem to work out pretty 
/ similar (and maybe more small batches can try using UDS, but I still saw similar timings)
tabs:(slaves;0N)#t
 
/ now, each batch needs the correct state dictionary to build from
/ we can get this quickly from our list of tabs by exec’ing last px by id
/ ignore the first tabs and prepend empty dict, so each dict will be an input for
/ the the next tablist. 
/ We then fill forward each of those dicts using scan, removing any uncecessary (0w)
/ entries before we begin
dicts:{_[where r=0w;r:x,y]}\[enlist[()!()],{exec last px by id from x}peach -1 _ tabs]
 
/ Finally, send it off in parallel, and wait (may want to add some logging so you can see when
/ each slave finishes, watching top's cpu will also be pretty telling)
//res:raze (f .) peach flip (tabs;dicts)

For an idea, here are some timings on my pretty old laptop (running 32 bit kdb), with t set to 100,000#t (my laptop can't handle much bigger):


q)\t res:raze (f .)peach flip (tabs;dicts)
2105
q)\t res_nonp:f[t;()!()]
4415
q)res~res_nonp
1b
q)\t res2:fjp[t;()!()]
9837
q) res~res2`RM
1b
/ blows up if just doing regular scan
q)\t res_scan:update RM:min each @\[()!();id;:;px] from t
wsfull


When I ran the full 9 million row example table t on a large server (with 16 cores) a few weeks ago, it finished in about 35 mins or so. Again, not quick, but still finite.

Hope that helps

Cheers

Ryan

VA
New Contributor
Hey, 
This is awesome,l will give it a try. Apologies I have been away due to some unforeseeable circumstances. 

pressjonny0
New Contributor
It might depend on the activity in your real data. Can you try running the first version I sent and see if it is faster? The optimization with the infinity value may slow it down on large datasets as when there is a large number of unique orders the current state dictionary will be very large, so the min operation will be slow. So if instead you remove dead orders (first version) then it might be quicker. 

(I think the code for version 1 can be improved a bit too) 


Sent from my iPhone

VA
New Contributor
Could you be a little more specific please. There have been a lot of replies. Please let me know what you mean by the first version? 

pressjonny0
New Contributor
Ok sure.  Try this one:

q)f                                                                                                                                                                             
{currState::()!(); {[iszero;id;px] $[iszero; @[`.;`currState;_;id];currState[id]:px];min currState}'[0=x`acn;x`id;x`px]}
q)t                                                                                                                                                                             
id acn px
---------
1  1   12
2  1   20
3  1   10
4  1   11
3  0   10
5  1   13
4  0   11
6  1   17
q)\ts f[100000#t]                                                                                                                                                               
151 6449952

I think this version will be better given the number of unique orders you have 


VA
New Contributor
Hi Jonny, 
Apologies for the late reply, I was brought over to another project for 1 week. I will try your improved solution today. Will let you know. Thank you very much for your time and effort. 

Nick
New Contributor
it seems you are trying to build a book and compute the best offer price. i'm guessing you also have a problem where you need to calculate the best bid?

it is very important to use the right data structure for your specific case. if we keep a dictionary keyed on price and who's value is the list of ids at that price, we can quickly compute the 'min' or 'max' price by just looking at the few distinct values in the key. to let this algorithm work, we would also have to keep track of the active price per id.  we can use the unique attribute (`u) to make our look-ups faster.  NOTE: your example gives prices as integers (which is great). this algorithm relies on this because imprecision in real/float values can cause dictionary lookups to fail. if this is not the case in  practice, then it is advisable to scale all values so they can be converted to integers. customize as you need.

/ test dataset
n:10000
t:flip`id`acn`px!(n?n div 3;n?0b;n?20)

price:(`u#())!0#0               / initial price dictionary
order:(0#0)!()                  / initial order dictionary
/ function to update order book
/ accepts custom 'f' to compute running statistic
/ pos:(price;order;stat);id;price;active flag
upd:{[f;pos;id;px;a]
 p:pos 0;                       / price
 o:pos 1;                       / order
 ppx:p id;                      / prev price
 if[not null ppx;o:@[o;ppx;{x _ x?y};id];if[not count o ppx; o _: ppx]]; / drop old px
 if[a;p[id]:px;o[px],:id];      / update new px
 r:(p;o);                       / build return data structure
 r,:pos[2],f r;                 / append computed value
 r}

/ compute best ask
update rm: last 0N!upd[min key last@]/[(price;order;());id;px;acn] from t
/ compute best bid
update rm: last upd[max key last@]/[(price;order;());id;px;acn] from t

VA
New Contributor
HI Nick, 

This is great, thank you for your suggestions and code. I am sorry for the late reply, I got moved to another project for 1 week, I am now back working on this project. 

Yes, you're right, in all you've said re best bid and LOB. I will implement your suggestions this weekend and will be back with updates. Thank you 

Regards, 
VA

sieber
New Contributor
one more solution:

t:flip`id`acn`px!(1 2 3 4 3 5 4 6;1 1 1 1 0 1 0 1; 15 20 10 11 10 13 11 17)
k)update rmin:(min '+ {@[(#t)#0N;x[0]+!x 1;:;x 2]}'+. exec i,it:(-i-#i )^it-i,px from .q.lj[t]select it:*i by id from t where acn=0) from t

improved version:

t:flip`id`acn`px!(1 2 3 4 3 5 4 6;1 1 1 1 0 1 0 1; 15 20 10 11 10 13 11 17)
s:()!(); t,'([]RM:{$[1=y`acn;[s[y`id]:y`px;min x,y`px];[s _:y`id;min value s]]}\[0N;t])

fast enough?