cancel
Showing results for 
Search instead for 
Did you mean: 

Transform Specific Column Values

kadir_kilicoglu
New Contributor II

Hello,

Assume that I have a table like;

prices:([]sym:`u#`$();price:`float$())

..and updating this table through the standard .u.upd function with the incoming data like;

data:((`$"AAPL.B*";`$"AAPL^#");(122.74;123.17))

Furthermore; I have loaded a CSV file of different symbology types mappings like;

IssueType                              CQS      CMS     NSDQ
---------------------------------------------------------------
"Preferred Class \"A\"*"               "pA"     "PRA"   "-A"
"Preferred Class \"B\"*"               "pB"     "PRB"   "-B"
"Class \"A\"*"                         "/A"     ,"A"    ".A"
"Class \"B\"*"                         "/B"     ,"B"    ".B"
...
...

Now; what I want to achieve is to transform the incoming data during .u.upd by;

  1. Suffixing the sym columns of the data
  2. Converting the sym columns of the data from NSDQ format to CQS format

I achieved the first requirement with such a function;

suffixData:{
   symColmun:0;
   suffix:".X";
   x[symColmun]:.Q.fu[{`$string[x],suffix}each; x[symColmun]];
}

But when it comes to the second requirement, I couldn't make it work yet. The pseudo-code on my mind is;

  1. For each row in the data sym column:
    1. SSR by pattern matching each row in the conversion table's NSDQ column
    2. ..and replacing the matching characters by the corresponding CQS column

I'm also open to any suggestions to achieve this in a more optimized way.

Thanks in advance!

1 ACCEPTED SOLUTION

ssr is a much slower operation than the above examples.
Using wildcards such as "*#" is also difficult as it matches more than one suffix (.e.g "#" and "^#").
To work around this you can choose the longest matching suffix.

Using 'like' and '@' where possible rather than 'ssr' as below is a big speed improvement.

Searching for "*" is also difficult as it is a wildcard. Instead, I use tab "\t" in it's place.

If "\t" is possibly in your data you would need to change this for another character.

symbology:.Q.id ("****";enlist ",")0:`:symbology.csv
update searchNASDAQ:{"*",@[x;where x="*";:;"\t"]} each NASDAQ from `symbology
func:{s:string x;
m:select from symbology where @[s;where s="*";:;"\t"] like/:searchNASDAQ;
l:max count each m`NASDAQ;
c:first exec CMS from m where l=count each NASDAQ;
`$$[c~();s;(neg[l]_s),c]
};

In a test it does seem to operate correctly.

Overall it still has risks as if bad data is sent in the function cannot truly validate what it is doing. Ideally you would have root and suffix separated by a known delimiter such as a space " " in the source data.

-6 sublist {([] symNASDAQ:n;symCMS:func each n:`$"AAPL",/:x)}symbology`NASDAQ
symNASDAQ symCMS
------------------
AAPL# AAPLWI
AAPL^# AAPLRTWI
AAPL-# AAPLPRWI
AAPL.A# AAPLAWI
AAPL+# AAPLWSWI
AAPL~ AAPLTEST

The extra complexity does come at a cost of speed.

\ts func each 10000#`$"AAPL+#"
41 553776

symbologyOld:`NASDAQ xkey .Q.id ("****";enlist ",")0:`:symbology.csv
\ts {s:string x;`$(4#s),symbologyOld[4 _ s]`CQSSuffix} each 10000#`$"AAPL+#"
12 554448
\ts {s:string x;r:first where not s in .Q.A;`$(r#s),symbologyOld[r _ s]`CQSSuffix} each 10000#`$"AAPL+#"
16 554464

As you used in your original example .Q.fu is a great tool when performing an intensive task repeatedly. 

Extreme example with only one unique input:

\ts .Q.fu[func each] 10000#`$"AAPL+#"
0 394032

 One limitation of .Q.fu is that it has no memory between executions.

One example of a way to bypass this would be a memory cache.

The library https://github.com/gitrj95/q-memo could be used.

In this use-case your function is fast enough that the cache is overkill and should not actually be used but would be useful for very slow operations you may have to run many times.

\l memo.k
.memo.init[`.;10000h] //Create a cache with 10k limit
.memo.mk[`func;`memoFunc;`cache.0] //Create a wrapped function to use the cache

\ts memoFunc each 10000#`$("AAPL~";"AAPL+#")
126 553936

//The cache pre stores the results
cache.0
f a | r
-------------------| ---------
:: | ::
..memoFunc ,`AAPL~ | `AAPLTEST
..memoFunc ,`AAPL+#| `AAPLWSWI

 

View solution in original post

4 REPLIES 4

rocuinneagain
New Contributor III
New Contributor III

Without knowing the full rules and logic here are some thoughts.

//Table taken from https://www.nasdaqtrader.com/trader.aspx?id=CQSsymbolconvention
q)symbology:`NASDAQIntegratedPlatformSuffix xkey .Q.id ("****";enlist ",")0:`:symbology.csv
q)5 sublist symbology
NASDAQIntegratedPlatformSuffix| Security CQSSuffix CMSS..
------------------------------| ---------------------------------------------..
,"-" | "Preferred" ,"p" "PR"..
"-A" | "Preferred Class \"A\"*" "pA" "PRA..
"-B" | "Preferred Class \"B\"*" "pB" "PRB..
".A" | "Class \"A\"*" ".A" ,"A"..
".B" | "Class \"B\"*" ".B" ,"B"..

 

//Assuming 4 character root
q){s:string x;`$(4#s),symbology[4 _ s]`CQSSuffix} each (`$"AAPL.B*";`$"AAPL^#")
`AAPL`AAPLrw

 

//Assuming there is always a suffix and first non capital letter is beginning of suffix
{s:string x;r:first where not s in .Q.A;`$(r#s),symbology[r _ s]`CQSSuffix} each (`$"AAPL.B*";`$"AAPL^#")
`AAPL`AAPLrw

Note: '.B*' does not appear in that mapping table so 'AAPL.B*' does not receive a new CQS Suffix.

Thank for your suggestions rocuinneagain I'll definitely try to adapt them in my context but please note that we can neither assume a 4-character root or the first non-capital letter as a suffix.

Instead, do you think that the SSR function might work? That is actually the challenge somehow go over each data line for each conversion line pattern matching in the most optimized way.

ssr is a much slower operation than the above examples.
Using wildcards such as "*#" is also difficult as it matches more than one suffix (.e.g "#" and "^#").
To work around this you can choose the longest matching suffix.

Using 'like' and '@' where possible rather than 'ssr' as below is a big speed improvement.

Searching for "*" is also difficult as it is a wildcard. Instead, I use tab "\t" in it's place.

If "\t" is possibly in your data you would need to change this for another character.

symbology:.Q.id ("****";enlist ",")0:`:symbology.csv
update searchNASDAQ:{"*",@[x;where x="*";:;"\t"]} each NASDAQ from `symbology
func:{s:string x;
m:select from symbology where @[s;where s="*";:;"\t"] like/:searchNASDAQ;
l:max count each m`NASDAQ;
c:first exec CMS from m where l=count each NASDAQ;
`$$[c~();s;(neg[l]_s),c]
};

In a test it does seem to operate correctly.

Overall it still has risks as if bad data is sent in the function cannot truly validate what it is doing. Ideally you would have root and suffix separated by a known delimiter such as a space " " in the source data.

-6 sublist {([] symNASDAQ:n;symCMS:func each n:`$"AAPL",/:x)}symbology`NASDAQ
symNASDAQ symCMS
------------------
AAPL# AAPLWI
AAPL^# AAPLRTWI
AAPL-# AAPLPRWI
AAPL.A# AAPLAWI
AAPL+# AAPLWSWI
AAPL~ AAPLTEST

The extra complexity does come at a cost of speed.

\ts func each 10000#`$"AAPL+#"
41 553776

symbologyOld:`NASDAQ xkey .Q.id ("****";enlist ",")0:`:symbology.csv
\ts {s:string x;`$(4#s),symbologyOld[4 _ s]`CQSSuffix} each 10000#`$"AAPL+#"
12 554448
\ts {s:string x;r:first where not s in .Q.A;`$(r#s),symbologyOld[r _ s]`CQSSuffix} each 10000#`$"AAPL+#"
16 554464

As you used in your original example .Q.fu is a great tool when performing an intensive task repeatedly. 

Extreme example with only one unique input:

\ts .Q.fu[func each] 10000#`$"AAPL+#"
0 394032

 One limitation of .Q.fu is that it has no memory between executions.

One example of a way to bypass this would be a memory cache.

The library https://github.com/gitrj95/q-memo could be used.

In this use-case your function is fast enough that the cache is overkill and should not actually be used but would be useful for very slow operations you may have to run many times.

\l memo.k
.memo.init[`.;10000h] //Create a cache with 10k limit
.memo.mk[`func;`memoFunc;`cache.0] //Create a wrapped function to use the cache

\ts memoFunc each 10000#`$("AAPL~";"AAPL+#")
126 553936

//The cache pre stores the results
cache.0
f a | r
-------------------| ---------
:: | ::
..memoFunc ,`AAPL~ | `AAPLTEST
..memoFunc ,`AAPL+#| `AAPLWSWI

 

View solution in original post

kadir_kilicoglu
New Contributor II

Thank you for your extensive answer, I'll try what you have suggested!