json decode error for strings with utf chars (.j.k)

sieber
New Contributor
Hi,

I am using .j.k to decode JSON data. Unfortunately, some strings
contain Unicode-escaped characters:

{"a":"B\u00f6rse"}

Decoding this with kdb+ produces an error:


q).j.k "{\"a\":\"B\\u00f6rse\"}"
k){$["{"=*x;(`$c'n#'x)!c'(1+n:x?'":")_'x:d x;"["=*x;.Q.fc[c']d x;q=*x;$[1<+/v x;'`err;"",. x];"a">*x;"F"$x;"n"=*x;0n;"t"=*x]}
'"B\u00f6rse"
.:
"\"B\\u00f6rse\""
q.j))\

I hotfixed this issue by changing the .j.c function to:


c:{$["{"=*x;(`$c'n#'x)!c'(1+n:x?'":")_'x:d x;"["=*x;.Q.fc[c']d x;q=*x;$[1<+/v x;'`err;"",-1_1_x];"a">*x;"F"$x;"n"=*x;0n;"t"=*x]}



I replaced ". x" with "-1_1_x", so that JSON strings are no longer evaluated as q string literals (which is where the \u escape fails) but are used as they are, with only the " at the beginning and the end removed.

Does anybody know whether this fix might have side effects I did not think of? Or is there perhaps a better solution?

Of course, my Unicode character is still not decoded correctly, but at least I don't get parse errors.
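
To make the possible side effect concrete, here is a minimal sketch in Python (used only because it can be run without kdb+; it is not q and says nothing about how .j.k works internally). The helper names strip_quotes and decode_token are hypothetical. It contrasts removing only the outer quotes, which is what the -1_1_x change amounts to, with a full JSON string decode:

```python
import json


def strip_quotes(token: str) -> str:
    """Drop the first and last character, like -1_1_x on the quoted token."""
    return token[1:-1]


def decode_token(token: str) -> str:
    """Decode a JSON string token, resolving all escape sequences."""
    return json.loads(token)


unicode_token = '"B\\u00f6rse"'   # the raw token from the example above
escaped_token = '"a\\"b\\\\c"'    # contains an escaped quote and backslash

print(strip_quotes(unicode_token))   # B\u00f6rse  (escape kept as text)
print(decode_token(unicode_token))   # Börse
print(strip_quotes(escaped_token))   # a\"b\\c     (backslashes kept)
print(decode_token(escaped_token))   # a"b\c
```

If this reading is right, the hotfix avoids the error, but any string containing \", \\, \n or \uXXXX would keep its escape sequences as literal text instead of the characters they stand for.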



Markus

5 REPLIES

charlie
New Contributor II
btw, the json parser is revamped in 3.3t

q).j.k "{\"a\":\"B\\u00f6rse\"}"
a| "B\303\266rse"
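
The \303\266 in that output appears to be the two UTF-8 bytes of ö (U+00F6), displayed in octal. A small Python sketch (not q; octal_display is a hypothetical helper that only mimics how the bytes are printed above) to check the arithmetic:

```python
def octal_display(text: str) -> str:
    """Encode text as UTF-8 and show non-ASCII bytes as 3-digit octal escapes."""
    out = []
    for byte in text.encode("utf-8"):
        if byte < 0x80:
            out.append(chr(byte))          # plain ASCII stays readable
        else:
            out.append("\\%03o" % byte)    # e.g. 0xC3 -> \303
    return "".join(out)


decoded = "B\u00f6rse"
print(decoded.encode("utf-8"))   # b'B\xc3\xb6rse'
print(octal_display(decoded))    # B\303\266rse
```

So the new parser seems to decode the \u00f6 escape into its UTF-8 byte sequence (0xC3 0xB6) rather than leaving it as text.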

sieber
New Contributor
Hi Charles,
thanks for your answer. Where can I get this 3.3t? I would really like to see the new .j code and how Arthur improved it :).
Btw, is there any news on kparc and the upcoming k version? I occasionally check kparc.com, but there are rarely updates.
Markus

charlie
New Contributor II
3.3t is available to commercially licensed customers, and .j.k is no longer implemented in k.
kparc is Arthur's research project for a closed audience; if his research determines better ways to do things, we sometimes implement those in kdb+.

sieber
New Contributor
Are these issues also solved in 3.3t?

q).j.j 0w, `$"a\"b"

"[0w,\"a\"b\"]"

(Neither 0w nor "a"b", with its unescaped ", is valid JSON.)


My current fix for them is:

J:(($`0`1)!$`false`true;s;{$[in["w";x 1 2]|~#x;"null";x]};s;j;{s@[x;&"."=8#x;:;"-"]};s)1 2 5 10 11 12 16h bin   /fixed encoding error of 0w and `$"a\"b"
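
For comparison, the behaviour this fix aims for (non-finite numbers emitted as null, embedded quotes escaped) can be sketched in Python. This is an illustration only, not the .j.j implementation; to_json and clean are hypothetical names:

```python
import json
import math


def clean(value):
    """Replace non-finite floats with None, recursing into lists."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, list):
        return [clean(item) for item in value]
    return value


def to_json(value) -> str:
    """Serialise to strict JSON; allow_nan=False rejects any leftover inf/nan."""
    return json.dumps(clean(value), allow_nan=False)


encoded = to_json([float("inf"), 'a"b'])
print(encoded)               # [null, "a\"b"]
print(json.loads(encoded))   # [None, 'a"b']
```

Mapping infinity to null loses information, so whether that is acceptable depends on what the consumer of the JSON expects.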



On Wednesday, May 20, 2015 at 11:52:30 AM UTC+2, Charles Skelton wrote:
btw, the json parser is revamped in 3.3t

q).j.k "{\"a\":\"B\\u00f6rse\"}"
a| "B\303\266rse"

charlie
New Contributor II
No, so far only the parser (.j.k) has changed; it is now a built-in implemented in C.