RE: [personal kdb+] kdb+tick with schemaless events

david_demner
New Contributor
From: "David Demner (AquaQ)"
To: personal-kdbplus@googlegroups.com
Subject: RE: [personal kdb+] kdb+tick with schemaless events
Date: Tue, 28 Apr 2015 20:30:53 -0700

q)t:([]time:3?0D; sym:til 3; data:3#enlist(1 2!(1 2;1 2)))
q)`:t/ set 0#t
q)`:t/ upsert t
q)value`:t/
time                 sym data
--------------------------------------
0D09:25:33.805802464 0   1 2!(1 2;1 2)
0D12:24:36.672738790 1   1 2!(1 2;1 2)
0D12:23:00.641756951 2   1 2!(1 2;1 2)


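kdb+ raises 'type on a direct set to protect you from yourself when you're trying to write down complex columns that can't be efficiently accessed. Setting the empty table first and then upserting the rows, as above, gets around it.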
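
For anyone who wants to reuse the two-step write, it can be wrapped in a small helper. This is only a sketch: saveComplex and the /tmp path are made-up names, not part of kdb+, and it assumes the table has no symbol columns (those would need enumerating first, for example with .Q.en, before a splayed write).

```q
/ saveComplex is a hypothetical helper name, not a kdb+ built-in
/ p: splayed directory handle (trailing slash), t: unkeyed table without symbol columns
/ writes the empty schema first, then appends the rows; returns the path
saveComplex:{[p;t] p set 0#t; p upsert t}

t:([]time:3?0D; sym:til 3; data:3#enlist 1 2!(1 2;1 2))
p:saveComplex[`:/tmp/sketch_complex/;t]   / scratch path is an assumption
n:count get p                             / number of rows read back from disk
```

Each call starts by overwriting the directory with the empty table, so the helper replaces what is there rather than appending to it.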

-------- Original Message --------
Subject: Re: [personal kdb+] kdb+tick with schemaless events
From: <joshmyzie2@yandex.com>
Date: Wed, April 29, 2015 3:41 am
To: personal-kdbplus@googlegroups.com


Thanks for the reply, David.

Regarding performance, I realize I will take a hit, but my thinking was
that I will either only query a small time window / specific event type,
or I would split out a specific event type to a standard schema table.
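
As a rough illustration of the "split out" idea, here is a sketch that flattens one event type into ordinary columns. splitOut is a hypothetical helper name, the table is a toy modelled on my earlier example, and it assumes every event of the chosen type carries the same keys.

```q
/ toy events table modelled on the example in the original question
ev:([]time:3#0D11:14:57.333; sym:`e1`e2`e1; eventData:(`xx`yy!1 2;`aa`bb`cc!(5;0.39 0.51 0.52;`a`b`c);`xx`yy!5 2))

/ splitOut is a hypothetical helper: keep rows for event type s,
/ then join each row with its own eventData dict so the keys become columns
splitOut:{[t;s] r:select from t where sym=s; (delete eventData from r),'r`eventData}

e1:splitOut[ev;`e1]   / columns: time, sym, xx, yy
```

If the dicts for one event type do not all share the same keys, the result will not be a proper table, so this only suits event types with a stable schema.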

Maybe I'm misunderstanding you, but how would I save my events (nested
dicts) to a hdb without serializing? For example, the following table
won't save unless I serialize the data column:

q)t:([]time:3?0D; sym:til 3; data:3#enlist(1 2!(1 2;1 2)))
q)t
time                 sym data
--------------------------------------
0D05:44:29.828280061 0   1 2!(1 2;1 2)
0D03:37:10.269978940 1   1 2!(1 2;1 2)
0D03:45:41.618905216 2   1 2!(1 2;1 2)
q)`:/tmp/t/ set t
k){$[@x;.[x;();:;y];-19!((,y),x)]}
'type
q.q))\
q)`:/tmp/t/ set update -8!'data from t
`:/tmp/t/
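
For completeness, reading the serialized column back means decoding each item with -9!, the inverse of -8!. A minimal round-trip sketch, with a made-up scratch path:

```q
t:([]time:3?0D; sym:til 3; data:3#enlist 1 2!(1 2;1 2))
p:`:/tmp/sketch_ser/                  / hypothetical scratch path
p set update -8!'data from t          / each dict becomes a byte vector
r:update -9!'data from get p          / decode each byte vector on read
ok:(t`data)~r`data                    / compare decoded column with the original
```

The catch is that every query touching data has to decode it first.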


Josh


On 28 April 2015 20:24 UTC, David Demner (AquaQ) <david.demner@aquaq.co.uk> wrote:

> 1. I think your performance will be pretty bad especially if you have lots of events. This is especially true if you have longer hdb queries because the eventData column can't be randomly accessed.
>
> If each event type has the same schema, it may be better to split each one into a separate table (in your upd event). If your schema can change over time, have a look at dbmaint.q for HDB schema maintenance (or perhaps you won't need it since kdb+ reads the schema from the latest partition in your hdb)
>
> That being said, it's certainly possible if you're willing to pay the price.
>
> 2. I think JSON would just bloat it further for not much (no?) benefit. I don't think you need to serialize (just set the empty table then upsert the results possibly with .z.zd or manual compression) and in fact maybe serialization would slow it down further
>
> 3. I don't know much about tick.q or r.q. But it's likely pointless to serialize before (kdb is very clever about serializing where necessary)
>
> -----Original Message-----
> From: personal-kdbplus@googlegroups.com [mailto:personal-kdbplus@googlegroups.com] On Behalf Of joshmyzie2@yandex.com
> Sent: Tuesday, April 28, 2015 8:29 AM
> To: personal-kdbplus@googlegroups.com
> Subject: [personal kdb+] kdb+tick with schemaless events
>
>
> Hello,
>
> I am writing an event-driven application and I want to send all events to kdb for persistence and running real-time ad-hoc queries. Rather than hard-code the schema for all my events, which will change over time, I am thinking of sending a single table to my ticker plant:
>
> ([] time:`timespan$(); sym:`g#`symbol(); eventData:())
>
> where "sym" will be the event name and eventData can be any dict.
> Example table with two event types:
>
> time                 sym eventData
> ---------------------------------------------------------------------------
> 0D11:14:57.333000000 e1  `xx`yy!1 2
> 0D11:14:57.333000000 e2  `aa`bb`cc!(5;0.3927524 0.5170911 0.5159796;`a`b`c)
> 0D11:14:57.333000000 e1  `xx`yy!5 2
>
>
> My questions are:
>
> 1. Is this strategy with kdb a terrible idea?
>
> 2. How should I serialize the eventData for EOD persistence? Just "-8!"? Any reason to use JSON instead?
>
> 3. Should I instead serialize the eventData BEFORE sending it to my ticker plant, so that I don't need to modify tick.q or r.q?
>
> Thanks,
> Josh
>
> --
> You received this message because you are subscribed to the Google Groups "Kdb+ Personal Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to personal-kdbplus+unsubscribe@googlegroups.com.
> To post to this group, send email to personal-kdbplus@googlegroups.com.
> Visit this group at http://groups.google.com/group/personal-kdbplus.
> For more options, visit https://groups.google.com/d/optout.


joshmyzie2
New Contributor
Awesome. I never would have guessed that.

For anyone else interested: .Q.hdpf works fine with complex columns, no modification necessary. The `p# attribute still works on the sym column, so queries for a specific event type should still be fast.
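
To show what I mean by the `p# point, here is an in-memory sketch that mimics the sort-and-part step on a toy table (it does not call .Q.hdpf itself; the table contents are made up):

```q
/ toy events table: two event types with different payload shapes
ev:([]time:0D00:00:01*til 6; sym:6#`e1`e2; eventData:6#(`xx`yy!1 2;`aa`bb!(5;`a`b)))

/ sort by sym so each event type is contiguous, then apply the parted attribute
ev:update `p#sym from `sym xasc ev

a:attr ev`sym                     / attribute now set on the sym column
e1:select from ev where sym=`e1   / lookup of one event type uses the parted sym column
```

The eventData column stays a general list throughout; only sym carries the attribute.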