
Is there any reason to not use a TP (Feedhandler -> RDB Directly)?

Light_of_Heaven
New Contributor
Hello,

Here it mentions:

The data feed could be written directly to the RDB. More often, it is written to a kdb+ process called a tickerplant, which will:

  • write all incoming records to a log file
  • push all data to the RDB
  • push all or subsets of the data to other processes

Other processes would subscribe to a tickerplant to receive new data, and each would specify what data should be sent: all or a selection.


I am under the impression that the TP is simply a passthrough entity, responsible for logging.

But let's say that, for whatever reason, someone is not interested in recording incoming records (maybe you don't care, or maybe it can be done at the feedhandler level). Would it ever make sense for a feedhandler to be connected directly to an RDB? Why or why not?

If most tick architectures use a TP, there must be a good reason beyond logging alone, as I'm assuming logging could be done at the feedhandler level.

Thanks for your responses.


5 REPLIES

dotsch
New Contributor
The reason for using a TP is that its log file can be replayed, so you can recover from a crash without data loss (unless the TP itself went down).
If you don't care about that, you can of course push data directly to the RDB.
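
For illustration, a minimal sketch of the direct route (the port, schema and upd handler below are assumptions, not standard code):

/ RDB side: a trade table plus a plain upd that just inserts
trade:flip `time`sym`price`size!"nsfj"$\:()
upd:{[t;x] t insert x}

/ feedhandler side: open a handle to the RDB and push rows over IPC
h:hopen `::5011                              / RDB port is an example
neg[h](`upd;`trade;(.z.n;`AAPL;150.1;100))   / async push of one record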

András

User
Not applicable
A TP fulfils several important functions in a data ingestion system:
  1. The use of an RDB allows querying of ingested data without interrupting ingestion. If a user wishes to perform complex operations on the realtime data using an RDB, it will not hold up the ingestion. In fact, even if the RDB crashes it will not impact the ingestion, and the data is always recoverable. This is not the case if the ingestion point is also the daily data cache.
  2. It provides a fast, robust method for logging ingested data in the form of a tickerplant log. Increasing the processing complexity of operations in the TP increases the chance of a failure causing loss of data, which is why the TP is kept minimal.
  3. It allows continuous ingestion of data during EOD (End of Day) writedowns to disk. EOD actions can grow complicated, and if the ingestion point failed during EOD it could result in complete loss of data.
  4. By adjusting the publication from the TP (i.e., using zero latency or tweaking the batching through the timer), throughput into the system can be maximised (see the sketch after this list).
  5. A TP allows the dissemination of the incoming data to several processes via the pub/sub mechanism, applying varying levels of filtering if specified.
  6. Similar to the previous point, chaining TPs allows for configurable levels of granularity.
Essentially, the TP allows for increased complexity and versatility in a data ingestion system, whilst still maintaining resilience.
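
On point 4, for reference: in the vanilla kdb+tick tick.q script the publish mode is controlled by the timer flag (the command lines below follow that script's header conventions; ports are examples):

q tick.q sym . -p 5010 -t 0      / zero latency: publish each update as it arrives
q tick.q sym . -p 5010 -t 1000   / batched: buffer updates, flush on a 1000ms timer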

Thanks for your quick response, Callum.

Can you expand a little on your third point? In particular, the difference between having a TP vs pushing directly to the RDB with reference to, "EOD actions can grow to be complicated if the ingestion point failed during the EOD it could result in complete loss of data".

Point #5 seems critical as well. It is best that each database (rdb, vwap, hlcv, tq, last) receives only what it needs, and that filtering can take place at the TP level. If there were no TP, then each database would get all the data and would have to perform the filtering itself. Please correct me if I'm wrong.

User
Not applicable
Loss of data isn't correct here, that's my bad. If the ingestion point was also performing EOD and it failed, there is the possibility that it goes down and you also stop ingesting for the current day. That would mean you would need to replay the previous day, perform an EOD (during which you cannot ingest more data), and then catch up to the current point in time. If you use an RDB as an ingestion point, you can only do one thing at a time: handle the data or ingest it, not both. I would stress again that logfile replay in kdb is incredibly performant when using -11!.
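
For context, the replay itself is tiny (a sketch assuming a standard tick log, where each logged message is an (`upd;table;data) call; the path is an example):

upd:{[t;x] t insert x}        / handler the logged messages will call
-11!`:tplogs/sym2024.01.02    / stream-execute every message in the log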

I would only expect issues with EOD if it got much more complex than a vanilla operation (e.g. using intraday writedowns; maintaining several HDB directories, some segmented, others partitioned; having some tables that are stateful and others that are transactional; etc.).
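
By "vanilla" I mean something like the stock r.q end of day, which is essentially a single call (a simplified sketch; the HDB port and path are examples):

.u.end:{[d]
  / write all in-memory tables to a date partition in the HDB,
  / empty them, and tell the HDB to reload
  .Q.hdpf[`$":localhost:5012";`:/data/hdb;d;`sym]}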

You are correct about point #5; skipping a TP would bloat your memory footprint.
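
With the vanilla u.q pub/sub, the filtering lives in the TP and each subscriber simply declares what it wants (a sketch; port and syms are examples):

h:hopen `::5010                 / TP port
h(".u.sub";`trade;`AAPL`MSFT)   / trade records for these two syms only
h(".u.sub";`quote;`)            / a lone backtick means all syms
upd:{[t;x] t insert x}          / the TP then calls upd with each matching batch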

In general, a TP is very congruent with the general approach of kdb/q, which has a strong preference for compartmentalization of small compact operations. 

The only case I can envision for omitting a tickerplant would be the ingestion of small, sporadic datasets. And even then, only if I didn't have the time to grab the vanilla TP code and configure the pub/sub.

User
Not applicable
Sorry, I'm on mobile and didn't see the last part of that question.

I would still use a tickerplant, for some of the reasons I've mentioned. The ability to replay TP logs with -11! is very well optimised and allows rapid recovery: in the event of an RDB restart, this mechanism lets the RDB catch up with the day's data very quickly, and that's within a vanilla system. You would need to create a custom method to handle this case yourself, and I strongly feel you would not be leveraging the potential of kdb. If you really wanted to recover from feedhandler logs, then you could still keep the other benefits of the TP (re: throughput, EOD actions, etc.) by simply preventing it from logging to disk, which would reduce the overhead to near zero.
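
As a sketch (assuming the vanilla tick.q behaviour, where the second argument is the log directory and omitting it disables logging; ports are examples):

q tick.q sym . -p 5010   / TP logging to the current directory
q tick.q sym -p 5010     / same TP, no log file written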