kdb randomly hanging when run inside docker container

joshmyzie2
New Contributor
I'm not sure how to debug this, so I'm looking for suggestions.

I've been running multiple kdb instances inside a docker container. On multiple occasions, one of the instances will hang:

- Running hopen on its port from another kdb process will hang indefinitely.
- Trying to run an HTTP query will also hang until the client times out.
- I am, however, able to successfully open a connection with telnet.
- Sending SIGINT/SIGTERM has no effect. SIGKILL is needed to bring down the hung process.
- Other kdb instances in the container are unaffected.
- It appears to happen randomly, sometimes after 1 hour, sometimes after 2 days.

Most likely a docker bug, but so far it's only manifested itself with kdb.
2 REPLIES

jim1
New Contributor
I'd try attaching strace to the pid concerned, to see if it's stuck waiting on the return of a syscall. Normally you would see regular select calls, corresponding to the file descriptors/ports the process is listening on.
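Before attaching strace, it can help to see what state the kernel reports for the hung process. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper name proc_state is hypothetical and not part of kdb or strace.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): show what the kernel reports
# for a process, using the Linux /proc filesystem.
proc_state() {
    pid="$1"
    if [ ! -r "/proc/$pid/status" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # "State:" is R (running), S (interruptible sleep), D (uninterruptible
    # sleep, usually blocked on I/O), T (stopped) or Z (zombie).
    grep '^State:' "/proc/$pid/status"
    # Kernel function the process is sleeping in, where the kernel exposes it.
    if [ -r "/proc/$pid/wchan" ]; then
        printf 'wchan:\t%s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
    fi
}
```

Usage would be something like `proc_state <pid>` followed by `strace -f -tt -p <pid>` to attach to every thread with timestamps. A state of R suggests the process is spinning, while S or D suggests it is blocked in a syscall. Depending on Docker's security settings, attaching from inside the container may require starting it with `--cap-add=SYS_PTRACE`.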

If there's nothing useful in the strace output once it gets into this state, you could try running it with strace from the beginning, with something like

nohup strace -s 99999 -o ~/str_out.log q myprog.q > ~/q_out.log 2>&1 < /dev/null &

and you may be able to glean something useful in the run-up to the issue. 

If it's not obvious already whether it's _really_ stuck, you could run a function within kdb on a timer (e.g. log a timestamp to disk), and see if that continues working. That might help narrow down the issue to the Docker (i.e. Linux containers) networking stack, and/or its signal-handling.
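The timer idea can be paired with a check from outside the process. This is a rough sketch, assuming the kdb timer rewrites a heartbeat file on each tick; the function names, the file path, and the threshold are all hypothetical, and `stat -c` assumes GNU coreutils or BusyBox.

```shell
#!/bin/sh
# heartbeat_age FILE: print the number of seconds since FILE was last modified.
heartbeat_age() {
    mtime=$(stat -c %Y "$1") || return 1   # mtime as epoch seconds
    now=$(date +%s)
    echo $((now - mtime))
}

# heartbeat_stale FILE MAX_AGE: succeed (exit 0) if FILE is missing or was
# last modified more than MAX_AGE seconds ago, i.e. the timer has stopped.
heartbeat_stale() {
    age=$(heartbeat_age "$1" 2>/dev/null) || return 0
    [ "$age" -gt "$2" ]
}
```

For example, `heartbeat_stale /tmp/q_heartbeat 10 && echo "timer stopped"` run from cron or a loop would flag the moment the timer stops firing. If the file keeps updating while hopen still hangs, the q main loop is alive and the problem is more likely on the networking side.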

joshmyzie2
New Contributor
Thanks. I'll give strace a try. And it does appear to be really stuck. The process was running a function on a timer once a second that sent data to another process, which stopped receiving updates when this issue occurred.

(In reply to James Little's message of 3 April 2016 08:27 UTC; the original question was posted on 3 April 2016 at 00:48.)