2023.03.13 04:48 AM
I've got a process doing some calculations on a timer and sending an updated table to another process. Its heap is more than 3x used, even after a manual trigger of .Q.gc[].
| key  | value      |
|------|------------|
| used | 567774096  |
| heap | 1946157056 |
| peak | 2617245696 |
I'm using KDB+ 4.0 2021.04.26
Is memory fragmentation the only possible cause of this? How do I find which operation contributes to it the most?
Are there any other cases where kdb+ accumulates internal memory, or any known bugs leading to memory leaks?
Thanks
2023.03.13 05:04 AM
As a first step you could insert printouts of .Q.w[] between the actual operations in the query, even breaking expressions down into single operator invocations if necessary. Additionally, .Q.ts can be used to measure the time and space used by an operation, similarly to \ts, but it also returns the result (it is parameterized like . (dot) for multi-parameter apply).
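For example, a minimal sketch (the table t and the aggregation are placeholders, not your actual calculation):

t:([]sym:1000000?`a`b`c;x:1000000?10f)
0N!.Q.w[];                                        / memory stats before the step
r:select sum x by sym from t;                     / one candidate memory-intensive step
0N!.Q.w[];                                        / stats after; compare used and heap
.Q.ts[{[tbl]select sum x by sym from tbl};enlist t]  / time and space used, plus the result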
2023.03.14 02:40 AM
Thanks, I found one of the causes: code that pulls and refreshes a large table from another process.
I'm starting a fresh process and bringing in a table of 107 MB. The heap settles at 268 MB after .Q.gc[].
However, after refreshing this table the heap jumps up to 469 MB and stays there.
What's different between the first and second call to position:h"position"? Why does the heap not go back to the initial 268 MB?
Here's the console output:
q).Q.w[]
used| 360512
heap| 67108864
peak| 67108864
wmax| 0
mmap| 0
mphy| 34359267328
syms| 686
symw| 37328
q)position:h"position"
q).Q.w[]
used| 226930848
heap| 402653184
peak| 402653184
wmax| 0
mmap| 0
mphy| 34359267328
syms| 1833
symw| 95932
q).Q.gc[]
134217728
q).Q.w[]
used| 226930848
heap| 268435456
peak| 402653184
wmax| 0
mmap| 0
mphy| 34359267328
syms| 1834
symw| 95962
q)position:h"position"
q).Q.gc[]
134217728
q).Q.w[]
used| 226933216
heap| 469762048
peak| 603979776
wmax| 0
mmap| 0
mphy| 34359267328
syms| 1834
symw| 95962
q).Q.gc[]
0
q)count position
276765
q)-22!position
107637762
2023.03.14 06:31 PM
Hi Nick,
Here are the steps I took to try to reproduce your issue:
Host Machine (Port 5000):
q)n:50000000
q)position:([]time:n?.z.p;sym:n?`ABC`APPL`WOW;x:n?10f)
Client Machine:
q)h:hopen`::5000
q).Q.w[]
used| 357632
heap| 67108864
peak| 67108864
wmax| 0
mmap| 0
mphy| 8335175680
syms| 668
symw| 28560
q)position:h"position"
q).Q.w[]
used| 1610970544
heap| 2751463424
peak| 2751463424
wmax| 0
mmap| 0
mphy| 8335175680
syms| 672
symw| 28678
q).Q.gc[]
1073741824
q).Q.w[]
used| 1610969232
heap| 1677721600
peak| 2751463424
wmax| 0
mmap| 0
mphy| 8335175680
syms| 673
symw| 28708
q)position:h"position"
q).Q.w[]
used| 1610969232
heap| 4362076160
peak| 4362076160
wmax| 0
mmap| 0
mphy| 8335175680
syms| 673
symw| 28708
q).Q.gc[]
2684354560
q).Q.w[]
used| 1610969232
heap| 1677721600
peak| 4362076160
wmax| 0
mmap| 0
mphy| 8335175680
syms| 673
symw| 28708
As you can see, in trying to replicate your issue my example releases the expected amount of memory back to the OS. Given the number of records you have and the relative size of the table, I think the issue you're encountering is the data structure of position leading to memory fragmentation. As per my other reply, the reference on code.kx.com gives an example of this, stating that "nested data, e.g. columns of char vectors, or much grouping" will fragment memory heavily. Does this reflect your data?
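To illustrate (a rough sketch; the column contents and sizes are arbitrary, and the exact numbers will vary by version and platform):

chunks:{10000?.Q.a}each til 1000  / 1,000 small char vectors allocated back to back
chunks:chunks 2*til 500           / keep every other one, freeing the rest
.Q.gc[]                           / the freed pieces are small and interleaved with
.Q.w[][`used`heap]                / live ones, so they may not coalesce into >=64MB
                                  / units, leaving heap well above used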
To fix this I'd suggest the approach from the reference: serialise, release, deserialise. Or, extended to your case: serialise, release, deserialise, release, IPC reassign, release. This will maintain a low memory footprint and help remedy the memory fragmentation, but you may still unavoidably have heap greater than used purely due to the data structure (although to a lesser extent than what you're experiencing).
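A minimal sketch of that sequence (assuming position is a global in the root namespace):

b:-8!position            / serialise to a single contiguous byte vector
delete position from `.  / release the fragmented original
.Q.gc[]                  / return what can be coalesced to the OS
position:-9!b            / deserialise into a fresh, compact allocation
delete b from `.         / release the byte vector
.Q.gc[]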
If memory fragmentation isn't the cause, can you give a bit more insight into the data structure of position, as my attempt to replicate suggests this problem is data-specific?
2023.03.15 01:40 AM
My table has 54 columns of various simple types, mainly floats, symbols, ints and timestamps. Each column is around 2 MB in size.
I can reproduce it with your code by dropping n to 2000000, which makes the columns similar in size to my case. .Q.gc[] does not release the excess heap to the OS:
q).Q.w[]
used| 50694464
heap| 134217728
peak| 201326592
wmax| 0
mmap| 0
mphy| 34359267328
syms| 696
symw| 37613
Each column with n:2000000 should be allocated 16777216 bytes of heap (2000000 items of 8 bytes, plus the vector header, rounded up to the next power of two):
q)(-22!) each value flip position
16000014 8667837 16000014
What is the reason for such behaviour? Are these columns small enough to lead to memory fragmentation, or is there something else going on?
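For reference, here's the rounding arithmetic I'm assuming (allocSize is just an illustrative helper, not a built-in):

allocSize:{`long$2 xexp ceiling 2 xlog x}  / next power of two at or above x bytes
allocSize 16000014                         / 16777216, the expected heap per column
allocSize[16000014] < 67108864             / well below the 64MB threshold for
                                           / returning freed blocks to the OS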
2023.03.16 05:21 PM - edited 2023.03.19 06:31 PM
I wasn't able to replicate the issue on my local machine running KDB+ 4.0 2020.07.15: my heap returned to the level it was at the start of the q session on release, as expected.
However, I was able to recreate the issue running KDB+ 4.0 Cloud Edition 2022.01.31.
So the issue seems to lie with QCE releasing memory back to the OS. I'll follow up internally to see if it's a known issue and what can be done to minimise the heap used.
However, per the screenshot, I wasn't able to recreate the case where re-assigning position via an IPC call fails to lower the heap after running .Q.gc[] (the heap is the same after the re-assignment and GC as after the initial assignment and GC).
As a potential fix, can you try purging position from memory before your second assignment:
delete position from `.
.Q.gc[]
.Q.w[]  / to inspect
position:h"position"
.Q.w[]  / to inspect
.Q.gc[]
.Q.w[]  / to inspect
2023.03.21 05:30 AM - edited 2023.03.21 05:31 AM
To replicate the issue, please copy the position table twice, as you did with the Cloud Edition. It's the second copy that takes, and does not release, the memory. I'm not running a cloud edition but the Windows version:
KDB+ 4.0 2021.04.26 Copyright (C) 1993-2021 Kx Systems
w64/ 8()core 32767MB
My theory is that the first copy creates the object in the first 64 MB block. For the second invocation of h"position" kdb+ had to allocate a second block, and the assignment then repoints the columns from the first block to the second. But because the first block already holds other objects, it cannot be freed. When the process is constantly updating this position table while also serving other queries, this situation repeats over and over, slowly producing memory fragmentation that looks like a memory leak.
Is it possible to control the minimum block size from the command line? Knowing that a process frequently creates "small" objects, I could then start it with a 1 MB minimum block size instead of 64 MB.
2023.03.21 05:41 PM
Hi Nick,
Understood on the QCE version not being the issue. In my initial response I wasn't able to replicate the issue with n:50000000; if you look at that, you'll see I call position twice and the heap returns to normal.
For n:2000000, however, I do see the issue, so we're on the same page now:
Regardless, did you try the fix I suggested in my latest response? It works for both QCE and q:
See how, if I delete position from the root namespace before reassigning it, the heap returns to normal after GC.
I think your theory about the first block being allocated and a second block being used on the second IPC call is correct. The reason I didn't see this in the n:50000000 case is that the data was of a size where the memory already allocated was large enough to hold both the IPC read and what was currently in memory without allocating another block. For the data you're using, or the n:2000000 case, the memory allocated was much nearer to the amount taken up by the object in memory.
So my solution of deleting from the root namespace before calling again reduces the used memory in the process enough to contain the second assignment and avoid allocating the second block. It's important to note that if you delete from the root namespace immediately before the second assignment, this shouldn't affect your code, since the reassignment would overwrite the variable anyway.
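Putting it together, a minimal sketch of the pattern as a reusable function (refreshPosition is just an illustrative name; it assumes h is an open handle to the source process and position already exists):

refreshPosition:{[h]
  ![`.;();0b;enlist`position];  / functional form of: delete position from `.
  .Q.gc[];                      / free the old copy's blocks before re-reading
  `position set h"position";    / the IPC read can now reuse the freed space
  .Q.gc[]}
refreshPosition h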
2023.03.13 06:29 PM
Hi Nick,
The previous comment on using .Q.w[] is a good start for isolating which parts of the calculation are memory-intensive and require a large heap allocation from the OS. Printing to standard out using 0N! after each expected memory-intensive line will isolate that point in your code.
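For instance, a minimal sketch of instrumenting a timer-driven calculation (calc is a placeholder for your own function):

.z.ts:{
  0N!(`before;.Q.w[][`used`heap]);  / log used and heap before the work
  calc[];                           / placeholder for the memory-intensive step
  0N!(`after;.Q.w[][`used`heap])}   / log again to see what the step added
\t 5000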
On the more under-the-hood side, this article by AquaQ is quite helpful for understanding what's going on. To summarise and add some additional points:
If you don't think a combination of these points contributes enough to make the heap this much larger than used after calling .Q.gc[], I'd recommend invoking the script manually rather than from the timer and investigating with .Q.w[] from there, as the heap does appear rather large even given the above. This eliminates the risk of garbage collection, or the timer function firing again while you investigate, making the .Q.w[] numbers misleading.
2023.03.14 07:03 PM - edited 2023.03.14 07:22 PM
@Nick_Mospan it might be worth checking whether the objects are <64MB too:
"During that return of memory, q checks if the capacity of the object is ≥64MB. If it is and \g
is 1, the memory is returned immediately to the OS; otherwise, the memory is returned to the thread-local heap for reuse.
Executing .Q.gc[]
additionally attempts to coalesce pieces of the heap into their original allocation units and returns any units ≥64MB to the OS." - System commands in q | Basics | kdb+ and q documentation - Kdb+ and q documentation (kx.com)
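A quick way to check (a sketch; note -22! gives the serialised size, which only approximates the in-memory capacity):

(-22!position) >= 67108864  / is the object at least 64MB (64*1024*1024)?
\g 1                        / immediate mode: return >=64MB frees straight to the OS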