
Heap is a lot larger than used, how to find the cause?

Nick_Mospan
New Contributor III

I've got a process that does some calculations on a timer and sends an updated table to another process. Its heap is more than 3x its used memory, even after manually triggering .Q.gc[].

 

key value
used 567774096
heap 1946157056
peak 2617245696

I'm using KDB+ 4.0 2021.04.26

Is memory fragmentation the only possible cause? How do I find which operation contributes to it the most?

Are there any other cases where kdb+ accumulates internal memory, or known bugs that lead to memory leaks?

Thanks 

9 REPLIES

gyorokpeter-kx
Contributor

As a first step you could insert printouts of .Q.w[] between the individual operations in the query, breaking expressions down into single operator invocations if necessary. Additionally, .Q.ts can be used to measure the time and space used by an operation, similarly to \ts, but it also returns the result (it takes a function and an argument list, like . (dot) for multi-parameter apply).
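A minimal sketch of that approach, assuming the steps of the calculation can be run one at a time. The helper name memlog and the stand-in table and query are hypothetical, not taken from the original process:

```q
/ Hypothetical helper: print selected .Q.w[] fields after a labelled step
memlog:{[label] w:.Q.w[]; -1 label," used=",string[w`used]," heap=",string[w`heap]," peak=",string w`peak;};

memlog "start";
t:([]a:til 1000000;b:1000000?100f);     / stand-in for one step of the calculation
memlog "after build";
r:select sum b by a mod 10 from t;      / stand-in for the next step
memlog "after aggregate";

/ .Q.ts takes a function and an argument list, like . (dot) apply
.Q.ts[{select sum b by a mod 10 from x};enlist t]
```

Comparing heap and peak between consecutive labels should point at the step that forces a new allocation.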

 

Nick_Mospan
New Contributor III

Thanks, I found one of the causes: the code that fetches and refreshes a large table from another process.

I'm starting a fresh process and bringing in a table of 107 MB. The heap settles at 268 MB after .Q.gc[].

However, after updating this table the heap jumps to 469 MB and stays there.

What's different between the first and second call to position:h"position"? Why does the heap not go back to the initial 268 MB?

Here's the console output:

q).Q.w[]
used| 360512
heap| 67108864
peak| 67108864
wmax| 0
mmap| 0
mphy| 34359267328
syms| 686
symw| 37328
q)position:h"position"
q).Q.w[]
used| 226930848
heap| 402653184
peak| 402653184
wmax| 0
mmap| 0
mphy| 34359267328
syms| 1833
symw| 95932
q).Q.gc[]
134217728
q).Q.w[]
used| 226930848
heap| 268435456
peak| 402653184
wmax| 0
mmap| 0
mphy| 34359267328
syms| 1834
symw| 95962
q)position:h"position"
q).Q.gc[]
134217728
q).Q.w[]
used| 226933216
heap| 469762048
peak| 603979776
wmax| 0
mmap| 0
mphy| 34359267328
syms| 1834
symw| 95962
q).Q.gc[]
0
q)count position
276765
q)-22!position
107637762

sbruce01
New Contributor III

Hi Nick,

Here are the steps I took to try to reproduce your issue:

Host Machine (Port 5000):

 

q)n:50000000
q)position:([]time:n?.z.p;sym:n?`ABC`APPL`WOW;x:n?10f)

 

Client Machine:

 

q)h:hopen`::5000
q).Q.w[]
used| 357632
heap| 67108864
peak| 67108864
wmax| 0
mmap| 0
mphy| 8335175680
syms| 668
symw| 28560
q)position:h"position"
q).Q.w[]
used| 1610970544
heap| 2751463424
peak| 2751463424
wmax| 0
mmap| 0
mphy| 8335175680
syms| 672
symw| 28678
q).Q.gc[]
1073741824
q).Q.w[]
used| 1610969232
heap| 1677721600
peak| 2751463424
wmax| 0
mmap| 0
mphy| 8335175680
syms| 673
symw| 28708
q)position:h"position"
q).Q.w[]
used| 1610969232
heap| 4362076160
peak| 4362076160
wmax| 0
mmap| 0
mphy| 8335175680
syms| 673
symw| 28708
q).Q.gc[]
2684354560
q).Q.w[]
used| 1610969232
heap| 1677721600
peak| 4362076160
wmax| 0
mmap| 0
mphy| 8335175680
syms| 673
symw| 28708

 

As you can see, in my attempt to replicate your issue the expected amount of memory is released back to the OS. Given the number of records you have and the relative size of the table, I think the issue you're encountering is memory fragmentation caused by the data structure of position. As per my other reply, the reference on code.kx.com gives an example of this, stating that "nested data, e.g. columns of char vectors, or much grouping" will fragment memory heavily. Does this reflect your data?

To fix this I'd suggest the approach from the reference: serialise, release, deserialise. Or, extended to your case: serialise, release, deserialise, release, IPC reassign, release. This will keep the memory footprint low and try to remedy the fragmentation, but you may still unavoidably have heap greater than used purely because of the data structure (though to a lesser extent than what you're experiencing now).

If memory fragmentation isn't the cause, can you give a bit more insight into the data structure of position? My attempt to replicate suggests this problem is data specific.
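The serialise, release, deserialise sequence could be sketched as follows. The stand-in table with a string column is an assumption for illustration, and z is a hypothetical temporary name:

```q
/ Stand-in data with a nested (string) column; the real table will differ
position:([]id:til 100000;note:100000#enlist"abc");

z:-8!position;             / serialise to a byte vector
delete position from `.;   / drop the original so its blocks can be freed
.Q.gc[];                   / coalesce and return what can be returned
position:-9!z;             / deserialise into freshly allocated memory
delete z from `.;          / drop the serialised copy
.Q.gc[];
.Q.w[]`used`heap
```

Note that the process briefly holds the serialised copy alongside the original, so peak will rise by roughly the serialised size.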

Nick_Mospan
New Contributor III

My table has 54 columns of various simple types, mainly floats, symbols, ints and timestamps. Each column is around 2 MB in size.

I can reproduce it with your code by dropping n to 2000000, which makes the table similar in size to my case. .Q.gc[] does not help release the excess heap to the OS:

 

q).Q.w[]
used| 50694464
heap| 134217728
peak| 201326592
wmax| 0
mmap| 0
mphy| 34359267328
syms| 696
symw| 37613

 

Each column with n:2000000 should be allocated 16777216 bytes of heap.

 

q)(-22!) each value flip position
16000014 8667837 16000014
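The 16777216 figure can be checked with a little arithmetic. This sketch assumes a 16-byte object header and 8 bytes per item for timestamp, symbol and float columns (the symbol column is smaller under -22! only because it serialises as text); bucket is a hypothetical helper name:

```q
/ Round a byte count up to the next power of two
bucket:{[bytes] "j"$2 xexp ceiling 2 xlog bytes};

bucket 16+8*2000000      / 16777216, one column
3*bucket 16+8*2000000    / 50331648, three columns
```

Under those assumptions the three columns account for 50331648 bytes, which is close to the used value of 50694464 reported above.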

 

What is the reason for this behaviour? Are these columns small enough to cause memory fragmentation, or is there something else going on?

sbruce01
New Contributor III

I wasn't able to replicate the issue on my local machine running on KDB+ 4.0 2020.07.15:

[Screenshot: sbruce01_0-1679011408253.png]

After the release, my heap returned to the level it was at the start of the q session, as expected.

However I was able to recreate the issue running KDB+ 4.0 Cloud Edition 2022.01.31. 

[Screenshot: sbruce01_1-1679012797433.png]

So the issue seems to lie with QCE releasing memory back to the OS. I'll follow up internally to see whether it's a known issue and what can be done to minimise the heap used.

However, as the screenshot shows, I could not reproduce the heap staying elevated after re-assigning position via IPC and running .Q.gc[]: the heap after the re-assignment and GC is the same as after the initial assignment and GC.

As a potential fix, can you try purging position from memory before the second assignment:

 

delete position from `.
.Q.gc[]
.Q.w[] // to inspect
position:h"position"
.Q.w[] // to inspect
.Q.gc[]
.Q.w[] // to inspect

 

Nick_Mospan
New Contributor III

To replicate the issue, please copy the position table twice, as you did with the Cloud Edition. It's the second copy that takes the memory and does not release it. I'm not running the Cloud Edition but the Windows version:

 

 

 

KDB+ 4.0 2021.04.26 Copyright (C) 1993-2021 Kx Systems
w64/ 8()core 32767MB

 

My theory is that the first copy creates the object in the first 64 MB block. For the second invocation of h"position", kdb+ has to create a second block, and the assignment then repoints the columns from the first block to the second. But because the first block already holds other objects, it cannot be freed. When the process constantly updates this position table and serves other queries at the same time, this situation repeats over and over, slowly leading to memory fragmentation that looks like a memory leak.

Is it possible to control the minimum block size from the command line? Knowing that a process frequently creates "small" objects, I could then start it with a 1 MB minimum block size instead of 64 MB.

sbruce01
New Contributor III

Hi Nick,

Understood that QCE is not the issue. In my initial response I wasn't able to replicate the problem with n:50000000; if you look at that output, you'll see I fetch position twice and the heap returns to normal.

For n:2000000, however, I do see the issue, so we're on the same page now:

[Screenshot: sbruce01_0-1679444712477.png]

Regardless, did you try the fix I suggested in my latest response? It works for both QCE and q:

[Screenshot: sbruce01_1-1679444920332.png]

See how, if I delete position from the local namespace before reassigning it, the heap returns to normal after GC.

I think your theory about the first block being allocated and a second block being used on the second IPC call is correct. The reason I didn't see this in the n=50000000 case is that the memory already allocated was large enough to hold both the IPC read and what was currently in memory without allocating another block. For your data, or the n=2000000 case, the memory allocated was closer to the amount taken up by the object in memory.

So my solution of deleting from the local namespace before fetching again reduces the used memory in the process enough to contain the second assignment and avoid the allocation of a second block. Note that deleting from the local namespace immediately before the second assignment shouldn't affect your code, since the reassignment would overwrite the variable anyway.
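The delete-then-reassign sequence could be wrapped in a small function so every refresh follows the same steps. This is only a sketch: refresh is a hypothetical name, h is assumed to be an open IPC handle, and the remote process is assumed to expose a variable with the same name:

```q
/ Hypothetical helper: h is an open IPC handle, name is the table's symbol name
refresh:{[h;name]
  ![`.;();0b;enlist name];   / functional form of: delete name from `.
  .Q.gc[];                   / give back the old copy before fetching
  name set h string name;    / fetch and assign the new copy
  .Q.gc[]}                   / release IPC buffers where possible
```

It would be called as refresh[h;`position]. One trade-off to keep in mind: if the IPC call fails, the table is already gone, so a caller may want to trap the error and retry.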

sbruce01
New Contributor III

Hi Nick,

The previous comment about using .Q.w[] is a good start for isolating which parts of the calculation are memory intensive and require a large heap allocation from the OS. Printing it to standard out using 0N! after each line you expect to be memory intensive will isolate that point in your code.

On the more under-the-hood side, this article by AquaQ is quite helpful for understanding what is going on. To summarise and add some additional points:

  • kdb+ allocates memory in powers of two, meaning a vector is placed in a block one power of two up from its raw data, so it can take at most 2x the memory of the data itself.
  • Memory fragmentation may also be an issue depending on your aggregations - example here
  • The q process starts with a certain amount of heap that is larger than the used space (this can be seen by starting a q session and running .Q.w[] straight away). The process won't go below this initial heap allocation.

If you don't think a combination of these points is enough to explain a heap this much larger than used after calling .Q.gc[], I'd recommend invoking the timer function manually and investigating with .Q.w[] from there, as the heap does appear rather large even given the above. That rules out misleading numbers caused by the timer function running again between garbage collection and your .Q.w[] calls.
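A rough sketch of that manual investigation, assuming the timer callback is .z.ts. The .z.ts defined here is only a stand-in so the example runs on its own; in the real process that line would be skipped, and the timer restored afterwards:

```q
/ Stand-in timer function; skip this line in the real process
.z.ts:{[ts] tmp::til 1000000;};

system"t 0";              / stop the timer so it cannot fire mid-measurement
.Q.gc[]; before:.Q.w[];
.z.ts .z.p;               / one manual invocation
.Q.gc[]; after:.Q.w[];
(after-before)`used`heap`peak
```

A heap delta that stays positive after the second .Q.gc[] is memory one invocation left behind.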

davidcrossey
Moderator

@Nick_Mospan it might be worth checking whether the objects are <64MB too

"During that return of memory, q checks if the capacity of the object is ≥64MB. If it is and \g is 1, the memory is returned immediately to the OS; otherwise, the memory is returned to the thread-local heap for reuse.

Executing .Q.gc[] additionally attempts to coalesce pieces of the heap into their original allocation units and returns any units ≥64MB to the OS." - System commands in q | Basics | kdb+ and q documentation (kx.com)
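One way to run that check is sketched below, using the n:2000000 reproduction table from earlier in the thread. The capacity estimate assumes a 16-byte header plus 8 bytes per item rounded up to a power of two, which is an approximation rather than a documented formula; cap is a hypothetical name:

```q
system"g"                               / current GC mode: 0 deferred, 1 immediate

n:2000000;
t:([]time:n?.z.p;sym:n?`ABC`APPL`WOW;x:n?10f);

/ Estimated capacity of each column in bytes
cap:{"j"$2 xexp ceiling 2 xlog 16+8*count x} each flip t;
cap>=64*1024*1024                       / which columns reach the 64MB threshold
```

Columns below the threshold go back to the thread-local heap on release, so only .Q.gc[] coalescing can return their memory to the OS.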