saving tables to disk, deleting from memory, wsfull and garbage

Goran_Bacic
New Contributor
Hi all,

Apologies in advance if this is asking too much.

I have been having a hard time getting q to free memory after I've saved a table to disk. My use case is hard to explain in full here, so I'll try and make it general.

Essentially, I have a problem I'm trying to solve that requires calculations against all permutations of sets of up to 9 elements. In other words, I have two lists of integer lists (that represent prime factors) and I have to check one against the permutations of the other to find the minimum of their mapped logarithmic differences. I must do these calculations on many variations of these sets. (That might not be entirely accurate, but it gets the procedure across.)

For scale, I have >200k sets to check, and there are up to 362880 permutations per set (if each element is unique, I can discard shared elements).

I cannot fit it all into memory, so I decided to try and loop over chunks (of 16) of the set, saving (using set) and deleting (using delete) as I go.

However, I cannot seem to free up the memory in my q process after one loop, and the process quits with a 'wsfull error after the first chunk. It does the same if I perform the first two steps by hand, failing at the second step.

I have called .Q.gc[] and it says it frees up a lot of memory (e.g. last time it said it cleaned up 1811939328 bytes) but .Q.w[] returns the same values for used and heap memory. Also, the syms and symw counts sometimes increase after garbage collection (e.g. from 685 and 26528 to 686 and 26554, respectively).

I can't seem to find the resources to properly implement or debug this kind of step-wise calculation-then-save-to-disk strategy.

How can I see where it's keeping the table in memory and then delete it to free up space for the next step in the loop?
Am I using set and delete incorrectly? 
Can/should I use save to output to files with incrementing names (or append to an existing file)? 
Is this a problem that requires splayed or partitioned tables?

I don't mean to ask for hand-holding, but it has really become the final impediment to finishing this project.

Thanks kindly in advance for the help.

Cheers,
g

4 Replies

Flying
New Contributor III
After gc, it is normal for the heap size to remain the same, as q does not aggressively return memory to the OS unless you instruct it to (the -g command-line argument). But the used memory size should drop if you did manage to release memory before gc.

From your description, it is not obvious what you did to release memory before calling gc (note: gc does not release memory for you; it merely repacks memory to make memory allocation more efficient). If you still have variables holding onto large chunks of memory, gc will not help you. Therefore, I think it's prudent that you share some fragments of your code related to the process you described above, so that we can try to figure out exactly what situation you are facing.
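For instance, a minimal sketch of the release-then-collect pattern (variable names illustrative):

```q
q)bigVec: til 100000000        / allocate ~800 MB of longs
q).Q.w[]`used                  / used jumps by roughly that amount
q)delete bigVec from `.        / drop the only reference to the data
q).Q.gc[]                      / only now can the collector free those blocks
q).Q.w[]`used                  / used falls back down; heap may stay large without -g
```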

I've been attempting to construct a table and save it as a file in one step. This code will be open-source once it is publishable, and I'm really more worried about embarrassing myself in front of other q developers (i.e. I use verbose names, and I'm pretty sure I'm misunderstanding a lot about the way q handles function arguments, etc.).

The project is to compute harmonic distances (and related interchordal harmonic quantities) between just-intoned chords (i.e. sets of prime factors).

For example, to calculate the "harmonic sum" between two chords, each represented by a list of integer vectors corresponding to the powers of their prime factors (e.g. a chord of (3 11 169) would be (0 1 0 0 0; 0 0 0 1 0; 0 0 0 0 2) when working within a 13-limit), we want to find the minimal sum of the absolute harmonic distance (i.e. HD:{sum 2 xlog primes xexp raze x}) over all different mappings of the unique values of these chords.
HS: {[c0; c1]
  // exclude common intervals
  c0_p: c0 @ where not c0 in c1;
  c1_p: c1 @ where not c1 in c0;
  // minimal sum of the absolute distance over all permutations between c1 and c0
  : $[0 = count hs: min sum each {HD each x} each abs each c0_p -'/: c1_p @ perms[-1 + count c1_p];
      0;               // same chord? return 0 (same chords produce empty lists)
      (abs hs) < 1e-5; // very small? return 0
      0;
      hs               // otherwise return the actual harmonic sum
    ]
 };
NB. it still runs out of memory even if I do not set hs to check for empty lists or small values. It is my understanding that this does not matter, as hs is set in the scope of HS.
 
For reference, I'm using Pierre's k4 implementation of permutations (http://nsl.com/k/perm.k), as I needed something that did not use recursion, and the q version listed in the q idioms on the wiki cannot take general lists as its argument. I now see how to change this, but I'd prefer to use this k version until I get the rest working.
q)k)permutations:{(,/@\:)/,\(,<<:)'x=\:x:!x};
Also, to be economical, I am generating the permutations and looking them up rather than generating them each time. 
q)perms: permutations each 1+til count primeOrder; 
Where primeOrder is the order of primes used in the tree that represents the chord in some N-dimensional harmonic space (ie each dimension is an exponentially spaced plane of an arbitrary prime).
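As a quick sanity check of the lookup table (a sketch; perm.k's permutations n should return every ordering of til n, so perms[n-1] has n! rows):

```q
q)count perms 2          / perms[2] is permutations 3: 6 = 3! rows
q)(10 20 30) @ perms 2   / index a 3-element list by every permutation at once
```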

Now, I've generated all the possible mutations of a 10-element chord in 4-dimensional space for which to calculate this and two other values. Specifically, I've checked the different prime-order permutations, the reflections of those permutations, and the translations of those shapes in space up to a certain limit (i.e. the upper Hz limit of human hearing). This brings the count up to ~218k different sets of 10 elements.

I've only been attempting to generate a table of the 3 quantities required with respect to one (the first) permutation of the prime order of the chord set. My loop looks like this:
i: 0;
cutPrimes: 16 cut allOtherPrimes;
while[i < count cutPrimes;
  show "Working on chunk ", (string i), " of ", string count cutPrimes;
  filename: `$"ht_4d_", string i;
  filename set ([] HS: {HS[pPrime; x]} peach cutPrimes[i]; IHD: {IHD[pPrime; x]} peach cutPrimes[i]; PV: {PV[pPrime; x]} peach cutPrimes[i]);
  delete filename from `.;
  i+: 1;
 ];
NB I'm using peach with -s 8 to speed up everything. Chunk size of 16 is arbitrary. show statement is for debugging. I'm not confident in my use of set or delete here. And I have been calling .Q.gc[] manually, although from the sounds of Flying's description I'm not calling it correctly, either.
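If the goal is a file on disk, one variant worth trying (a sketch; note the leading colon, which makes the symbol a file handle rather than a global name, and the functional delete, since `delete filename from `.` targets a variable literally named filename, not the name stored in it):

```q
t: ([] HS: 1 2 3f; IHD: 4 5 6f; PV: 7 8 9f);  / stand-in for the computed chunk
file: `$":ht_4d_", string i;                  / `:ht_4d_0 is a file path, not a global
file set t;                                   / writes the table to that file
![`.; (); 0b; enlist `t];                     / functional delete of t from the root namespace
.Q.gc[];                                      / then collect
```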

Let's say I do not set the table to anything and just execute the expression for the first chunk, then try the second. I get a 'wsfull error even though I have not set anything.
q)// before table generation
q).Q.w[]
used|210295952  
heap|268435456  
peak|268435456  
wmax|0          
mmap|0          
mphy|25238728704
syms|638        
symw|20452      
q)i:0;
q)([] HS: {HS[pPrime; x]} peach cutPrimes[i]; IHD: {IHD[pPrime; x]} peach cutPrimes[i]; PV: {PV[pPrime; x]} peach cutPrimes[i])
HS                 IHD                  PV                  
------------------------------------------------------------
27.348234169259634 3.994476838856357    -2.210896782498624f 
31.015411433268973 7.6616541028657      1.456280481510717f  
71.94088011289334  0.5464091664079223   -7.916065108069992f 
15.387849050834898 0.29199816120364064  1.4562804815107278f 
...                ...                  ...                 
q).Q.w[]
used|210291984  
heap|268435456  
peak|268435456  
wmax|0          
mmap|0          
mphy|25238728704
syms|638        
symw|20452      
q)i+:1;
q) ([] HS: {HS[pPrime; x]} peach cutPrimes[i]; IHD: {IHD[pPrime; x]} peach cutPrimes[i]; PV: {PV[pPrime; x]} peach cutPrimes[i])
'wsfull

I hope this is enough information to assist me. I am keeping a close eye on this thread.

Thank you all so much!
g

Alright, I seem to have managed to work around my issue. 

If I perform the operations one at a time within a function, appending as I go, I do not seem to run out of memory. This makes sense to me, as some of the set combinations are more complicated (share fewer elements) than others, so my session has to hold too many permutations at once when I do it in chunks, thus throwing 'wsfull. It's rather embarrassing that I didn't try this before posting. Anyway, thanks for looking and helping!

For posterity, I'm roughly using something like this:
hs_tmp: ();
ihd_tmp: ();
pv_tmp: ();
g: {
  hs_tmp,::  HS[pPrime; x];
  ihd_tmp,:: IHD[pPrime; x];
  pv_tmp,::  PV[pPrime; x];
 };
then constructing a table afterwards.
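Assembling the final table from the three accumulators might then look like this (a sketch, assuming all three lists ended up the same length):

```q
q)res: flip `HS`IHD`PV!(hs_tmp; ihd_tmp; pv_tmp)  / dictionary of columns -> table
q)`:ht_4d set res                                 / persist the result in one go
```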


g

trentkg
New Contributor
A few thoughts: 
For a table t, you can use -22!t to see the size in bytes of its serialized form.

You have not said at what point you get the wsfull error, other than that it's after the first chunk. Is it when you try to save the first chunk to disk? Or is it when you've loaded the next chunk into memory? Or is it when you're performing calculations on the next chunk?

Here's a detailed explanation of garbage collection: https://www.aquaq.co.uk/q/garbage-collection-kdb/

A quick and dirty fix would be to save where you are in the iteration to disk after each iteration (e.g. a one-line text file containing just an integer, called state.txt). Then delete everything in the namespace after each iteration

delete from `.

and call .Q.gc[]. Then reload your functions and state.txt.

Finally, the input of the program is too big to fit into memory, that I get. But what about the output (presumably, what you're saving to disk)? Is that too big to fit in memory? If so, you could use partitioning, or just save each output as its own file. I would use partitioning only if you can choose a partitioning variable that forms non-overlapping sets (such as the date).
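For the save-each-output option, a sketch of appending chunks to a single splayed table instead (path and names illustrative; .Q.en is only strictly needed when there are symbol columns):

```q
q)chunk: ([] HS: 1 2f; IHD: 3 4f; PV: 5 6f)   / one chunk of output
q)`:db/ht4d/ upsert .Q.en[`:db] chunk         / trailing slash splays; upsert appends on disk
```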