Trouble With Huge CSVs

planefan
New Contributor III

Greetings All, 

I've got a couple of 40 GB CSVs that I'm hoping to perform some joins on.

I do not know the column format or the headers, or whether the CSVs even contain headers.

I'm working with a good bit of memory, with 256 GB accessible.

Loading the files into memory clearly doesn't work -- as expected the program crashes. 

So I made my way here (the loading from large files page). I understand I'll have to convert my CSVs to splayed tables, save those tables down, and then work from those instead of the CSVs.

I'm able to see the rows inside the CSV with .Q.fs[0N!]`:file.csv, but I still don't know the entirety of what's inside.
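
For checking whether a header row is present and how many columns there are without reading the whole file, a minimal sketch is below. It assumes a plain comma-separated file with no quoted commas; `sample.csv` and its contents are hypothetical stand-ins for the real file.

```q
/ hypothetical small file standing in for the real CSV
`:sample.csv 0: ("sym,price,size";"AAA,1.5,100";"BBB,2.5,200")

/ read at most the first 1000 bytes, capped at the file size, split into lines
n:1000&hcount `:sample.csv
firstLines:read0 (`:sample.csv;0;n)

/ first line: inspect it to judge whether it is a header
hdr:first firstLines

/ number of comma-separated fields on the first line
ncols:count "," vs hdr
```

On a real 40 GB file the last line of the byte range will usually be cut off mid-row, so only the earlier lines should be trusted.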

I go through this little bit, 

[image: ticktick_0-1625838980094.png]

and obviously it's too big and crashes the program. I then tried to insert the rows directly into a table on disk with .Q.fs[{`:newfile upsert flip colnames!("DFFFFIS";",")0:x}]`:file.csv and that crashes too.

Should I be chunking this and going at it from that angle, or is there a better way to do this?

 

1 ACCEPTED SOLUTION

rocuinneagain
New Contributor III

Yes, the w32 version has a limit on how much memory it can address (a 32-bit process can address at most 4 GB); w64 does not have this restriction.
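
A quick way to confirm which build a session is running is sketched below. The variable names `is32bit` and `mem` are illustrative only.

```q
/ .z.o reports the OS and architecture of the running q build, e.g. `w32 or `w64
is32bit:.z.o like "?32"

/ .Q.w[] returns a dictionary of memory statistics for the current process
mem:.Q.w[]
```

If `is32bit` comes back as `1b`, the amount of RAM on the machine does not matter: the process itself cannot address it.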

 

You could also stream the data to an on-disk table:

.Q.fs[{`:trade/ upsert flip colnames!("**********";",")0:x}]`:filename
trade:get `:trade/
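
Once the real column types are known, the same streaming pattern can be adapted along the lines of the sketch below. Everything specific here is an assumption for illustration: the file `sample.csv`, the column names and types, the `db` directory, and the helper `loadChunk`. Symbol columns are enumerated with .Q.en, since splayed tables cannot hold plain symbol columns, and the header row is dropped from the first chunk if it is present.

```q
/ hypothetical sample file standing in for the real CSV
`:sample.csv 0: ("sym,price,size";"AAA,1.5,100";"BBB,2.5,200";"AAA,3.5,300")

colnames:`sym`price`size      / assumed column names
types:"SFJ"                   / assumed column types: symbol, float, long
hdrline:"sym,price,size"      / assumed header line

/ parse one chunk of lines and append it to the splayed table on disk
/ the header check assumes no data row is identical to the header line
loadChunk:{[lines]
  lines:$[hdrline~first lines;1_lines;lines];
  t:flip colnames!(types;",")0:lines;
  `:db/trade/ upsert .Q.en[`:db] t}

.Q.fs[loadChunk]`:sample.csv

/ map the splayed table rather than loading it all into memory
trade:get `:db/trade/
```

Note that rerunning the load appends the same rows again, so the `db` directory should be cleared between attempts.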


3 REPLIES 3

planefan
New Contributor III

I've chunked with .Q.fs[{`trade insert flip colnames!("**********";",")0:x}]`:filename and it runs until it crashes.

Did some more research and thought it could be a garbage collection issue, so I added a gc call, but that didn't help either.

Dumb question: is this because I'm using w32 instead of w64?


That was the move, thanks!