2021.07.09 06:59 AM
Greetings All,
I've got a couple of 40 GB CSVs that I'm hoping to perform some joins on.
I do not know the column format, or headers, or if headers are even in the csv.
I'm working with a good bit of memory, with 256 GB accessible.
Loading the files into memory clearly doesn't work -- as expected the program crashes.
So I made my way here (the loading from large files page). I understand I'll have to convert my CSVs to splayed tables, save those tables down, and then work from there instead of using the CSVs.
I'm able to see the rows inside the CSV with .Q.fs[0N!]`:file.csv -- I still don't know the entirety of what's inside though.
I go through this little bit,
and obviously it's too big and crashes the program. I try to insert the rows directly into a table on disk with .Q.fs[{`:newfile upsert flip colnames!("DFFFFIS";",")0:x}]`:file.csv and that crashes too.
Should I be chunking this and going from that angle or is there a better way to do this?
2021.07.09 11:11 AM
I've chunked with .Q.fs[{`trade insert flip colnames!("**********";",")0:x}]`:filename and it runs until it crashes.
Did some more research and thought it could be a gc issue, so I added a gc call but that didn't help me either.
Dumb question, is this because I'm using w32 instead of w64?
2021.07.10 11:27 PM - edited 2021.07.10 11:31 PM
Yes, the w32 (32-bit) version has a limit on how much memory it can address; w64 (64-bit) does not have this restriction.
You could also stream the data to an on-disk table instead of building it up in memory:
.Q.fs[{`:trade/ upsert flip colnames!("**********";",")0:x}]`:filename
trade:get `:trade/
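Since it isn't known whether the CSV has a header row, here is a minimal sketch of the same streaming approach that drops the first line before parsing. Everything in it is an assumption for illustration: the tiny sample file, the column names, the "*FJ" type string, and the firstChunk / loadChunk names are hypothetical and would need to match the real data.

```q
/ Build a tiny sample file so the sketch runs end to end (hypothetical data)
`:sample.csv 0: ("sym,price,size";"a,1.5,10";"b,2.5,20")

/ Assumed column names and types; replace with the real ones
colnames:`sym`price`size

/ Global flag: true until the first chunk has been processed
firstChunk:1b

/ Called by .Q.fs once per chunk of complete lines
loadChunk:{[x]
  / assumes the file starts with a header line; drop it from the first chunk only
  if[firstChunk; x:1_x; firstChunk::0b];
  / parse the chunk and append it to the splayed table on disk
  / sym is kept as strings ("*") so no symbol enumeration is needed
  `:sampledb/ upsert flip colnames!("*FJ";",")0:x}

.Q.fs[loadChunk]`:sample.csv

/ Map the on-disk table; note that rerunning appends to an existing sampledb
t:get`:sampledb/
show t
```

If the file turns out to have no header, the if[...] line can simply be removed. Symbol columns ("S") would need enumerating before they can be splayed, which is why this sketch sticks to strings.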
2021.07.18 01:07 PM
that was the move, thanks!