The following program parses CSV into an in-memory table at about 22 MB/sec. I have repeated the benchmark a few times; the data is in the OS disk cache and the program is CPU-bound on a single core. My CPU is an i5-3230M @ 2.60 GHz on a laptop. How can I make it faster? What have I missed in the tutorials? Thx.
I have just plotted total read time vs block size.
It takes about 4.5 sec to read a 128 MB CSV file across a range of block sizes, i.e. about 28 MB/sec on average. The CPU is an i5-2400 @ 3.1 GHz on a desktop. Speed stops increasing above a 64 KB block size. How can I make it faster? Thx.
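For reference, here is a minimal sketch of the kind of timing loop that could produce the output below. It is not my exact script; the file name, the use of read1, and the short list of block sizes are placeholders (the real sweep went from 1 KB to 128 MB).

/ read the file in fixed-size blocks and time one full pass per block size
f:`:data.csv
n:hcount f                                           / total file size in bytes
bench:{[f;n;bs] s:.z.p; i:0; while[i<n; read1(f;i;bs&n-i); i+:bs]; .z.p-s}
{-1 string[.z.p]," - ",string[bench[f;n;x*1024]]," ",string[x]," k block";} each 1 2 4 8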
"2015.03.14D08:21:08.150399000 - 0D00:00:15.505887000 1 k block"
"2015.03.14D08:21:17.790950000 - 0D00:00:09.640551000 2 k block"
"2015.03.14D08:21:24.702345000 - 0D00:00:06.911395000 4 k block"
"2015.03.14D08:21:30.300666000 - 0D00:00:05.598321000 8 k block"
"2015.03.14D08:21:35.237948000 - 0D00:00:04.936282000 16 k block"
"2015.03.14D08:21:39.991220000 - 0D00:00:04.753272000 32 k block"
"2015.03.14D08:21:44.511479000 - 0D00:00:04.520259000 64 k block"
"2015.03.14D08:21:48.921731000 - 0D00:00:04.410252000 128 k block"
"2015.03.14D08:18:15.374517000 - 0D00:00:04.427254000 256 k block"
"2015.03.14D08:18:19.814771000 - 0D00:00:04.439254000 512 k block"
"2015.03.14D08:18:24.336029000 - 0D00:00:04.521258000 1024 k block"
"2015.03.14D08:18:28.877289000 - 0D00:00:04.541260000 2048 k block"
"2015.03.14D08:18:33.387547000 - 0D00:00:04.510258000 4096 k block"
"2015.03.14D08:18:37.888804000 - 0D00:00:04.500257000 8192 k block"
"2015.03.14D08:18:42.397062000 - 0D00:00:04.508258000 16384 k block"
"2015.03.14D08:18:46.939322000 - 0D00:00:04.541260000 32768 k block"
"2015.03.14D08:18:51.530585000 - 0D00:00:04.590263000 65536 k block"
"2015.03.14D08:18:56.151849000 - 0D00:00:04.621264000 131072 k block"
OK Yan, I see what you mean. So you got a 2x speedup from 4x the cores, which is 50% scaling efficiency. That is around the same as the tests done here using a parallel read on split files: almost a 3x speedup with 6 threads (the number of cores isn't stated).
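For what it's worth, a parallel read over pre-split files can be sketched like this in q. The file names and the column types "SFJ" are placeholders, and it assumes each part has its own header row and that q was started with worker threads (e.g. q -s 6):

/ load each part on a worker thread and join the resulting tables
files:`:part1.csv`:part2.csv`:part3.csv
trade:raze {("SFJ";enlist ",") 0: x} peach files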
Are your CPUs busy? Maybe pin the process to one core with taskset to get fair results?
Remember that what you are timing has to:
1) read in a chunk
2) split on ","
3) convert each column to the correct type
And the smaller the chunks you read, the more times this has to happen. (See the sketch of these steps below.)
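In q those three steps typically look something like this. The column names, the types "SFJ", and the table name `trade are placeholders, and it assumes the file has no header row:

/ .Q.fs calls parse on successive chunks, each a list of complete lines;
/ upsert creates `trade on the first chunk
cols:`sym`price`size
parse:{`trade upsert flip cols!("SFJ";",") 0: x}   / split on "," and cast each column
.Q.fs[parse;`:data.csv]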
Can you try without .Q.fs and see what speeds you get?
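That is, a one-shot load of the whole file, something like this (types and file name assumed as above):

/ parse the entire file in one call; enlist "," tells 0: the first line is a header
trade:("SFJ";enlist ",") 0: `:data.csv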
Perhaps you should try out Simon's script, csvguess.q. It is very well written, well tested, avoids a while loop, and has many other features you can avail of, including dynamic chunk sizing based on MB size: http://kx.com/q/e/csvguess.q. It outputs time taken, records per second, and throughput, so it could be a lot of help to you.
I have tried csvguess.q. It loads at 36 MB/sec for chunk sizes from 4 MB to 128 MB. My disks are not busy; I ran the benchmark on a ramdisk. It is single-core CPU-bound: the highest CPU% I have seen is 25% on a 4-core machine.
I got 98 MB/sec after REDUCING the chunk size from 10 MB to 120 KB. The default chunk size of 131000 bytes is probably chosen to be just under 50% of 256 KB, a common per-core L2 cache size, so that code + input data + output data < 256 KB.
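In case anyone wants to reproduce this: .Q.fs is just .Q.fsn with the chunk size fixed at 131000 bytes, and .Q.fsn takes the chunk size as a third argument (parse as in the earlier sketch):

/ same chunked load, but with an explicit 120 KB chunk size
.Q.fsn[parse;`:data.csv;120*1024]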