
q vector math not much faster than serial erlang code?

joelr1
New Contributor

Subject: q vector math not much faster than serial erlang code?
Date: Mon, 7 Apr 2008 11:33:59 +0100
Cc: Zvi Avraham

http://groups.google.com/group/erlang-questions/browse_thread/thread/c2853f9826a19f12?hl=en

Here's a juicy steak. Where are the lions?

---
Zvi:

BTW, I'm somewhat disappointed with the vector math implemented in Matlab and
kdb+/Q. It's not much faster than serial Erlang code (and much, much slower
than parallel Erlang code). You can see it in my little pi calculation
benchmark:

Matlab:

>> tic; N=1000000; Step = 1/N; PI = Step*sum(4./(1+(((1:N)-0.5)*Step).^2));
>> toc

Elapsed time is 0.125428 seconds.
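For readers without a Matlab licence, the same midpoint-rule approximation of pi (integrating 4/(1+x^2) over [0,1]) can be sketched in NumPy. This is an illustrative translation, not part of the original thread:

```python
# Midpoint rule for pi = integral of 4/(1+x^2) over [0,1],
# vectorised with NumPy (illustrative translation of the Matlab one-liner).
import numpy as np

N = 1_000_000
step = 1.0 / N
x = (np.arange(1, N + 1) - 0.5) * step        # midpoints of the N subintervals
pi_approx = step * np.sum(4.0 / (1.0 + x ** 2))
print(pi_approx)
```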
Q:

q)N:1000000
q)Step:1%N
q)PI: {[]; Step*((4f % (1f + ((Step*((1+ til N)-0.5)) xexp 2))) +/)}
q)PI()
3.141593
q)\t do[1000;PI()]
264359
q)

i.e. ~264 ms per call

On my dual-CPU workstation I'm getting ~200 ms.

Erlang:

The serial tail-recursive Erlang version takes ~170-300 ms (on different
machines). The parallel Erlang version on 16 cores takes ~15 ms.



--
wagerlabs.com






4 REPLIES

Attila
New Contributor


xexp is slow; avoid it if you can.

q)\t do[10;PI()]
1451
q)p:{sum Step*4%1+{x*x}Step*.5+til N}
q)\t do[10;p[]]
638

Regards,
Attila
On 7 Apr 2008, at 11:33, Joel Reymont wrote:


Christian Langreiter

> q)p:{sum Step*4%1+{x*x}Step*.5+til N}

could be made a tiny bit faster by pulling a multiplication in front of the sum (in k, for once):

s:%n:1000000
-s-s*+/4%1+t*t:s*!n

much more interesting, however, would be to test the peaching variant, so if someone has a 16-core machine under their desk ... or, preferentially, in a data center far, far away:

s:%n:1000000
m:_n%c:16 / cores
\t do[100;-s-s*+/{+/4%1+t*t:s*x}':(c;m)#!n]

On 07.04.2008, at 18:17, Christian Langreiter wrote:

> much more interesting, however, would be to test the peaching variant,

after experimenting with the peaching variant a bit (after being startled about the wicked [i.e. bad] scaling behaviour) we found that, a bit surprisingly, constructing many small index (!) vectors is much faster than constructing a few large ones.

\t do[100;!1000000]
238
\t do[100;do[10;!100000]]
151

taking that into account, and splitting into 100 "work packages", we get:

\t s:%n:1000000; do[100;r:-s-s*+/{[x;o]+/4%1+t*t:s*o+x}[!10000]': 10000*!100]
3091

q: ~30 ms on 1 core, ~15 ms on 2 cores
erlang: ~15 ms on 16 cores

case settled.
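The same trick translates outside q: build one small index vector once and shift it by an offset per package, rather than materialising the full million-element index vector. A hypothetical NumPy sketch (not the thread's k code):

```python
# 100 "work packages" of 10,000 indices each, reusing one small base
# vector instead of building a single 1,000,000-element index vector.
import numpy as np

N = 1_000_000
PACKAGE = 10_000
STEP = 1.0 / N

base = np.arange(PACKAGE)                  # built once, reused per package
total = 0.0
for offset in range(0, N, PACKAGE):        # 100 packages
    x = (base + offset + 0.5) * STEP       # midpoints for this package
    total += np.sum(4.0 / (1.0 + x * x))
pi_approx = STEP * total
print(pi_approx)
```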

Thursday, April 10, 2008, 1:38:54 AM, Christian wrote:

> On 07.04.2008, at 18:17, Christian Langreiter wrote:
>> much more interesting, however, would be to test the peaching variant,
> after experimenting with the peaching variant a bit [...] and
> splitting into 100 "work packages", we get:

To expand a bit on that: in the original peach variation Chris posted, kdb+ generated the whole 1m integer vector in the master, then reshaped it into work packages and passed those to the slaves. A quick look at the code of the Erlang variant [1] showed that the Erlang version does similar blocking, but of course without all the data: workers are spawned with an offset parameter and return their part of the sum.

So we tried reducing the "til 1000000" a bit, instead passing offsets to the workers and letting each worker generate its own, smaller workset, e.g. til 10000. This works fine and scales far better. The final variation posted by Chris speeds things up a little more by calculating the til 10000 once and passing it to each worker along with the offset:

> s:%n:1000000;
> -s-s*+/{[x;o]+/4%1+t*t:s*o+x}[!10000]':10000*!100

[1] http://groups.google.com/group/erlang-questions/browse_thread/thread/7abaaddeb95a8081

--
Andreas Bolka