2022.11.22 11:42 AM
https://leetcode.com/problems/maximum-subarray/
Given an integer array
nums
, find the subarray which has the largest sum and return its sum.
The performance of q solutions to the Maximum Subarray problem appeared on StackOverflow recently.
In q/kdb+, can I speed up Kadane's algorithm for the maximum subarray problem?
A brute-force solution is unlikely to be efficient but will be useful for testing faster candidates.
Ours generates indices for all the subarrays, calculates the corresponding sums and picks the largest.
q)x:-2 1 -3 4 -1 2 1 -5 4
q)1+tcx:til count x
1 2 3 4 5 6 7 8 9
q)tcx+neg[tcx]_\:til each 1+tcx:til count x
(,0;0 1;0 1 2;0 1 2 3;0 1 2 3 4;0 1 2 3 4 5;0 1 2 3 4 5 6;0 1 2 3 4 5 6 7;0 1 2 3 4 5 6 7 8)
(,1;1 2;1 2 3;1 2 3 4;1 2 3 4 5;1 2 3 4 5 6;1 2 3 4 5 6 7;1 2 3 4 5 6 7 8)
(,2;2 3;2 3 4;2 3 4 5;2 3 4 5 6;2 3 4 5 6 7;2 3 4 5 6 7 8)
(,3;3 4;3 4 5;3 4 5 6;3 4 5 6 7;3 4 5 6 7 8)
(,4;4 5;4 5 6;4 5 6 7;4 5 6 7 8)
(,5;5 6;5 6 7;5 6 7 8)
(,6;6 7;6 7 8)
(,7;7 8)
,,8
q)raze x tcx+neg[tcx]_\:til each 1+tcx:til count x
,-2
-2 1
-2 1 -3
-2 1 -3 4
-2 1 -3 4 -1
-2 1 -3 4 -1 2
-2 1 -3 4 -1 2 1
-2 1 -3 4 -1 2 1 -5
-2 1 -3 4 -1 2 1 -5 4
,1
1 -3
1 -3 4
1 -3 4 -1
1 -3 4 -1 2
1 -3 4 -1 2 1
1 -3 4 -1 2 1 -5
1 -3 4 -1 2 1 -5 4
,-3
-3 4
-3 4 -1
-3 4 -1 2
-3 4 -1 2 1
..
q)mcs0:{max sum each raze x tcx+neg[tcx]_\:til each 1+tcx:til count x}
q)mcs0 x
6
The OP on StackOverflow offered ((|[0])(+)::)\[0f;x]
as an implementation of Kadane’s algorithm and wondered if it could run as fast on a 9M list as Python+numba does. For this he would need at least a 4× improvement.
Your eye will immediately be drawn to the 0f
. What’s a float doing in this? And you‘re right. The expression runs 2× faster with the seed set to integer 0; and fractionally faster in a simpler form.
q)M9: -10+9000000?20
q)\ts max((|[0]) (+)::)\[0f;M9]
4367 412436160
q)\ts max((|[0]) (+)::)\[0;M9]
2425 412436160
q)\ts max 0(0|+)\M9
2181 412436096
q)mcs1: max ((0|+)\) @ / Kadane
Pause here to notice the very terse composition of 0|
and +
, equivalent to but slightly faster than the lambda {0|x+y}
.
q)\ts 0(0|+)\M9
2316 412436000
q)\ts 0{0|x+y}\M9
2051 412436096
This implementation of Kadane fails on an edge case: when all the numbers are negative. Like a sundial that “marks only the sunny hours”, anytime the max-so-far falls below zero, it gets set to 0. If no number is positive the result is wrongly given as zero.
q)0(0|+)\neg 1 + til 5
0 0 0 0 0
A more exact implementation of Kadane would track not only the max-since-last-zero, but also the max-seen-so-far in case all the numbers are negative. But of course that is nothing but the max
of the numbers. If the expression above returns all 0s, then there are no positive numbers, and we can safely return the maximum instead.
q)mcs1a:{$[r:max 0(0|+)\x;r;max x]}
q)mcs1a each(M9;neg 1+til 5)
354 -1
This will do nicely if all-negative is a rare edge case. But what if it were often so?
q)M9n:neg 1+abs M9
q)\ts mcs1a M9
2224 412435728
q)\ts mcs1a M9n
2561 412435728
We could handle this by modifying the scan to track both the max-since-last zero and also the max-number-seen. At the end, if the latter is negative, that’s the result we want.
q)mcs2:{{(y;z)z<0}.(0,x 0 0) {(0|y+x 0;x[1]|0|y+x 0; y|x 2)}/x}
q)\ts mcs2 M9
7793 592
q)\ts mcs2 M9n
7865 592
Now the two arguments require the same time and space, but execution takes 3× more time than mcs1a
over 600× less space.
The implicit iteration of the vector primitives is so fast that it is often more efficient to ‘overcompute’ than to parse and execute an explicit iteration. Consider the simple case of determining the range of a vector.
q)\ts:10 (min M9;max M9)
83 960
q)\ts:10 (min;max)@\:M9
82 1024
q)\ts:10 0 0{(y&x 0;y|x 1)}/M9
44158 2192
Finally we abandon Kadane’s algorithm for a vector solution from Nathan Swann.
q)mcs3:{max s-mins 0^prev s:sums x}
q)mcs3 neg 1+til 5 / handles edge case
-1
q)\ts mcs1 M9
2209 412435616
q)\ts mcs3 M9
84 536871104
Here all the iteration is implicit and we see a 50× speed-up from the OP’s solution, well more than the 4× improvement needed to beat Python+Numba.
Archived at https://github.com/qbists/studyq
2022.11.23 02:23 AM
Super content @SJT 👏
Thanks for sharing with the KX Community!
2022.11.23 02:32 AM
It looks like mcs3 is only 25x faster than mcs1, still a massive improvement though. For those curious the vector solution itself is a vectorization of Ulf Frenanders' own solution to the problem from back in 1977.
2022.11.23 03:25 AM
That’s right – the other 2× was ‘low-hanging fruit’ from using an int zero!
2022.11.23 03:49 AM
Hey Nathan! Thanks again for your brilliant solution to my Q problem! I've been wondering ever since how you came up with such a creative solution -- it strikes me as needing some background knowledge in a field of math I'm unfamiliar with yet -- an algebra of running sums, mins, maxes, and compositions of such functions (reminds me of stuff like the Legendre transform and the convex duals). Could I ask you what your background is, and what this branch of mathematics is called? I'd love to read up more on it. I was able to prove that the two algorithms are equivalent, but I did so at the scalar level; I'd love to come up with a proof that does it at the series level, using something similar to how vectors / matrices are used for linear algebra and multivariate calculus; or how lag operators are used in time series; do you know of any such proofs? Many thanks!
2022.11.23 05:40 AM
Background wise I am pure mathematics mostly studied Class/Set Theory along with Abstract algebra but I have a fair amount of time spent on programming languages akin to Q. I'm not familiar with any particular field that covers these sorts of operations specifically and I have no particular proof that both algorithms are equivalent.
Instead I solved it from the start again using the observation that the sum of a given range i to j is the same as the sum up to j minus the sum up to i. The proof of validity of the algorithm should be fairly direct from there but is should look like a classical series proof that you would see in Set Theory or in the fundamental proofs for calculus.
Hope that helps
EMEA
Tel: +44 (0)28 3025 2242
AMERICAS
Tel: +1 (212) 447 6700
APAC
Tel: +61 (0)2 9236 5700
KX. All Rights Reserved.
KX and kdb+ are registered trademarks of KX Systems, Inc., a subsidiary of FD Technologies plc.