cancel
Showing results for 
Search instead for 
Did you mean: 

LeetCode #53: Maximum Subarray

SJT
Contributor III
Contributor III

https://leetcode.com/problems/maximum-subarray/

Given an integer array nums, find the subarray which has the largest sum and return its sum.

The performance of q solutions to the Maximum Subarray problem appeared on StackOverflow recently.

In q/kdb+, can I speed up Kadane's algorithm for the maximum subarray problem?

Brute force

A brute-force solution is unlikely to be efficient but will be useful for testing faster candidates.

Ours generates indices for all the subarrays, calculates the corresponding sums and picks the largest.

q)x:-2 1 -3 4 -1 2 1 -5 4
q)1+tcx:til count x
1 2 3 4 5 6 7 8 9
q)tcx+neg[tcx]_\:til each 1+tcx:til count x
(,0;0 1;0 1 2;0 1 2 3;0 1 2 3 4;0 1 2 3 4 5;0 1 2 3 4 5 6;0 1 2 3 4 5 6 7;0 1 2 3 4 5 6 7 8)
(,1;1 2;1 2 3;1 2 3 4;1 2 3 4 5;1 2 3 4 5 6;1 2 3 4 5 6 7;1 2 3 4 5 6 7 8)
(,2;2 3;2 3 4;2 3 4 5;2 3 4 5 6;2 3 4 5 6 7;2 3 4 5 6 7 8)
(,3;3 4;3 4 5;3 4 5 6;3 4 5 6 7;3 4 5 6 7 8)
(,4;4 5;4 5 6;4 5 6 7;4 5 6 7 8)
(,5;5 6;5 6 7;5 6 7 8)
(,6;6 7;6 7 8)
(,7;7 8)
,,8

q)raze x tcx+neg[tcx]_\:til each 1+tcx:til count x
,-2
-2 1
-2 1 -3
-2 1 -3 4
-2 1 -3 4 -1
-2 1 -3 4 -1 2
-2 1 -3 4 -1 2 1
-2 1 -3 4 -1 2 1 -5
-2 1 -3 4 -1 2 1 -5 4
,1
1 -3
1 -3 4
1 -3 4 -1
1 -3 4 -1 2
1 -3 4 -1 2 1
1 -3 4 -1 2 1 -5
1 -3 4 -1 2 1 -5 4
,-3
-3 4
-3 4 -1
-3 4 -1 2
-3 4 -1 2 1
..

q)mcs0:{max sum each raze x tcx+neg[tcx]_\:til each 1+tcx:til count x}
q)mcs0 x
6

Kadane’s algorithm

The OP on StackOverflow offered ((|[0])(+)::)\[0f;x] as an implementation of Kadane’s algorithm and wondered if it could run as fast on a 9M list as Python+numba does. For this he would need at least a 4× improvement.

Your eye will immediately be drawn to the 0f. What’s a float doing in this? And you‘re right. The expression runs 2× faster with the seed set to integer 0; and fractionally faster in a simpler form.

q)M9: -10+9000000?20
q)\ts max((|[0]) (+)::)\[0f;M9]
4367 412436160
q)\ts max((|[0]) (+)::)\[0;M9]
2425 412436160
q)\ts max 0(0|+)\M9
2181 412436096

q)mcs1: max ((0|+)\) @  / Kadane

Pause here to notice the very terse composition of 0| and +, equivalent to but slightly faster than the lambda {0|x+y}.

q)\ts 0(0|+)\M9
2316 412436000
q)\ts 0{0|x+y}\M9
2051 412436096

This implementation of Kadane fails on an edge case: when all the numbers are negative. Like a sundial that “marks only the sunny hours”, anytime the max-so-far falls below zero, it gets set to 0. If no number is positive the result is wrongly given as zero.

q)0(0|+)\neg 1 + til 5
0 0 0 0 0

A more exact implementation of Kadane would track not only the max-since-last-zero, but also the max-seen-so-far in case all the numbers are negative. But of course that is nothing but the max of the numbers. If the expression above returns all 0s, then there are no positive numbers, and we can safely return the maximum instead.

q)mcs1a:{$[r:max 0(0|+)\x;r;max x]} 
q)mcs1a each(M9;neg 1+til 5)
354 -1

This will do nicely if all-negative is a rare edge case. But what if it were often so?

q)M9n:neg 1+abs M9
q)\ts mcs1a M9
2224 412435728
q)\ts mcs1a M9n
2561 412435728

We could handle this by modifying the scan to track both the max-since-last zero and also the max-number-seen. At the end, if the latter is negative, that’s the result we want.

q)mcs2:{{(y;z)z<0}.(0,x 0 0) {(0|y+x 0;x[1]|0|y+x 0; y|x 2)}/x}
q)\ts mcs2 M9
7793 592
q)\ts mcs2 M9n
7865 592

Now the two arguments require the same time and space, but execution takes 3× more time than mcs1a over 600× less space.

Vector primitives and overcomputing

The implicit iteration of the vector primitives is so fast that it is often more efficient to ‘overcompute’ than to parse and execute an explicit iteration. Consider the simple case of determining the range of a vector.

q)\ts:10 (min M9;max M9)
83 960
q)\ts:10 (min;max)@\:M9
82 1024
q)\ts:10 0 0{(y&x 0;y|x 1)}/M9
44158 2192

Vector vector vector

Finally we abandon Kadane’s algorithm for a vector solution from Nathan Swann.

q)mcs3:{max s-mins 0^prev s:sums x}
q)mcs3 neg 1+til 5  / handles edge case
-1
q)\ts mcs1 M9
2209 412435616
q)\ts mcs3 M9
84 536871104

Here all the iteration is implicit and we see a 50× speed-up from the OP’s solution, well more than the 4× improvement needed to beat Python+Numba.

Conclusions

  1. A little tweaking won 2× on the OP expression.
  2. Switching from two primitive aggregations through a 9M list to a single explicit iteration increased execution time 3×. (But massively saved memory.)
  3. Deserting Kadane and divide-and-conquer algorithms for a ‘pure’ vector solution won a 50× improvement, significantly beating the reported execution by Python+Numba.

Archived at https://github.com/qbists/studyq

5 REPLIES 5

LeahS
Moderator Moderator
Moderator

Super content @SJT 👏

Thanks for sharing with the KX Community!

NathanSwann
New Contributor

It looks like mcs3 is only 25x faster than mcs1, still a massive improvement though. For those curious the vector solution itself is a vectorization of Ulf Frenanders' own solution to the problem from back in 1977.

That’s right – the other 2× was ‘low-hanging fruit’ from using an int zero!

Hey Nathan! Thanks again for your brilliant solution to my Q problem! I've been wondering ever since how you came up with such a creative solution -- it strikes me as needing some background knowledge in a field of math I'm unfamiliar with yet -- an algebra of running sums, mins, maxes, and compositions of such functions (reminds me of stuff like the Legendre transform and the convex duals). Could I ask you what your background is, and what this branch of mathematics is called? I'd love to read up more on it. I was able to prove that the two algorithms are equivalent, but I did so at the scalar level; I'd love to come up with a proof that does it at the series level, using something similar to how vectors / matrices are used for linear algebra and multivariate calculus; or how lag operators are used in time series; do you know of any such proofs? Many thanks!

Background wise I am pure mathematics mostly studied Class/Set Theory along with Abstract algebra but I have a fair amount of time spent on programming languages akin to Q. I'm not familiar with any particular field that covers these sorts of operations specifically and I have no particular proof that both algorithms are equivalent.

Instead I solved it from the start again using the observation that the sum of a given range i to j is the same as the sum up to j minus the sum up to i. The proof of validity of the algorithm should be fairly direct from there but is should look like a classical series proof that you would see in Set Theory or in the fundamental proofs for calculus.

Hope that helps