Functional Leetcode 100 – Permutations

NOTE

This article was translated from Chinese by ChatGPT and may contain some errors.

Preface

Algorithm problems are a nightmare for many programmers. Algorithms themselves are certainly interesting, but nobody enjoys the frustration of failing to solve a problem or the embarrassment of being stumped in an interview.

The traditional way to meet this challenge is practice: solving a large number of problems in order to learn algorithms. Many people subscribe to the theory that, whether you are learning mathematics or algorithms, a large amount of practice is necessary.

I also agree with this theory. Von Neumann's famous saying still hangs on the homepage of this blog:

Young man, in mathematics you don't understand things. You just get used to them.

Intuitive understanding and intuitive experience are an invaluable treasure, and practice is the most reliable means of acquiring it. In short, if you want to learn algorithms, you must practice problems.

However, I am extremely tired of the conventional path of algorithm learning: you encounter a problem you can't solve, think about it for ten minutes without knowing where to start, and then go looking for the answer. Perhaps you understand the answer, or perhaps you merely memorize it, and the next time you see the same problem you plug it in. As long as you have seen enough answers, you can plug in enough problems.

Why don't I like this way of learning? Perhaps I'm not smart enough, but I find it difficult to understand the answers given by others. Too many people only present the final efficient algorithm, ignoring the process of building intuition, let alone proving the algorithm correct. From such answers I feel neither a deep understanding of the problem nor the beauty of the algorithm itself.

Can we only learn algorithms through an education designed for geniuses?

No. There are still people in the world who design programs using purely functional languages and equational derivation. These people also value intuition: they start from a clear but inefficient program and, step by step, derive the final efficient algorithm. In this process, the beauty of the algorithm and the insight behind it are revealed in detail. The late Richard Bird and Professor Mu Xincheng from Taiwan are pioneers of this approach. In their papers and books I feel a true gentleness.

I think that although equational derivation is often very difficult, explaining which part of an algorithm is non-trivial and what the intuition behind it is may not be that difficult.

This series of articles is my attempt. In it, I will explain how I solve Leetcode problems using purely functional programming. As for why I chose Leetcode instead of the more difficult Codeforces: I believe excessive difficulty is not necessarily a good thing. Hard problems tend to require several algorithms working together, while we want each article to focus on one specific kind of algorithm. (Another reason is that I am still a beginner at algorithms and can't handle problems that are too difficult.)

The consequence of choosing Leetcode is that we must submit code in Racket or Scala rather than Haskell. This occasionally causes trouble, but in my experience most of the Haskell code I present can be translated directly into Scala. For each problem I will also provide the corresponding Scala solution. Where real obstacles arise, such as Scala lacking an immutable data structure library like Haskell's Data.Graph, we will fall back to non-pure functional programming as the situation dictates; overall this should be rare. At the end of each problem I may also provide the corresponding imperative version, and perhaps discuss the differences between the functional and imperative versions.

These articles are not an introduction to functional programming. Although I may hint at the usage of certain library functions, on the whole they assume that you already have a basic knowledge of purely functional programming: recursion, lists, fold, map, and so on. If you know nothing about functional programming and the Haskell language, it is best to learn some Haskell before reading these articles. Here are some learning resources I recommend:

The road is long and the way is far, I will seek it up and down.

31. Next Permutation

Problem Description

Here is the original description from Leetcode:

A permutation of an array of integers is an arrangement of its members into a sequence or linear order.

For example, for arr = [1,2,3], the following are all the permutations of arr: [1,2,3], [1,3,2], [2,1,3], [2,3,1], [3,1,2], [3,2,1].

The next permutation of an array of integers is the next lexicographically greater permutation of its integer. More formally, if all the permutations of the array are sorted in one container according to their lexicographical order, then the next permutation of that array is the permutation that follows it in the sorted container. If such arrangement is not possible, the array must be rearranged as the lowest possible order (i.e., sorted in ascending order).

For example: - The next permutation of arr = [1,2,3] is [1,3,2]. - Similarly, the next permutation of arr = [2,3,1] is [3,1,2]. - While the next permutation of arr = [3,2,1] is [1,2,3] because [3,2,1] does not have a lexicographical larger rearrangement.

Given an array of integers nums, find the next permutation of nums.

The replacement must be in place and use only constant extra memory.

We are doing purely functional programming, which has no concept of in-place modification, so we can ignore the last sentence.

First Thought

Generally, many algorithm problems admit an extremely concise (but inefficient) purely functional solution. We call this solution the program's specification (Spec), and our thinking starts from there. However, this problem does not seem to be such a case. Consider the mathematical formulation of the problem:
$$\text{next}(a) = \begin{cases} \min(\mathbb{P}) & a = \max(\mathbb{P}) \\ \min(\{\, m \mid m \in \mathbb{P},\ m > a \,\}) & \text{otherwise} \end{cases}$$

This does not give us much inspiration. In fact, in many programming languages, finding all permutations of a list (the set $\mathbb{P}$ above) is more troublesome than this problem itself.
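For reference, the formula can still be transcribed into Haskell almost verbatim. The following Spec is a sketch of my own (the names nextSpec and ps are mine, not from the article); it enumerates every permutation and is therefore hopelessly inefficient, which is exactly why it offers little inspiration here:

import Data.List (permutations, sort)

-- Sort all permutations lexicographically, then take the first one
-- greater than a, wrapping around to the minimum when a is the maximum.
nextSpec :: Ord a => [a] -> [a]
nextSpec a = case dropWhile (<= a) ps of
    []      -> head ps    -- a = max P: wrap around to min P
    (p : _) -> p
  where ps = sort (permutations a)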

In this case, an intuitive thing to try is simple recursion: can it solve the problem?

Recursion

The next permutation function (denoted next) is defined on permutations and lexicographical order, but it can obviously be generalized to other structures.

We first examine the case of decimal integers. Clearly, decimal integers can also define a next function. This is the famous successor function, i.e., $f(x) = x + 1$.

Decimal integers can be represented as a list (or, in imperative languages, an array), where each element in the list represents a digit of the number. For example, $100$ is represented as [1, 0, 0].

We are particularly interested in the successor function under list representation, because under this representation, >= is precisely the lexicographical order comparison. We denote the successor function of decimal integers under list representation as next10.

-- next10 [1, 1, 0] = [1, 1, 1]
next10 ...

For example, next10 [1, 1, 0] = [1, 1, 1].

This example gives us a very strong intuition: since next10 [1, 0] = [1, 1], we have next10 [1, 1, 0] = 1 : next10 [1, 0]. Generalizing this relationship, we get, for the integer represented by x:xs,

next10 (x:xs) = x:(next10 xs)

The above equation holds most of the time, except when next10 xs cannot be computed within the same number of digits. For example, the successor of [1, 9, 9] is [2, 0, 0]: a carry is needed. More precisely, 99 is the maximum two-digit decimal integer, so the equation does not hold there. It can be proven that as long as xs is not the maximum, the equation holds.
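To make the carrying concrete, here is one possible complete definition of next10, a sketch of my own (digit lists with the most significant digit first; the article only needs the recursive case given above). A digit list is maximal exactly when every digit is 9:

-- Successor on decimal digit lists, most significant digit first.
next10 :: [Int] -> [Int]
next10 xs
    | all (== 9) xs = 1 : map (const 0) xs            -- e.g. [9, 9] -> [1, 0, 0]
    | otherwise     = go xs
  where
    go (x:ys)
        | all (== 9) ys = (x + 1) : map (const 0) ys  -- carry: ys is maximal
        | otherwise     = x : go ys

For example, next10 [1, 9, 9] = [2, 0, 0] and next10 [1, 1, 0] = [1, 1, 1].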

Similarly, it can be proven that the next function required by the problem has the same property: when xs is not the maximum permutation of its elements,

next (x:xs) = x:next xs

This property is extremely important, because with it we can directly construct the recursive function:

next (x:xs)
    | isMax xs  = tick x xs
    | otherwise = x:next xs

So the problem becomes:

  1. How to define isMax
  2. If xs is already the maximum value, how should the tick function be defined?

Definition of isMax

For a permutation xs, being the maximum means that no matter how it is rearranged into a new xs', we must have xs >= xs'.

What kind of permutation has this property? A very natural idea is that because this is lexicographical order, if you want the permutation to be as large as possible, you must put the larger numbers in front. In other words, descending permutations must be the largest.

isMax :: Ord a => [a] -> Bool
isMax = down

In Haskell, to determine whether a list is ascending or descending, we can use the zip-with-tail trick: zip the list with its own tail, then judge the resulting pairs with functions such as map and all.

dup :: [a] -> [(a, a)]
dup xs = zip xs (tail xs)

down :: Ord a => [a] -> Bool
down = all (uncurry (>=)) . dup
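A quick check in GHCi:

> down [4, 3, 1]
True

> down [1, 4, 2]
False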

Definition of tick

If xs already satisfies down, in other words it is already the largest, then x:xs must undergo a more complex operation to get the next permutation. I call this operation tick.

For example, the next permutation of [2, 4, 3, 1] is [3, 1, 2, 4]. Here x is 2 and xs is [4, 3, 1]; xs satisfies down, so simple recursion no longer applies, and tick must do the work.

Continuing with the example of [2, 4, 3, 1]: intuitively, finding its next permutation ys@(y:ys') can be divided into two steps:

  • Determine what y is
  • Determine what ys' is

First, y must be strictly greater than x, because we have already established that no permutation starting with x is larger than x:xs. Second, y should be as small as possible; otherwise there would be a smaller permutation p that still satisfies p > x:xs.

In the above example, y is 3. It is easy to see that 3 is the smallest number in xs = [4, 3, 1] that is greater than x = 2. We call this number the pivot. Its definition in code:

pivot x xs = minimum [ x' | x' <- xs, x' > x ]

Notice that xs is ordered (descending), so the above code can be rewritten as

pivot x = last . takeWhile (> x)

It should be noted that pivot is a partial function: we used the unsafe last and minimum operations, so when calling it we must ensure that at least one x' > x exists.
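For instance:

> pivot 2 [4, 3, 1]
3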

If we can find such a y > x, then y:ys' is certain to be larger than x:xs. Therefore ys' should be constructed to make y:ys' as small as possible; in other words, ys' should be the smallest permutation of the remaining elements. By an argument symmetric to the one about the maximum, the smallest permutation is the ascending one, which only requires sorting.

The elements of ys' are naturally those of xs minus y plus x. Continuing the example [2, 4, 3, 1], we get

x:xs = [2, 4, 3, 1]

y = last . takeWhile (> x) $ xs
  = last [4, 3]
  = 3

ys'' = delete 3 xs
     = [4, 1]

ys' = sort (x:ys'')
    = sort [2, 4, 1]
    = [1, 2, 4]

ys = y:ys'
   = [3, 1, 2, 4]

Finally, one more situation needs discussion: when pivot x xs = ⊥. This case must be detected in advance to avoid errors. When does the pivot not exist? Precisely when x:xs itself already satisfies down, for example [4, 3, 2, 1]; in that case it suffices to reverse the input.

Based on the above discussion, we give the definition of the tick function:

import Data.List (delete, sort)

tick :: Ord a => a -> [a] -> [a]
tick x xs
    | null l    = reverse (x:xs)
    | otherwise = y:(sort (x:ys'))
    where l = takeWhile (> x) xs
          y = last l
          ys' = delete y xs
> tick 4 [3, 2, 1]
[1, 2, 3, 4]

> tick 2 [4, 3, 1]
[3, 1, 2, 4]
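Putting the pieces together, next reproduces the examples from the problem statement:

> next [1, 2, 3]
[1, 3, 2]

> next [2, 3, 1]
[3, 1, 2]

> next [3, 2, 1]
[1, 2, 3]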

Without a doubt, the next function as it stands already solves the problem correctly. Readers may translate the above Haskell code into Scala and try it on Leetcode. Starting from recursion, we found a solution that is intuitive and easy to arrive at.

More Efficient Code

Although the next function defined above is elegant and intuitive, it is not efficient enough. The main problems are:

  1. In tick, there is no need to use sort: xs is ordered, and this can be exploited to build the new ordered sequence in $O(n)$.
  2. down is called many times by next, each call costing $O(n)$, leading to an overall complexity of $O(n^2)$.

Problem 1 is simple: just rewrite tick (the Summary below does exactly that).

Problem 2 is more complicated. We need to observe the calculation process carefully, for example next [3, 5, 1, 4, 2]:

next [3, 5, 1, 4, 2]
= 3:(next [5, 1, 4, 2])
= 3:(5:(next [1, 4, 2]))
= 3:(5:(tick 1 [4, 2]))

It can be seen that this calculation always has a certain shape: the front part of the input remains unchanged, while tick is applied to the back part.

[3, 5,           1, 4, 2]
^^^^^^           #######
unchanged part   tick part

The boundary between the two, intuitively, is the moment when down flips from False to True. Let's go back to the next function:

next (x:xs)
    | down xs   = tick x xs
    | otherwise = x:next xs

In each recursive call, next (x:xs) computes down xs; using the example just now:

-- First recursion
down [5, 1, 4, 2] = False
-- Second recursion
down [1, 4, 2]    = False
-- Third recursion
down [4, 2]       = True

In fact, down is applied successively to the elements of tails:

> tails [3, 5, 1, 4, 2]
[[3,5,1,4,2],[5,1,4,2],[1,4,2],[4,2],[2],[]]

Let's calculate all the down xs values used by next in advance and put them in a list:

import Data.List (tails)

nextDown (x:xs) (d:ds)
    | d         = tick x xs
    | otherwise = x:nextDown xs ds

next xs = nextDown xs (downs xs)
downs = drop 1 . map down . tails
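On the running example, downs marks exactly where the tick part begins:

> downs [3, 5, 1, 4, 2]
[False, False, True, True, True]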

If we can calculate downs in $O(n)$ time, problem 2 will be solved. Intuitively, this is easy, because down [4, 3, 2, 1] holds exactly when both of the following hold:

  • 4 >= 3
  • down [3, 2, 1]

In other words, by the time we want down [4, 3, 2, 1], down [3, 2, 1] has already been computed; we just need a way to store the earlier result for later use.

The functional programming community has long had a solution to this kind of problem: the scan theorem.

The scan theorem states that

map (foldl op a) . inits = scanl op a

Similarly

map (foldr op a) . tails = scanr op a
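For instance, with op = (+) and a = 0, both sides compute the sums of all suffixes:

> map (foldr (+) 0) (tails [1, 2, 3])
[6, 5, 3, 0]

> scanr (+) 0 [1, 2, 3]
[6, 5, 3, 0]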

Using this theorem, we can calculate the program as follows. Note that, to avoid the empty list at the end of tails, we define

tails' = filter (not . null) . tails
downs = { definition }
        drop 1 . map down . tails'
      = { definition }
        drop 1 . map (all (uncurry (>=)) . dup) . tails'
      = { map distributivity }
        drop 1 . map (all (uncurry (>=))) . map dup . tails'
      = { (1) }
        drop 1 . map (all (uncurry (>=))) . tails . dup
      = { definition }
        drop 1 . map (foldr (&&) True . map (uncurry (>=))) . tails . dup
      = { fold-map fusion }
        drop 1 . map (foldr join True) . tails . dup
      = { scan theorem }
        drop 1 . scanr join True . dup
          where join (x, x1) r = x >= x1 && r

Step (1) requires a proof of its own; it claims that the following two expressions are equal:

> map dup $ tails' [4, 2, 3, 1]
[[(4,2),(2,3),(3,1)],[(2,3),(3,1)],[(3,1)],[]]

> tails $ dup [4, 2, 3, 1]
[[(4,2),(2,3),(3,1)],[(2,3),(3,1)],[(3,1)],[]]

Proof omitted.

Summary

Combining the above discussion, we can give an efficient functional implementation:

tick x xs = build l r x
    where (l, r) = span (> x) xs
          build [] r x = reverse (x:r)
          build l  r x = last l : reverse (init l ++ [x] ++ r)

dup xs = zip xs (tail xs)
downs = scanr (\(x, x1) r -> (x >= x1) && r) True . dup

next xs = l ++ tick (head r) (tail r)
    where (l, r) = splitAt (n - 1) xs
          n = length $ takeWhile not $ downs xs
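A quick check with the running example:

> next [3, 5, 1, 4, 2]
[3, 5, 2, 1, 4]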

This runs in $O(n)$ time, although the constant factor is admittedly larger than that of an imperative program. Theoretically, the nextDown implementation from the previous section is more efficient (it avoids length and splitAt), but I still think the implementation given here is clearer.

I submitted the equivalent Scala code to Leetcode:

object Solution {
    def nextPermutation(nums: Array[Int]): Unit = {
        val nextNums = next(nums.toList).toArray
        for (i <- nums.indices) {
            nums(i) = nextNums(i)
        }
    }

    def dup(xs: List[Int]): List[(Int, Int)] = xs zip xs.drop(1)

    def tick(x: Int, xs: List[Int]): List[Int] = {
        val (l, r) = xs.span(_ > x)
        if (l.isEmpty) r.reverse ++ List(x)
        else l.last :: (l.init ++ List(x) ++ r).reverse
    }

    def tailsDown(xs: List[(Int, Int)]): List[Boolean] =
        xs.scanRight(true) { case ((x, x1), r) => (x >= x1) && r }

    def next(xs: List[Int]): List[Int] = {
        val n = tailsDown(dup(xs)).takeWhile(!_).length
        val (l, r) = xs.splitAt(n - 1)
        l ++ tick(r.head, r.tail)
    }
}

The next permutation problem is one of the few problems where an imperative program can be simpler than a functional program. Even the Haskell Data.Permute library uses an imperative method.

In imperative terms, the above algorithm can be described as two steps:

  1. Traverse from right to left to find the longest descending suffix $a[i \dots (\mathrm{len}-1)]$.
  2. Perform the calculation:

    • If $i = 0$, reverse the whole array.

    • Otherwise, find $j \in [i, \mathrm{len}-1]$ such that

      • $a[j] > a[i-1]$
      • $\forall k \ge i,\ a[k] > a[i-1] \to a[k] \ge a[j]$

      then swap $a[i-1]$ with $a[j]$ and reverse $a[i \dots (\mathrm{len}-1)]$.

If the reader fully understands the above functional algorithm, this imperative algorithm should be very straightforward. But if this imperative algorithm is directly presented to you, can you really understand why it is correct?

Let's end today's story with a Python program!

def swap(a, i, j):
    tmp = a[i]
    a[i] = a[j]
    a[j] = tmp

def rev(a, start, end):
    # Reverse a[start:end] in place.
    l = end - start
    for i in range(start, start + l // 2):
        swap(a, i, end - (i - start) - 1)

def solve(a):
    # Find the start i of the longest descending suffix a[i..len-1].
    for i in range(len(a) - 1, -1, -1):
        if i >= 1 and a[i - 1] < a[i]:
            break
    if i == 0:
        # The whole array is descending: wrap around to the minimum.
        rev(a, 0, len(a))
    else:
        # a[i:] is descending, so scanning from the right finds the
        # smallest element greater than a[i - 1] (the pivot).
        j = len(a) - 1
        while j >= i and a[j] <= a[i - 1]:
            j -= 1
        swap(a, i - 1, j)
        rev(a, i, len(a))
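For example, solve agrees with the functional version on the running example:

a = [3, 5, 1, 4, 2]
solve(a)
print(a)  # [3, 5, 2, 1, 4]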