Data Structure and Algorithm

Overview

Backtracking is a byproduct of recursion. As long as there is recursion, there will be backtracking.

Because the essence of backtracking is exhaustive enumeration, enumerating all possibilities and then selecting the answer we want. If we want to make backtracking more efficient, we can add some pruning operations, but this does not change the essence of backtracking, which is exhaustive enumeration.

Problems For Backtracking

Backtracking algorithm is usually used in the following senario

Combination Problems

Find a collection of k elements that satisfies certain condition from N elements.

E.g.: 77. Combinations 17. Letter Combinations of a Phone Number 39. Combination Sum 40. Combination SumII 216. Combination SumIII

Slicing Problems

How many ways to slice / split a string following certain rules.

E.g.: 131. Palindrome Partitioning 93. Restore IP Addresses

Subset Problems

How many subsets that satisfy certain condition are there in a N elements collection?

E.g.: 78. Subsets 90. Subsets II 491. Non-decreasing Subsequences (Not exactly the same, but really similar)

Permutation Problems

How many ways are there to arrange N numbers according to certain rules?

E.g.: 46. Permutations 47. Permutations II

Chess Problems

N-Queens, solve Sudoku, etc

E.g.: 51. N-Queens 37. Sudoku Solver

Tree Structure

Backtracking is an algorithm often used in Depth-First Search (DFS) and Breadth-First Search (BFS).

The essence is: if you can't go on, turn back.

Therefore, all problems that can be solved by backtracking can be abstracted into a tree structure. By the same token, when you find that a problem can be abstracted into a tree structure, you should be able to think of backtracking.

Tree structure must posses certain properties: width and height.

Because the backtracking method solves the problem of recursively searching for subsets in a set, the size of the set constitutes the width of the tree, and the depth of the recursion constitutes the depth of the tree.
Recursion must have a termination condition, so it must be a tree with limited height (N-ary tree).

Implementation of Backtracking - Basis

The essence of backtracking is: turn back if you can't go the same way. So how to turn back and make sure you don't go back the same way after turning back is the first thing you need to think about when using the backtracking algorithm.

The implementation process can be summarized into the following four steps:

Constrauct the search space tree.
Do horizontal traverse: elements on the same tree level.

If meets the finish condition, return the outcomes
Otherwise, keep searching, if it reaches the end then turn back to another path

Key points: Backtrack, undo the processing results of the current step and move to another path.

Coding Template (Basis)

void backtracking(params){
  if (finish_condition){
      store outcomes;
      return;
  }
  for (choose: elements in the current layer){      # horizontal
      process the current node;
      backtracking(path, options); // recursion     # vertical
      backtracking, undo the processed outcome
  }
}

Optimization by Pruning

The only method to optimize the backtracking algorithm is Pruning.

To put it simply, the core purpose of pruning is to reduce unnecessary search space, thereby reducing the complexity of time and space.

Therefore, the essence of pruning lies in whether one can keenly discover the opportunity for early termination during the vertical traversal process.

Takes Combination Problems as examples

We try to use the backtracking algorithm to find a set of k numbers from N numbers according to certain rules.

During this search process, when the search reaches a node at a certain level of the spatial tree, the pruning algorithm finds that the elements between the starting point and the end of the set are no longer enough for the k elements required by the question, so there is no need to search.

Assuming that there are M layers of subtrees after this node, this pruning can reduce the complexity of O(2^M) for us.

Coding Template (With Pruning)

void backtracking(params) {
  if (finish_condition) {
    store outcomes;
    return;
  }
  for i in range(start_index, len(nums))  {      # horizontal
      if ( i staisfies pruning condition):
        continue
      process current tree node;
      backtracking(path, options); // recursion  # vertical
      backtracking, undo the current processed outcomes
  }
}

⭐️ Optimization By Deduplication

Pruning Vs. Deduplication

Isn't it said that the only optimization method for backtracking algorithm is pruning? Then what is deduplication for?

Pruning is an optimization at the code level

Pruning is a general optimization method for backtracking algorithms as a whole, which can be directly written into the code template. In any case, the backtracking algorithm can use pruning to reduce the search space, but in many cases the search space that can be reduced is relatively limited.

Deduplication is an optimization of the problem-solving level

The so-called deduplication is actually to ensure that the used elements cannot be selected repeatedly. It is difficult to implement because it is often difficult to distinguish which dimension (horizontally or vertically) to avoid duplication.
It is known that the problems solved by the backtracking method can be abstracted into a tree structure. Then "used" has two dimensions in this tree structure. The vertical dimension is used on the same branch, and the horizontal dimension is used on the same tree layer.
So the first problem to be solved by de-weighting optimization is, should we focus on whether it has been used on the same tree layer or on the same branch?
E.g. Leetcode 40. Combination Sum II. We found that elements can be repeated in the same combination, no matter how they are repeated, but two combinations cannot be the same. That is, given the original array: candidates = [1, 1, 2], target = 3 (where candidates is sorted). When 1 with index 0 is used on the first branch, it does not affect our continued use of 1 with index 1 on the same branch. However, if 1 with index 0 is used again on the same layer, duplication will inevitably occur.
So what we want to deduplicate are the "used" on the same tree layer. The elements on the same branch are all elements in the same combination and do not need to be deduplicated.

⭐️ Deduplication on Branch

Branch: the path form the root node to a certain leaf node.

In order to avoid the elements at the same position in the original array (considering the possible existence of duplicate values) being used repeatedly in the path, it is necessary to define a global variable to record which element is used. This variable is usually called used by convention.

used = [0 for _ in range (len(nums))]

void backtracking(params) {
  if (finish_condition) { 
    … pass 
  }
  for i in range (start_index, len(nums))  {      # horizontal
    if ( i satisfies pruning condition) { 
      continue 
    }
    if (used[i] == 0) { 
      continue 
    }
    process current tree node
    used[i] = 1
    backtracking(path, options); // recursion     # vertical
    backtracking, undo the currnet processed outcomes
    used[i] = 0
  }
}

⭐️ Level Deduplication

The two deduplication are similar, as they both need to define a used variable to record.

But the difference is that we need to define a local variable to avoid the duplication in the same tree level, instead of a global variable.

void backtracking(params) {
  if (finish_condition) { 
    … pass
  }
  
  used = [0 for _ in range (len(nums))]

  for i in range (start_index, len(nums))  {      # horizontal
    if ( i satisfies pruning condition) { 
      continue 
    }
    if (used[i] == 0) { 
      continue 
    }
    process current tree node;
    used[i] = 1
    backtracking(path, options); // recursion     # vertical
    backtracking, undo hte current processing outcomes
    used[i] = 0
  }
}

In many Leetcode solutions, you will find that some people are used to using sets instead of arrays for deduplication. However, it is worth noting that there is a big performance difference between using used arrays for deduplication and using sets for deduplication:

The version using set to deduplicate is much less efficient than the version using used array. And using the used array has almost no additional burden in terms of time complexity!

Solutions to Combination and Slicing Problems

Combination Problems

Time Complexity: O(2^n)
Space Complexity: O(n)
The combination problem is actually a subset problem, so the worst case of the combination problem will not exceed the time complexity of the subset problem. And because the state of each element in the subset problem is nothing more than taking or not taking, the time complexity is O(2^n)
The recursion depth is n, so the system stack space will be O(n), the space used by each layer of recursion is constant. Note that result and path in the code are global variables. Even if they are placed in parameters, they are passed by reference and no new memory space is allocated. The final space complexity is O(n)

Combination problems are relatively simple and representative among backtracking problems. Thus the backtracking basic template and pruning template can completely solve the combination problem in a general sense, and will not be repeated here.

A major difficulty of the combination problem itself is how to avoid repetition in a disordered state, taking the Leetcode combination problem as an example.

In the first layer, when taking 2, you need to delete 1 from the candidate, otherwise there will be a risk of duplication. Similarly, 2 needs to be deleted from 3.

for i in range(start_index, len(nums))  {      # horizontal
  process the current tree node;
  backtracking(path, options); // recursion    # vertical
  backtracking, undo the current processed outcome
}

Slicing Problems

Time Complexity: O(2^n)
Space Complexity: O(n)

The difficulty of the slicing problem lies in how to simulate the cutting line and how to determine how to terminate the recursion in the problem. The solution itself is very similar to the combination problem. Being able to use the idea of solving combination problems to solve the slicing problem is half the battle.

Solutions to Permutation and Subsets Problems

Permutation Problems

Time Complexity: O(n!)
This can be clearly seen from the arranged tree diagram. There are n nodes on each layer, and each branch on the second layer extends to n-1 branches, and then there are n-2 branches below, so the total number of nodes up to the leaf node is n!.
Space Complexity: O(n)

The core difference between Combination and Permutation problems is that the Combination Problem does care about order, while Permutation Problem cares, [1, 2] and [2, 1] are the same array in Combination Problem, but different in Permutation Problem.

Therefore, the start_index parameter required in solving the combination problem can be omitted in the permutation problem.

for i in range(0, len(nums))  {
  if ( i satisfies pruning condition):
    continue
  process current tree node;
  backtracking(path, options);
  backtracking, undo the current processed outcome
}

Subset Problems

Time Complexity: O(2^n)
Because the state of each element is nothing more than taking or not taking, the time complexity is O(2^n)
Space Complexity:O(n)

The core difference between subset problems and combination problems is that combination problems only need to collect leaf nodes, while subsets need to collect all nodes.

Therefore, the termination condition of combination problems is often based on the number of elements in the node, while subset problems do not require termination.

if ( len(ans) == len(nums) ) {
  store outcomes;
  return;
}

Solutions to Chess Problems

The chessboard problem is the most difficult problem in the backtracking algorithm, and it is also a required course in the field of artificial intelligence.

Traditional board games, including Chinese chess, international chess, Go, Sudoku, and Gobang, are all two-dimensional chessboards, so naturally we have to consider the possibility of two-dimensional recursion.

Different game rules will directly lead to the width of each layer of branches, so the performance of chessboard solutions cannot be evaluated uniformly.

Takes N-Queens as example, which lookes like n * n at a glance, but according to the rule of the N-Queens game, we can find that there could be only 1 queen on each row and column. Thus, the depth of recursion is only N, which is much easier than global search in a real N * N chess board.

So we can have:

Time Complexity: O(n!). In fact, if you look at the tree diagram, intuitively O(n^n), but the queens cannot meet each other, so there is pruning during the search process, which makes the worst case isO(n!)。
Space Complexity: O(n)

Meanwhile, Sudoku Solver is a typical 2D recursion, from which we can have:

Time Complexity: O(9^m) , m is the number of space。
Space Complexity: O(n^2), where the depth of recursion is n^2

Regardless of the characteristics of the problem itself, the code for the two-dimensional chessboard can be summarized as the following template:

board = [
  [......],
  ……
]

void backtracking(i, j) {
  if (finish_condition) { 
    … pass
  }
  for ( a in board(i, j) all possible actions)  {
      if ( a satisfies pruning condition) { 
        continue 
      }
      process the current tree node;
      backtracking( i', j' )
      backtrack, undo the current processed outcomes
  }
}

# all possible starting points
for i in range( 0, len(board) ) {
  for j in range (0, len(board) ) {
      backtracking(i, j)
  }
}

#Tech

Data Structure and Algorithm - Backtracking

https://billyjojojobulidogithub.io/notes/Algorithm/backtracking/english/index.html

Author

Baocheng Wang

Posted on

July 25, 2024

Licensed under