Sunforger

Sunforger

Algorithm: Longest Common Subsequence Problem

Problem Background#

Given two strings A and B, find their longest common subsequence.

Subsequence: can be non-contiguous but must maintain the original order
Common subsequence: a subsequence that is in both A and B

Example:

A = "ABCBDAB"
B = "BDCAB"

LCS = "BDAB" or "BCAB" (both of length 4)

Applications#

File/code difference comparison (e.g., diff tools)
Gene sequence alignment (DNA/protein)
Spell checking, version control merging
Edit distance, semantic similarity calculation

Dynamic Programming Solution#

Idea: Define dp[i][j]dp[i][j] to represent the length of the longest common subsequence of A[0..i1]A[0..i-1] and B[0..j1]B[0..j-1].

Pseudocode (including state transition equation):

def LCS_length(A, B):
    n, m = len(A), len(B)
    dp = [[0] * (m+1) for _ in range(n+1)]

    for i in range(1, n+1):
        for j in range(1, m+1):
            if A[i-1] == B[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])

    return dp[n][m]

Time complexity: O(m×n)O(m\times n)
Space complexity: O(m×n)O(m\times n)

Backtracking and Sequence Recovery#

  # ...

  # Backtrack to recover the path
  i, j = n, m
  lcs = []
  while i > 0 and j > 0:
      if A[i-1] == B[j-1]:
          lcs.append(A[i-1])
          i -= 1
          j -= 1
      elif dp[i-1][j] >= dp[i][j-1]:
          i -= 1
      else:
          j -= 1

  return ''.join(reversed(lcs))

Rolling Array to Optimize Space Complexity#

In a 2D DP table, if the value of a certain row only depends on the "previous row," we do not need to keep the entire 2D table—just save the current row and the previous row for rolling use.

Original State Transition:

if A[i-1] == B[j-1]:
    dp[i][j] = dp[i-1][j-1] + 1
else:
    dp[i][j] = max(dp[i-1][j], dp[i][j-1])

Each dp[i][j] only depends on the data from row i-1, so we can optimize space using a rolling array.

Rolling Array Implementation
Only keep 2 rows

prev = [0] * (m + 1)  # Previous row
curr = [0] * (m + 1)  # Current row

After processing each row i, swap prev and curr.

def LCS_length(A, B):
    n, m = len(A), len(B)
    prev = [0] * (m + 1)
    curr = [0] * (m + 1)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if A[i - 1] == B[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev, curr = curr, prev  # Swap the two rows

    return prev[m]  # Note that the final result is in prev (because it was swapped in the last round)

Time complexity remains O(m×n)O(m\times n)
Space complexity reduces to O(min{m,n})O(\min\{m, n\})

But note: It is not possible to recover the LCS path using the rolling array. Because the path information has been lost in the space compression.

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.