Problem Background#
Given two strings A and B, find their longest common subsequence.
Subsequence: can be non-contiguous but must maintain the original order
Common subsequence: a subsequence that is in both A and B
Example:
A = "ABCBDAB"
B = "BDCAB"
LCS = "BDAB" or "BCAB" (both of length 4)
Applications#
File/code difference comparison (e.g., diff tools)
Gene sequence alignment (DNA/protein)
Spell checking, version control merging
Edit distance, semantic similarity calculation
Dynamic Programming Solution#
Idea: Define to represent the length of the longest common subsequence of and .
Pseudocode (including state transition equation):
def LCS_length(A, B):
n, m = len(A), len(B)
dp = [[0] * (m+1) for _ in range(n+1)]
for i in range(1, n+1):
for j in range(1, m+1):
if A[i-1] == B[j-1]:
dp[i][j] = dp[i-1][j-1] + 1
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
return dp[n][m]
Time complexity:
Space complexity:
Backtracking and Sequence Recovery#
# ...
# Backtrack to recover the path
i, j = n, m
lcs = []
while i > 0 and j > 0:
if A[i-1] == B[j-1]:
lcs.append(A[i-1])
i -= 1
j -= 1
elif dp[i-1][j] >= dp[i][j-1]:
i -= 1
else:
j -= 1
return ''.join(reversed(lcs))
Rolling Array to Optimize Space Complexity#
In a 2D DP table, if the value of a certain row only depends on the "previous row," we do not need to keep the entire 2D table—just save the current row and the previous row for rolling use.
Original State Transition:
if A[i-1] == B[j-1]:
dp[i][j] = dp[i-1][j-1] + 1
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
Each dp[i][j] only depends on the data from row i-1, so we can optimize space using a rolling array.
Rolling Array Implementation
Only keep 2 rows
prev = [0] * (m + 1) # Previous row
curr = [0] * (m + 1) # Current row
After processing each row i, swap prev and curr.
def LCS_length(A, B):
n, m = len(A), len(B)
prev = [0] * (m + 1)
curr = [0] * (m + 1)
for i in range(1, n + 1):
for j in range(1, m + 1):
if A[i - 1] == B[j - 1]:
curr[j] = prev[j - 1] + 1
else:
curr[j] = max(prev[j], curr[j - 1])
prev, curr = curr, prev # Swap the two rows
return prev[m] # Note that the final result is in prev (because it was swapped in the last round)
Time complexity remains
Space complexity reduces to
But note: It is not possible to recover the LCS path using the rolling array. Because the path information has been lost in the space compression.