The edit distance is a classic problem in string processing, measuring the minimum number of operations required to transform one string into another.
Problem Definition#
Given two strings A and B, find the minimum number of operations to convert A into B.
Only one type of operation is allowed at a time, and the "cost" of each operation is 1.
- Insert a character (Insert)
- Delete a character (Delete)
- Replace a character (Replace)
Example:
Transform horse
into ros
:
horse → rorse (replace 'h' → 'r')
rorse → rose (delete 'r')
rose → ros (delete 'e')
Total 3 steps → Edit distance = 3
Dynamic Programming Solution#
We use a two-dimensional array to represent the minimum number of operations required to convert to .
## Initial conditions
dp[0][j] = j # Transform empty string to B[0..j-1]: insert j times
dp[i][0] = i # Transform A[0..i-1] to empty string: delete i times
## State transition equation
if A[i-1] == B[j-1]:
dp[i][j] = dp[i-1][j-1] # No operation needed
else:
dp[i][j] = min(
dp[i-1][j] + 1, # Delete A[i-1]
dp[i][j-1] + 1, # Insert B[j-1]
dp[i-1][j-1] + 1 # Replace A[i-1] with B[j-1]
)
Pseudocode:
def edit_distance(A, B):
n, m = len(A), len(B)
dp = [[0] * (m+1) for _ in range(n+1)]
for i in range(n+1):
dp[i][0] = i
for j in range(m+1):
dp[0][j] = j
for i in range(1, n+1):
for j in range(1, m+1):
if A[i-1] == B[j-1]:
dp[i][j] = dp[i-1][j-1]
else:
dp[i][j] = min(
dp[i-1][j] + 1, # Delete (move up)
dp[i][j-1] + 1, # Insert (move left)
dp[i-1][j-1] + 1 # Replace (move diagonally up-left)
)
return dp[n][m]
Time complexity
Space complexity
A rolling array can be used to optimize space complexity, only saving the current row and the previous row. This is similar to the longest common subsequence problem LCS.
Variant Problems#
Shortest edit path problem (requires restoring edit operations)
Weighted edit distance problem (different costs for each operation)
Damerau-Levenshtein distance (DL distance, allows swap operations)
Edit Path Recovery (Backtracking)#
An additional array needs to be constructed to save the operation steps.
def edit_path(A, B):
n, m = len(A), len(B)
dp = [[0]*(m+1) for _ in range(n+1)]
move = [['']*(m+1) for _ in range(n+1)]
for i in range(n+1):
dp[i][0] = i
move[i][0] = 'D' # Delete
for j in range(m+1):
dp[0][j] = j
move[0][j] = 'I' # Insert
move[0][0] = ' '
for i in range(1, n+1):
for j in range(1, m+1):
if A[i-1] == B[j-1]:
dp[i][j] = dp[i-1][j-1]
move[i][j] = 'M' # Match
else:
choices = [
(dp[i-1][j] + 1, 'D'),
(dp[i][j-1] + 1, 'I'),
(dp[i-1][j-1] + 1, 'R')
]
dp[i][j], move[i][j] = min(choices)
# Backtrack the path
ops = []
i, j = n, m
while i > 0 or j > 0:
op = move[i][j]
if op == 'M' or op == 'R':
ops.append((op, i-1, j-1))
i -= 1
j -= 1
elif op == 'D':
ops.append(('D', i-1))
i -= 1
elif op == 'I':
ops.append(('I', j-1))
j -= 1
ops.reverse()
return dp[n][m], ops
State compression cannot be used, as it would lose path information.
Weighted Edit Distance#
Only the cost of the replace operation is needed.
def weighted_edit_distance(A, B, insert_cost=1, delete_cost=1, replace_cost=2):
n, m = len(A), len(B)
dp = [[0]*(m+1) for _ in range(n+1)]
for i in range(n+1):
dp[i][0] = i * delete_cost
for j in range(m+1):
dp[0][j] = j * insert_cost
for i in range(1, n+1):
for j in range(1, m+1):
if A[i-1] == B[j-1]:
dp[i][j] = dp[i-1][j-1]
else:
dp[i][j] = min(
dp[i-1][j] + delete_cost,
dp[i][j-1] + insert_cost,
dp[i-1][j-1] + replace_cost
)
return dp[n][m]
Damerau-Levenshtein Distance (DL Distance)#
Based on edit distance, an additional operation is allowed: swapping two adjacent characters.
Use dynamic programming to solve it.
: represents the minimum number of operations to convert to .
State Transition Equation
1. If A[i-1] == B[j-1]:
dp[i][j] = dp[i-1][j-1] (characters are the same, no operation needed)
2. Otherwise:
dp[i][j] = min(
dp[i-1][j] + 1, (delete)
dp[i][j-1] + 1, (insert)
dp[i-1][j-1] + 1 (replace)
)
3. If the swap condition is met:
A[i-1] == B[j-2] and A[i-2] == B[j-1]
then allow:
dp[i][j] = min(dp[i][j], dp[i-2][j-2] + 1) (swap)
Pseudocode (swap operation)
for i in range(1, n+1):
for j in range(1, m+1):
cost = 0 if A[i-1] == B[j-1] else 1
dp[i][j] = min(
dp[i-1][j] + 1, # Delete
dp[i][j-1] + 1, # Insert
dp[i-1][j-1] + cost # Replace or match
)
# Check if adjacent characters can be swapped
if i > 1 and j > 1 and A[i-1] == B[j-2] and A[i-2] == B[j-1]:
dp[i][j] = min(dp[i][j], dp[i-2][j-2] + 1) # Swap