Sunforger

Sunforger

Dynamic Programming: Edit Distance Problem

The edit distance is a classic problem in string processing, measuring the minimum number of operations required to transform one string into another.

Problem Definition#

Given two strings A and B, find the minimum number of operations to convert A into B.

Only one type of operation is allowed at a time, and the "cost" of each operation is 1.

  1. Insert a character (Insert)
  2. Delete a character (Delete)
  3. Replace a character (Replace)

Example:
Transform horse into ros:

horse → rorse    (replace 'h' → 'r')
rorse → rose     (delete 'r')
rose  → ros      (delete 'e')

Total 3 steps → Edit distance = 3

Dynamic Programming Solution#

We use a two-dimensional array dp[i][j]dp[i][j] to represent the minimum number of operations required to convert A[0..i1]A[0..i-1] to B[0..j1]B[0..j-1].

## Initial conditions
dp[0][j] = j      # Transform empty string to B[0..j-1]: insert j times
dp[i][0] = i      # Transform A[0..i-1] to empty string: delete i times

## State transition equation
if A[i-1] == B[j-1]:
    dp[i][j] = dp[i-1][j-1]         # No operation needed
else:
    dp[i][j] = min(
        dp[i-1][j] + 1,             # Delete A[i-1]
        dp[i][j-1] + 1,             # Insert B[j-1]
        dp[i-1][j-1] + 1            # Replace A[i-1] with B[j-1]
    )

Pseudocode:

def edit_distance(A, B):
    n, m = len(A), len(B)
    dp = [[0] * (m+1) for _ in range(n+1)]

    for i in range(n+1):
        dp[i][0] = i
    for j in range(m+1):
        dp[0][j] = j

    for i in range(1, n+1):
        for j in range(1, m+1):
            if A[i-1] == B[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = min(
                    dp[i-1][j] + 1,      # Delete (move up)
                    dp[i][j-1] + 1,      # Insert (move left)
                    dp[i-1][j-1] + 1     # Replace (move diagonally up-left)
                )

    return dp[n][m]

Time complexity O(m×n)O(m\times n)
Space complexity O(m×n)O(m\times n)

A rolling array can be used to optimize space complexity, only saving the current row and the previous row. This is similar to the longest common subsequence problem LCS.

Variant Problems#

Shortest edit path problem (requires restoring edit operations)
Weighted edit distance problem (different costs for each operation)
Damerau-Levenshtein distance (DL distance, allows swap operations)

Edit Path Recovery (Backtracking)#

An additional array needs to be constructed to save the operation steps.

def edit_path(A, B):
    n, m = len(A), len(B)
    dp = [[0]*(m+1) for _ in range(n+1)]
    move = [['']*(m+1) for _ in range(n+1)]

    for i in range(n+1):
        dp[i][0] = i
        move[i][0] = 'D'  # Delete
    for j in range(m+1):
        dp[0][j] = j
        move[0][j] = 'I'  # Insert
    move[0][0] = ' '

    for i in range(1, n+1):
        for j in range(1, m+1):
            if A[i-1] == B[j-1]:
                dp[i][j] = dp[i-1][j-1]
                move[i][j] = 'M'  # Match
            else:
                choices = [
                    (dp[i-1][j] + 1, 'D'),
                    (dp[i][j-1] + 1, 'I'),
                    (dp[i-1][j-1] + 1, 'R')
                ]
                dp[i][j], move[i][j] = min(choices)

    # Backtrack the path
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        op = move[i][j]
        if op == 'M' or op == 'R':
            ops.append((op, i-1, j-1))
            i -= 1
            j -= 1
        elif op == 'D':
            ops.append(('D', i-1))
            i -= 1
        elif op == 'I':
            ops.append(('I', j-1))
            j -= 1
    ops.reverse()
    return dp[n][m], ops

State compression cannot be used, as it would lose path information.

Weighted Edit Distance#

Only the cost of the replace operation is needed.

def weighted_edit_distance(A, B, insert_cost=1, delete_cost=1, replace_cost=2):
    n, m = len(A), len(B)
    dp = [[0]*(m+1) for _ in range(n+1)]

    for i in range(n+1):
        dp[i][0] = i * delete_cost
    for j in range(m+1):
        dp[0][j] = j * insert_cost

    for i in range(1, n+1):
        for j in range(1, m+1):
            if A[i-1] == B[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = min(
                    dp[i-1][j] + delete_cost,
                    dp[i][j-1] + insert_cost,
                    dp[i-1][j-1] + replace_cost
                )
    return dp[n][m]

Damerau-Levenshtein Distance (DL Distance)#

Based on edit distance, an additional operation is allowed: swapping two adjacent characters.

Use dynamic programming to solve it.
dp[i][j]dp[i][j]: represents the minimum number of operations to convert A[0..i1]A[0..i-1] to B[0..j1]B[0..j-1].

State Transition Equation

1. If A[i-1] == B[j-1]:
   dp[i][j] = dp[i-1][j-1]                (characters are the same, no operation needed)

2. Otherwise:
   dp[i][j] = min(
       dp[i-1][j] + 1,                    (delete)
       dp[i][j-1] + 1,                    (insert)
       dp[i-1][j-1] + 1                   (replace)
   )

3. If the swap condition is met:
   A[i-1] == B[j-2] and A[i-2] == B[j-1]
   then allow:
   dp[i][j] = min(dp[i][j], dp[i-2][j-2] + 1)  (swap)

Pseudocode (swap operation)

for i in range(1, n+1):
    for j in range(1, m+1):
        cost = 0 if A[i-1] == B[j-1] else 1
        dp[i][j] = min(
            dp[i-1][j] + 1,        # Delete
            dp[i][j-1] + 1,        # Insert
            dp[i-1][j-1] + cost    # Replace or match
        )
        # Check if adjacent characters can be swapped
        if i > 1 and j > 1 and A[i-1] == B[j-2] and A[i-2] == B[j-1]:
            dp[i][j] = min(dp[i][j], dp[i-2][j-2] + 1)  # Swap
Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.