In computational linguistics and computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelt word by selecting words from a dictionary that have a low distance to the word in question. In bioinformatics, it can be used to quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T.
Different definitions of an edit distance use different sets of string operations. The Levenshtein distance operations are the removal, insertion, or substitution of a character in the string. Being the most common metric, the Levenshtein distance is usually what is meant by "edit distance".
Will First Solve the problem using Recursion :
package dp;
public class EditDistance {
public static void main(String[] args) {
String str1 = "sunday";
String str2 = "saturday";
long startTime = System.nanoTime();
System.out.println(EditDistanceCount(str1,str2, str1.length(), str2.length()));
long endTime = System.nanoTime();
long totalTime = endTime - startTime;
System.out.println("Total Time (nanoseconds) : " + (totalTime));
}
private static int EditDistanceCount(String str1, String str2, int length1, int length2) {
if(length1 == 0)
return length2;
if(length2 == 0)
return length1;
if(str1.charAt(length1-1) == str2.charAt(length2-1))
return EditDistanceCount(str1, str2, length1-1, length2-1);
return 1 + min(
EditDistanceCount(str1, str2, length1, length2 -1 ), //insert
EditDistanceCount(str1, str2, length1-1, length2), //Remove
EditDistanceCount(str1, str2, length1-1, length2-1) //Replace
);
}
private static int min(int a, int b, int c) {
if( a < b)
{
return a < c ? a : c;
}
else
{
return b < c ? b : c;
}
}
}
output:
3
Total Time (nanoseconds) : 419115
The time complexity of the above solution is exponential. in the worst case, it will be 3^n.
In the above approach, we can see that many subproblems are solved again and again. We can use the previously solved subproblem to be used in solving a new problem which will reduce the time complexity considerably.
Let's solve the problem using dynamic programming :
package dp;
public class EditDistance {
public static void main(String[] args) {
String str1 = "sunday";
String str2 = "saturday";
long startTime = System.nanoTime();
System.out.println(EditDistanceCount(str1,str2, str1.length(), str2.length()));
long endTime = System.nanoTime();
long totalTime = endTime - startTime;
System.out.println("Total Time (nanoseconds) : " + (totalTime));
}
private static int EditDistanceCount(String str1, String str2, int length1, int length2) {
if(length1 == 0)
return length2;
if(length2 == 0)
return length1;
int[][] dp = new int[length1+1][length2+1];
for(int i=0;i<length1+1;i++) {
for(int j=0;j<length2+1;j++) {
if(i==0)
dp[i][j] = j;
else if(j==0)
dp[i][j] = i;
else if(str1.charAt(i-1) == str2.charAt(j-1))
dp[i][j]=dp[i-1][j-1]; // if same characters do nothing
else
dp[i][j]= 1 + min(dp[i][j-1], dp[i-1][j], dp[i-1][j-1]);
}
}
return dp[length1][length2];
}
private static int min(int a, int b, int c) {
if( a < b)
{
return a < c ? a : c;
}
else
{
return b < c ? b : c;
}
}
}
output:
3
Total Time (nanoseconds) : 177840
The time complexity of the above solution is O(mxn) where m and n are the length of respective strings to be compared.



