EZ Study


Damerau Levenshtein distance in SQL Netezza

In information theory and computer science, the Damerau-Levenshtein distance (named after Frederick J. Damerau and Vladimir I. Levenshtein) is a string metric: a "distance" between two strings, i.e., finite sequences of symbols. It is the minimum number of operations needed to transform one string into the other, where an operation is the insertion, deletion, or substitution of a single character, or the transposition of two adjacent characters.

The plain Levenshtein distance uses the same operations except transposition. For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other and no sequence of fewer than three edits does so:

1. kitten --> sitten (substitution of "s" for "k")
2. sitten --> sittin (substitution of "i" for "e")
3. sittin --> sitting (insertion of "g" at the end).
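
The count above can be checked with a short dynamic-programming routine. This is a minimal Python sketch of the textbook Levenshtein recurrence, offered for illustration only; the function name `levenshtein` is hypothetical, and the code is not the implementation behind any Netezza function.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # prev[j] is the distance between the previous prefix of a and b[:j].
    # For the empty prefix of a, that is j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free on a match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # prints 3
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second string rather than with the product of both lengths.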

These metrics differ in the operations they allow. Damerau-Levenshtein distance allows insertion, deletion, substitution, and the transposition of two adjacent characters. Levenshtein distance allows insertion, deletion, and substitution, but not transposition. The longest common subsequence metric allows only insertion and deletion, not substitution. Hamming distance allows only substitution, so it applies only to strings of the same length.
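
To make the differences concrete, the sketch below implements three of these metrics in Python. All function names are hypothetical. The Damerau-Levenshtein variant shown is the restricted form often called optimal string alignment, in which no substring is edited more than once; whether Netezza's built-in function uses the restricted or the unrestricted form is not stated here, so treat this as an illustration rather than a reference.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    # d[i][j] is the distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(b) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def hamming_distance(a: str, b: str) -> int:
    """Substitutions only, so both strings must have the same length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def lcs_distance(a: str, b: str) -> int:
    """Insertions and deletions only: len(a) + len(b) - 2 * LCS length."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return len(a) + len(b) - 2 * prev[-1]


# A single swap of adjacent characters costs 1 only when transposition is allowed.
print(osa_distance("ab", "ba"))      # prints 1
print(hamming_distance("ab", "ba"))  # prints 2
print(lcs_distance("ab", "ba"))      # prints 2
```

For "kitten" and "sitting", no transposition helps, so `osa_distance` returns the same value as the plain Levenshtein distance.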

Netezza functions to calculate the Levenshtein and Damerau-Levenshtein distances:
    le_dst: le_dst ( string_expression1 , string_expression2 ) Returns a value indicating how different the two input strings are, calculated according to the Levenshtein edit distance algorithm.

    dle_dst: dle_dst ( string_expression1 , string_expression2 ) Returns a value indicating how different the two input strings are, calculated according to the Damerau-Levenshtein distance algorithm.

