http://en.wikipedia.org/wiki/Dice%27s_coefficient

When taken as a string similarity measure, the coefficient may be calculated for two strings, x and y, using bigrams: s = 2 · nt / (nx + ny), where nt is the number of character bigrams found in both strings, nx is the number of bigrams in string x, and ny is the number of bigrams in string y. For example, to calculate the similarity between:

night

nacht

We would find the set of bigrams in each word:

{ni,ig,gh,ht}

{na,ac,ch,ht}

Each set has four elements, and the intersection of these two sets has only one element: ht.

Plugging these values into the formula, we calculate s = (2 · 1) / (4 + 4) = 0.25.
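The worked example can be reproduced with a minimal sketch. The helper names `char-bigrams` and `dice-similarity` are hypothetical and chosen for illustration only; they are not the `bigrams` and `dice-coefficient` functions that this namespace calls, whose implementations are not shown here. The sketch assumes bigrams are collected into a set, as in the example, so a bigram that repeats within a word is counted once, and it assumes a result of 0.0 when neither string yields any bigrams.

```clojure
(require '[clojure.set :as set])

(defn char-bigrams
  "Returns the set of adjacent character pairs in string s.
  Hypothetical helper, not the namespace's own `bigrams`."
  [s]
  ;; (partition 2 1 s) slides a window of width 2 over the characters.
  (set (map #(apply str %) (partition 2 1 s))))

(defn dice-similarity
  "Dice's coefficient of two sets: 2 * |a ∩ b| / (|a| + |b|).
  Returns 0.0 when both sets are empty (an assumption made for this sketch)."
  [a b]
  (let [total (+ (count a) (count b))]
    (if (zero? total)
      0.0
      (/ (* 2.0 (count (set/intersection a b))) total))))

(dice-similarity (char-bigrams "night") (char-bigrams "nacht"))
;; => 0.25
```

Using sets keeps the sketch aligned with the description above. A multiset of bigrams would score strings with repeated bigrams differently.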

(defn dice-coefficient-str
  "http://en.wikipedia.org/wiki/Dice%27s_coefficient

  When taken as a string similarity measure, the coefficient may be
  calculated for two strings, x and y, using bigrams:

    s = 2 · nt / (nx + ny)

  where nt is the number of character bigrams found in both strings,
  nx is the number of bigrams in string x, and ny is the number of
  bigrams in string y. For example, to calculate the similarity between:

    night
    nacht

  we would find the set of bigrams in each word:

    {ni,ig,gh,ht}
    {na,ac,ch,ht}

  Each set has four elements, and the intersection of these two sets
  has only one element: ht.

  Plugging these values into the formula, we calculate
  s = (2 · 1) / (4 + 4) = 0.25."
  [a b]
  (dice-coefficient (bigrams a) (bigrams b)))

