Matching based on common tokens

Compare strings based on shared tokens.

lev_token_set_ratio(a, b, pairwise = TRUE, useNames = TRUE, ...)

Arguments

a, b: The input strings
pairwise: Boolean. If TRUE, only the pairwise distances between a and b will be computed, rather than the combinations of all elements.
useNames: Boolean. Use input vectors as row and column names?
...: Additional arguments to be passed to stringdist::stringdistmatrix() or stringdist::stringsimmatrix().
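
As a rough illustration of the pairwise argument, the sketch below calls the function on two short vectors. The input values are made up for this example, and no output is shown because the exact scores depend on the installed version.

```r
library(levitate)

# Illustrative inputs (not from the package documentation).
a <- c("brown fox", "lazy dog")
b <- c("fox brown", "dog lazy")

# pairwise = TRUE (the default): a[1] is compared with b[1], a[2] with b[2].
pw <- lev_token_set_ratio(a, b)

# pairwise = FALSE: every element of a is compared with every element of b.
all_pairs <- lev_token_set_ratio(a, b, pairwise = FALSE)
```

With useNames = TRUE, the result of the second call should carry the input strings as row and column names.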

Value

A numeric scalar, vector or matrix depending on the length of the inputs.

Details

Like lev_token_sort_ratio(), this function breaks each input string into tokens. It then identifies the tokens common to both strings and builds three new strings:

x <- {common_tokens}
y <- {common_tokens}{remaining_unique_tokens_from_string_a}
z <- {common_tokens}{remaining_unique_tokens_from_string_b}

and performs three pairwise lev_ratio() calculations between them (x vs y, y vs z and x vs z). The highest of those three ratios is returned.
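
The steps above can be sketched in base R. This is a minimal illustration, not the package's implementation: sketch_ratio() and sketch_token_set_ratio() are hypothetical names, and the sketch assumes whitespace tokenisation, alphabetical token ordering, plain Levenshtein distance via utils::adist(), and a ratio of 1 minus the distance divided by the length of the longer string.

```r
# Hypothetical helper: similarity as 1 - distance / length of the longer string.
sketch_ratio <- function(a, b) {
  longest <- max(nchar(a), nchar(b))
  if (longest == 0) return(1)
  1 - as.numeric(utils::adist(a, b)) / longest
}

# Hypothetical sketch of the token set comparison for two single strings.
sketch_token_set_ratio <- function(a, b) {
  # Assumption: tokens are separated by whitespace and compared case-sensitively.
  tok_a <- unique(strsplit(trimws(a), "\\s+")[[1]])
  tok_b <- unique(strsplit(trimws(b), "\\s+")[[1]])

  common <- sort(intersect(tok_a, tok_b))
  only_a <- sort(setdiff(tok_a, tok_b))
  only_b <- sort(setdiff(tok_b, tok_a))

  # Build the three comparison strings described above.
  x <- paste(common, collapse = " ")
  y <- paste(c(common, only_a), collapse = " ")
  z <- paste(c(common, only_b), collapse = " ")

  # Return the highest of the three pairwise ratios.
  max(sketch_ratio(x, y), sketch_ratio(y, z), sketch_ratio(x, z))
}

s1 <- "the quick brown fox jumps over the lazy dog"
s2 <- "my lazy dog was jumped over by a quick brown fox"
sketch_token_set_ratio(s1, s2)
#> [1] 0.7435897
```

Here the common tokens alone form a 29-character string and appending the two tokens unique to the first input gives 39 characters, so the x vs y ratio of 29/39 is the highest of the three. Results for other inputs may differ from the package if it tokenises or orders tokens differently.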

Examples

x <- "the quick brown fox jumps over the lazy dog"
y <- "my lazy dog was jumped over by a quick brown fox"

lev_ratio(x, y)
#> [1] 0.2916667

lev_token_sort_ratio(x, y)
#> [1] 0.6458333

lev_token_set_ratio(x, y)
#> [1] 0.7435897