Posted to issues@impala.apache.org by "Norbert Luksa (JIRA)" <ji...@apache.org> on 2019/07/11 17:18:00 UTC

[jira] [Created] (IMPALA-8752) Add Jaro-winkler edit distance and similarity built-in function

Norbert Luksa created IMPALA-8752:
-------------------------------------

             Summary: Add Jaro-winkler edit distance and similarity built-in function
                 Key: IMPALA-8752
                 URL: https://issues.apache.org/jira/browse/IMPALA-8752
             Project: IMPALA
          Issue Type: New Feature
            Reporter: Norbert Luksa
            Assignee: Norbert Luksa


References:
 * [Apache Commons - JaroWinklerDistance|https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/similarity/JaroWinklerDistance.html]
 * [Apache Commons - JaroWinklerSimilarity|https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/similarity/JaroWinklerSimilarity.html]
 * [Oracle - JARO_WINKLER / JARO_WINKLER_SIMILARITY|https://oracle-base.com/articles/11g/utl_match-string-matching-in-oracle]

Notable differences:
 * For similarity, the Oracle version returns a normalized result ranging from 0 to 100.
 * In the Apache version, null inputs result in exceptions.
 * The Apache version rounds the result to two decimal places.

The scaling factor of the algorithm can be exposed as an optional extra argument with a default value.
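
To make the proposal concrete, here is a minimal Python sketch of Jaro-Winkler similarity and distance with the scaling factor as an optional argument. It is not the Impala implementation. The function names, the 0.1 default, the 0.7 boost threshold, and returning None for null input (mirroring SQL NULL semantics instead of raising) are all assumptions made for illustration. The result is left unrounded and in the range 0 to 1.

```python
def jaro_similarity(s1, s2):
    """Plain Jaro similarity in the range [0.0, 1.0]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler_similarity(s1, s2, scaling_factor=0.1, boost_threshold=0.7):
    """Jaro-Winkler similarity; names and defaults here are assumptions."""
    # Assumption: propagate null like a SQL built-in instead of raising.
    if s1 is None or s2 is None:
        return None
    # With a common prefix of at most 4, a factor above 0.25 could exceed 1.0.
    if not 0.0 <= scaling_factor <= 0.25:
        raise ValueError("scaling_factor must be between 0.0 and 0.25")

    jaro = jaro_similarity(s1, s2)
    if jaro <= boost_threshold:
        return jaro

    # Length of the common prefix, capped at 4 characters.
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * scaling_factor * (1.0 - jaro)


def jaro_winkler_distance(s1, s2, scaling_factor=0.1, boost_threshold=0.7):
    """Distance defined as 1 - similarity; None if either input is None."""
    sim = jaro_winkler_similarity(s1, s2, scaling_factor, boost_threshold)
    return None if sim is None else 1.0 - sim
```

Calling jaro_winkler_similarity("MARTHA", "MARHTA") uses the default factor, while passing scaling_factor=0.2 weights the shared prefix more heavily. An Oracle-style 0 to 100 result would be a scaling of the value returned here.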



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)