Posted to reviews@impala.apache.org by "Greg Rahn (Code Review)" <ge...@cloudera.org> on 2019/11/07 16:31:03 UTC

[Impala-ASF-CR] IMPALA-8709: Add Damerau-Levenshtein edit distance built-in function

Greg Rahn has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/13794 )

Change subject: IMPALA-8709: Add Damerau-Levenshtein edit distance built-in function
......................................................................

IMPALA-8709: Add Damerau-Levenshtein edit distance built-in function

This patch adds new built-in functions to calculate the restricted
Damerau-Levenshtein edit distance (optimal string alignment),
implemented as dle_dst() and damerau_levenshtein(). If either value or
both values are NULL, the functions return NULL. This differs from
Netezza's dle_dst(), which returns the length of the non-NULL value, or
0 if both values are NULL. The NULL behavior matches the existing
levenshtein() function.
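
For readers unfamiliar with the restricted variant, the following is a
minimal Python sketch of optimal string alignment distance with the
NULL-in, NULL-out behavior described above. It is not the C++ code in
this patch; the function name osa_distance is hypothetical, None stands
in for SQL NULL, and any length limits or byte-versus-character
handling in the real built-in are not modeled here.

```python
from typing import Optional


def osa_distance(s: Optional[str], t: Optional[str]) -> Optional[int]:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Hypothetical illustration only: None models SQL NULL, and a NULL
    argument yields a NULL result, as the commit message describes.
    """
    if s is None or t is None:
        return None

    m, n = len(s), len(t)
    # d[i][j] holds the distance between the prefixes s[:i] and t[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters. In the restricted
            # variant a transposed pair is never edited again.
            if (i > 1 and j > 1
                    and s[i - 1] == t[j - 2]
                    and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)

    return d[m][n]
```

Under this sketch, osa_distance("ab", "ba") is 1 (one transposition),
while osa_distance("ca", "abc") is 3; the unrestricted
Damerau-Levenshtein distance for the latter pair is 2, which is the
usual example of how the two definitions differ.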

Also cleans up levenshtein tests.

Testing:
- Added unit tests to expr-test.cc
- Manually tested on over 1400 string pairs from
  http://marvin.cs.uidaho.edu/misspell.html; the results match Netezza

Change-Id: Ib759817ec15e7075bf49d51e494e45c8af4db94d
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 152 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/13794/8
-- 
To view, visit http://gerrit.cloudera.org:8080/13794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib759817ec15e7075bf49d51e494e45c8af4db94d
Gerrit-Change-Number: 13794
Gerrit-PatchSet: 8
Gerrit-Owner: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>