Posted to issues@drill.apache.org by "Karol Potocki (JIRA)" <ji...@apache.org> on 2015/10/30 17:00:29 UTC

[jira] [Commented] (DRILL-3747) UDF for "fuzzy" string and similarity matching

    [ https://issues.apache.org/jira/browse/DRILL-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982753#comment-14982753 ] 

Karol Potocki commented on DRILL-3747:
--------------------------------------

Such functionality is often needed when searching data produced by user collaboration (e.g. street names in internet data sources) or when building search conditions from user input (to handle spelling mistakes).
I recently needed a solution like this; a basic implementation is on my GitHub:
https://github.com/k255/drill-fuzzy-search
It is built on the SimMetrics library, which recently moved to the Apache license.

> UDF for "fuzzy" string and similarity matching
> ----------------------------------------------
>
>                 Key: DRILL-3747
>                 URL: https://issues.apache.org/jira/browse/DRILL-3747
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Functions - Drill
>    Affects Versions: Future
>            Reporter: Edmon Begoli
>            Priority: Minor
>              Labels: features
>             Fix For: Future
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> I propose implementing string distance and similarity matching functions similar to those found in most other databases: Soundex, Metaphone, Levenshtein (and more advanced variants such as Damerau-Levenshtein, Jaro-Winkler, etc.).
> See fuzzystrmatch http://www.postgresql.org/docs/9.5/static/fuzzystrmatch.html, 
> and pg_similarity http://pgsimilarity.projects.pgfoundry.org/
> for inspiration.
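
As a rough illustration of the simplest function named above, here is a minimal Java sketch of Levenshtein edit distance, plus a normalized similarity score of the kind one might use to tolerate spelling mistakes in user input. It is plain Java only: it does not use the Drill UDF interfaces or the SimMetrics API, and the class and method names (FuzzyMatch, levenshtein, similarity) are hypothetical, chosen for this example.

```java
// Hypothetical helper class for illustration; not part of Drill or SimMetrics.
final class FuzzyMatch {
    private FuzzyMatch() {}

    /** Levenshtein edit distance, computed with two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting i characters from a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the arrays: the row just computed becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: 1.0 for identical strings, lower as edits increase. */
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));
        System.out.println(similarity("Main Street", "Main Stret"));
    }
}
```

A query-side function could then accept a match when the similarity exceeds some threshold; the right threshold depends on the data and is left open here.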



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)