Posted to dev@lucene.apache.org by "Brooke Schreier Ganz (Created) (JIRA)" <ji...@apache.org> on 2011/12/21 01:39:30 UTC

[jira] [Created] (SOLR-2982) Upgrade Apache Commons Codec to version 1.6 in order to add new Beider-Morse Phonetic Matching (BMPM) option

Upgrade Apache Commons Codec to version 1.6 in order to add new Beider-Morse Phonetic Matching (BMPM) option
------------------------------------------------------------------------------------------------------------

                 Key: SOLR-2982
                 URL: https://issues.apache.org/jira/browse/SOLR-2982
             Project: Solr
          Issue Type: Improvement
          Components: Rules, Schema and Analysis, search
            Reporter: Brooke Schreier Ganz
             Fix For: 3.6, 4.0


Apache Commons Codec released version 1.6 of its codec pack in November 2011. Along with a few bug fixes, 1.6 contains a new phonetic matching system called Beider-Morse Phonetic Matching (BMPM) that is far superior to the existing phonetic codecs, such as regular Soundex, Metaphone, Caverphone, and so on. BMPM has been available for some time, but this is the first port of it to Java, and its first commit in the Apache ecosystem.

For a lot more information, see http://stevemorse.org/phoneticinfo.htm and http://stevemorse.org/phonetics/bmpm.htm

BMPM would be a fantastic "soundalike" tool to help search for personal names (or just surnames) in a Solr/Lucene index, much better than Levenshtein distance for this use case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (SOLR-2982) Upgrade Apache Commons Codec to version 1.6 in order to add new Beider-Morse Phonetic Matching (BMPM) option

Posted by "Robert Muir (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-2982.
-------------------------------

    Resolution: Fixed
    
> Upgrade Apache Commons Codec to version 1.6 in order to add new Beider-Morse Phonetic Matching (BMPM) option
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-2982
>                 URL: https://issues.apache.org/jira/browse/SOLR-2982
>             Project: Solr
>          Issue Type: Improvement
>          Components: Rules, Schema and Analysis, search
>            Reporter: Brooke Schreier Ganz
>              Labels: codec, commons, commons-codec, language, names, phonetic, search, searching, soundalike
>             Fix For: 3.6, 4.0
>
>         Attachments: SOLR-2982.patch
>
>
> Apache Commons Codec released version 1.6 of its codec pack in November 2011. Along with a few bug fixes, 1.6 contains a new phonetic matching system called Beider-Morse Phonetic Matching (BMPM) that is far superior to the existing phonetic codecs, such as regular Soundex, Metaphone, Caverphone, and so on. BMPM has been available for some time, but this is the first port of it to Java, and its first commit in the Apache ecosystem.
> For a lot more information, see http://stevemorse.org/phoneticinfo.htm and http://stevemorse.org/phonetics/bmpm.htm
> BMPM would be a fantastic "soundalike" tool to help search for personal names (or just surnames) in a Solr/Lucene index, much better than Levenshtein distance for this use case.



[jira] [Updated] (SOLR-2982) Upgrade Apache Commons Codec to version 1.6 in order to add new Beider-Morse Phonetic Matching (BMPM) option

Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2982:
------------------------------

    Attachment: SOLR-2982.patch

Attached is a patch. BMPM really needs its own filter, because this encoding is shoved onto the commons-codec API (IMO this is really confusing: it doesn't make sense to use strings here).

The output of this thing is actually syntax such as (((x|y|z)-(a|b..., which means we have to parse it again to do anything with it.

I also noticed this new encoder seems to have performance issues; I had to scale back the random strings test somewhat.
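
To illustrate the re-parsing step mentioned above, here is a minimal sketch in Java. It is not taken from the attached patch: the class name PhonemeTokens, the method extract, and the sample input are hypothetical, and the sketch assumes the only syntax characters in the encoded string are "(", ")", "|" and "-". It simply pulls out every run of characters between those delimiters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of SOLR-2982.patch.
class PhonemeTokens {
    // A token is any run of characters that is not "(", ")", "|" or "-".
    private static final Pattern TOKEN = Pattern.compile("[^()|-]+");

    // Returns every alternative found in the encoded string, in order of appearance.
    static List<String> extract(String encoded) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(encoded);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Made-up input in the shape described above; not real encoder output.
        System.out.println(extract("((x|y|z)-(a|b))")); // prints [x, y, z, a, b]
    }
}
```

Flattening like this discards the grouping, so it only fits the case where each alternative is indexed as its own token; keeping track of which word an alternative came from would need a real parser.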

                



[jira] [Commented] (SOLR-2982) Upgrade Apache Commons Codec to version 1.6 in order to add new Beider-Morse Phonetic Matching (BMPM) option

Posted by "Brooke Schreier Ganz (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177561#comment-13177561 ] 

Brooke Schreier Ganz commented on SOLR-2982:
--------------------------------------------

Thank you so much for working on this!  I just tested out the latest nightly build on my laptop and everything works great.

I use Solr to run a non-profit group's 190,000+ record genealogy database and this new ability to do proper soundalike surname searches through our listings of consonant-heavy Central and Eastern European surnames (and their multitudes of "creative" spelling variants) will make things *a lot* easier.  You just made a lot of genealogists very happy.  :-)

Thanks again!
                
