Posted to issues@commons.apache.org by "Gary D. Gregory (JIRA)" <ji...@apache.org> on 2011/07/25 17:46:14 UTC

[jira] [Issue Comment Edited] (CODEC-125) Implement a Beider-Morse phonetic matching codec

    [ https://issues.apache.org/jira/browse/CODEC-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070559#comment-13070559 ] 

Gary D. Gregory edited comment on CODEC-125 at 7/25/11 3:44 PM:
----------------------------------------------------------------

For me it looks like this test:

{noformat}
java.lang.AssertionError: language predicted for name 'Renault' is wrong: [] should contain 'french'
	at org.junit.Assert.fail(Assert.java:91)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.apache.commons.codec.language.bm.LanguageGuessingTest.testLanguageGuessing(LanguageGuessingTest.java:84)
{noformat}

fails because this bm.Lang method:
{code:java}
    public Set<String> guessLanguages(String text)
    {
        text = text.toLowerCase(); // todo: locale?
//        System.out.println("Testing text: '" + text + "'");

        Set<String> langs = new HashSet<String>(languages.getLanguages());
        for(LangRule rule : rules)
        {
            if(rule.matches(text))
            {
//                System.out.println("Rule " + rule.pattern + " matches " + text);
                if(rule.acceptOnMatch) {
//                    System.out.println("Retaining " + rule.languages);
                    langs.retainAll(rule.languages);
                }
                else {
//                    System.out.println("Removing " + rule.languages);
                    langs.removeAll(rule.languages);
                }
//                System.out.println("Current languages: " + langs);
            }
            else
            {
//                System.out.println("Rule " + rule.pattern + " does not match " + text);
            }
        }

        return langs;
    }
{code}
returns an empty set. It first adds values, then removes them in the loop, and the set ends up empty.

Could rule order be an issue? Or a difference in regular-expression interpretation between Java 5 and Java 6? I am on Java 6.
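
To illustrate how the set can finish empty, here is a minimal sketch. It is not the actual bm.Lang code or its rule set: the class EmptyGuessSketch, its Rule type, the three-language set, and both regular expressions are made up for illustration. It only assumes the same retainAll/removeAll loop as the method above. Two accept-on-match rules whose language sets do not overlap are enough to empty the result, whatever order they run in.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical stand-in for bm.Lang; names and rules are illustrative only.
class EmptyGuessSketch {

    // Simplified rule: a regex, the languages it refers to, and whether a
    // match keeps only those languages (true) or drops them (false).
    static final class Rule {
        final Pattern pattern;
        final Set<String> languages;
        final boolean acceptOnMatch;

        Rule(String regex, boolean acceptOnMatch, String... languages) {
            this.pattern = Pattern.compile(regex);
            this.acceptOnMatch = acceptOnMatch;
            this.languages = new HashSet<String>(Arrays.asList(languages));
        }

        boolean matches(String text) {
            return pattern.matcher(text).find();
        }
    }

    // Same shape as the loop in guessLanguages: start from every language,
    // then narrow the set with each matching rule.
    static Set<String> guess(String text, Set<String> all, List<Rule> rules) {
        Set<String> langs = new HashSet<String>(all);
        for (Rule rule : rules) {
            if (rule.matches(text)) {
                if (rule.acceptOnMatch) {
                    langs.retainAll(rule.languages);
                } else {
                    langs.removeAll(rule.languages);
                }
            }
        }
        return langs;
    }

    public static void main(String[] args) {
        Set<String> all = new HashSet<String>(
                Arrays.asList("french", "german", "english"));
        // Made-up rules: both match "renault", but they accept disjoint sets.
        List<Rule> rules = Arrays.asList(
                new Rule("ault$", true, "french"),
                new Rule("^r", true, "german", "english"));
        System.out.println(guess("renault", all, rules)); // prints []
    }
}
```

If something like this is happening with the real rules, re-enabling the commented-out println calls for 'Renault' should show which rule removes 'french'.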

> Implement a Beider-Morse phonetic matching codec
> ------------------------------------------------
>
>                 Key: CODEC-125
>                 URL: https://issues.apache.org/jira/browse/CODEC-125
>             Project: Commons Codec
>          Issue Type: New Feature
>            Reporter: Matthew Pocock
>            Priority: Minor
>         Attachments: bm-gg.diff, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch
>
>
> I have implemented Beider Morse Phonetic Matching as a codec against the commons-codec svn trunk. I would like to contribute this to commons-codec.
