Posted to issues@commons.apache.org by "Gary D. Gregory (JIRA)" <ji...@apache.org> on 2011/07/25 17:46:14 UTC
[jira] [Issue Comment Edited] (CODEC-125) Implement a Beider-Morse phonetic matching codec
[ https://issues.apache.org/jira/browse/CODEC-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070559#comment-13070559 ]
Gary D. Gregory edited comment on CODEC-125 at 7/25/11 3:44 PM:
----------------------------------------------------------------
For me, this test fails:
{noformat}
java.lang.AssertionError: language predicted for name 'Renault' is wrong: [] should contain 'french'
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.apache.commons.codec.language.bm.LanguageGuessingTest.testLanguageGuessing(LanguageGuessingTest.java:84)
{noformat}
It fails because this bm.Lang method:
{code:java}
public Set<String> guessLanguages(String text)
{
    text = text.toLowerCase(); // todo: locale?
    // System.out.println("Testing text: '" + text + "'");
    Set<String> langs = new HashSet<String>(languages.getLanguages());
    for(LangRule rule : rules)
    {
        if(rule.matches(text))
        {
            // System.out.println("Rule " + rule.pattern + " matches " + text);
            if(rule.acceptOnMatch) {
                // System.out.println("Retaining " + rule.languages);
                langs.retainAll(rule.languages);
            }
            else {
                // System.out.println("Removing " + rule.languages);
                langs.removeAll(rule.languages);
            }
            // System.out.println("Current languages: " + langs);
        }
        else
        {
            // System.out.println("Rule " + rule.pattern + " does not match " + text);
        }
    }
    return langs;
}
{code}
returns an empty set. It starts with every known language, then retains or removes languages as rules match, and the set ends up empty.
Could rule order be an issue? Or a difference in regular-expression interpretation between Java 5 and 6? I am on 6.
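To see how the loop in guessLanguages can end with an empty set, here is a minimal, self-contained sketch that reproduces the same retainAll/removeAll filtering with plain java.util collections. The Rule class, the guess method, and the two patterns used for 'Renault' are hypothetical illustrations, not the actual bm.Lang rules or rule data. Because every step is either an intersection (retainAll) or a difference (removeAll), the final set is the starting set intersected with every matching accept list and with the complement of every matching reject list, so reordering the rules alone would not change the result; what matters is which rules match the text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

public class LanguageFilterSketch {

    // Hypothetical stand-in for bm.Lang's LangRule: a pattern, a set of
    // languages, and whether a match keeps (accept) or drops (reject) them.
    static final class Rule {
        final Pattern pattern;
        final Set<String> languages;
        final boolean acceptOnMatch;

        Rule(String regex, boolean acceptOnMatch, String... languages) {
            this.pattern = Pattern.compile(regex);
            this.acceptOnMatch = acceptOnMatch;
            this.languages = new HashSet<>(Arrays.asList(languages));
        }

        boolean matches(String text) {
            // Assumption: the pattern may match anywhere in the text (find()).
            return pattern.matcher(text).find();
        }
    }

    // Same filtering shape as guessLanguages: start from every language,
    // then intersect (accept rule) or subtract (reject rule) on each match.
    static Set<String> guess(Set<String> all, List<Rule> rules, String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        Set<String> langs = new LinkedHashSet<>(all);
        for (Rule rule : rules) {
            if (!rule.matches(lower)) {
                continue;
            }
            if (rule.acceptOnMatch) {
                langs.retainAll(rule.languages);
            } else {
                langs.removeAll(rule.languages);
            }
        }
        return langs;
    }

    public static void main(String[] args) {
        Set<String> all = new LinkedHashSet<>(Arrays.asList("english", "french", "german"));
        // Hypothetical accept rules with disjoint languages; both match "renault".
        Rule frenchEnding = new Rule("ault$", true, "french");
        Rule germanStart = new Rule("^re", true, "german");

        // Either order intersects {french} with {german}, leaving nothing.
        System.out.println(guess(all, Arrays.asList(frenchEnding, germanStart), "Renault"));
        System.out.println(guess(all, Arrays.asList(germanStart, frenchEnding), "Renault"));
        // With only the first rule, "french" survives.
        System.out.println(guess(all, Arrays.asList(frenchEnding), "Renault"));
    }
}
```

Both orderings print [] and the single-rule run prints [french], which points toward checking which patterns actually match "renault" (for example, how each regular expression is evaluated) rather than the order of the rules.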
> Implement a Beider-Morse phonetic matching codec
> ------------------------------------------------
>
> Key: CODEC-125
> URL: https://issues.apache.org/jira/browse/CODEC-125
> Project: Commons Codec
> Issue Type: New Feature
> Reporter: Matthew Pocock
> Priority: Minor
> Attachments: bm-gg.diff, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch
>
>
> I have implemented Beider-Morse Phonetic Matching as a codec against the commons-codec svn trunk. I would like to contribute this to commons-codec.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira