Posted to dev@commons.apache.org by Meyer Falk <Fa...@IT2media.de> on 2010/09/15 09:56:17 UTC

Re: [codec] Koelner Phonetik (cologne phonetic)

Ok, thank you. There's now an issue at JIRA:
https://issues.apache.org/jira/browse/CODEC-106

Best regards,
Falk

On Monday, 13.09.2010, at 08:56 -0400, James Carman wrote:
> Perhaps file a JIRA (https://issues.apache.org/jira/browse/CODEC) and
> attach the source.  Make sure you check the "Grant license to ASF for
> inclusion in ASF works" checkbox.
> 
> 
> 
> On Mon, Sep 13, 2010 at 8:33 AM, Falk Meyer <fa...@it2media.de> wrote:
> > Hi there,
> >
> > I’ve implemented the “Kölner Phonetik” algorithm (cologne phonetic),
> > which is a phonetic algorithm optimised for the German language. For a
> > German description see:
> > http://de.wikipedia.org/wiki/K%C3%B6lner_Phonetik . For an English
> > description see the source-code comments.
> >
> > Latest Source files:
> >
> > org.apache.commons.codec.language.ColognePhonetic –
> > http://ubuntuone.com/p/Fzm/
> >
> > org.apache.commons.codec.language.ColognePhoneticTest –
> > http://ubuntuone.com/p/Fzn/
> >
> > Latest Build: http://ubuntuone.com/p/Fzo/
> >
> > Whole project (packed jar): http://ubuntuone.com/p/Fzp/
> >
> > If you want to add “Kölner Phonetik” functionality to commons-codec,
> > feel free to adapt the sources to your guidelines, or tell me how I can
> > do so.
> >
> > Best regards,
> > Falk
> >
> > **********************************************************************
> > This e-mail has been scanned for viruses.
> > mailsweeper@it2media.de
> > **********************************************************************
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
> 
> 


Re: [codec] Koelner Phonetik (cologne phonetic)

Posted by Paul Libbrecht <pa...@activemath.org>.
How does it compare to the Metaphone results?

paul




Re: [codec] Koelner Phonetik (cologne phonetic)

Posted by Falk Meyer <fa...@it2media.de>.
Hi Paul,

in general, Cologne phonetic is meant for German and Metaphone for
English texts, AFAIK. So Cologne phonetic takes care of some special
cases: "sch" sounds roughly like "s"; "th" doesn't sound like the English
"th" (it simply sounds like "t"); what a "c" sounds like depends
strongly on its context ("ch", "ck", "sch"), etc. Last but not least,
this implementation handles the German umlauts (ä, ö, ü) and the "ß".

Some examples where the two algorithms disagree (Cologne phonetic matches
in all of them except School/Skool, where only Metaphone matches):
-------------------------
Schule =? Sule
isCologneEqual: true        // because "Sch" is near "s" in German
isMetaphoneEqual: false

School =? Skool
isCologneEqual: false       // because "Sch" doesn't sound like "sk"
isMetaphoneEqual: true

Theater =? Teater
isCologneEqual: true        // "h" is silent here
isMetaphoneEqual: false

Sach =? Sack                // because the alg. says it ;)
isCologneEqual: true
isMetaphoneEqual: false

Harkan =? Arkan             
isCologneEqual: true        // "h" is ignored in most cases
isMetaphoneEqual: false

Alb =? Alp
isCologneEqual: true        // sounds nearly identical
isMetaphoneEqual: false

ä =? a
isCologneEqual: true        // umlaut
isMetaphoneEqual: false

ö =? o
isCologneEqual: true        // ~
isMetaphoneEqual: false

ü =? u
isCologneEqual: true        // ~
isMetaphoneEqual: false

ß =? s
isCologneEqual: true
isMetaphoneEqual: false
-------------------------
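
To make the Cologne side of these examples easier to follow, here is a
minimal sketch of the letter-to-digit rules as tabulated on the German
Wikipedia page linked earlier in the thread. It is not the code attached
to CODEC-106, and it may differ from that implementation in edge cases.
The names ColognePhoneticSketch, encode and sameCode are made up for this
illustration; folding ä/ö/ü/ß to a/o/u/s and dropping every non-letter
are simplifying assumptions.

```java
import java.util.Locale;

/** Illustrative sketch of the Koelner Phonetik rule table (not the CODEC-106 code). */
final class ColognePhoneticSketch {

    /** True if c is one of the letters in set; 0 stands for "no neighbour". */
    private static boolean in(String set, char c) {
        return c != 0 && set.indexOf(c) >= 0;
    }

    static String encode(String text) {
        // Step 0 (assumption): fold umlauts and sharp s, keep only A-Z.
        StringBuilder letters = new StringBuilder();
        for (char c : text.toLowerCase(Locale.GERMAN).toCharArray()) {
            switch (c) {
                case '\u00e4': letters.append('A'); break; // a-umlaut
                case '\u00f6': letters.append('O'); break; // o-umlaut
                case '\u00fc': letters.append('U'); break; // u-umlaut
                case '\u00df': letters.append('S'); break; // sharp s
                default:
                    if (c >= 'a' && c <= 'z') letters.append(Character.toUpperCase(c));
            }
        }

        // Step 1: map each letter to a digit, looking at its direct neighbours.
        StringBuilder codes = new StringBuilder();
        for (int i = 0; i < letters.length(); i++) {
            char c = letters.charAt(i);
            char prev = i > 0 ? letters.charAt(i - 1) : 0;
            char next = i + 1 < letters.length() ? letters.charAt(i + 1) : 0;
            String code;
            switch (c) {
                case 'A': case 'E': case 'I': case 'J': case 'O': case 'U': case 'Y':
                    code = "0"; break;
                case 'H': code = ""; break;                        // H gets no digit
                case 'B': code = "1"; break;
                case 'P': code = next == 'H' ? "3" : "1"; break;   // "ph" sounds like "f"
                case 'D': case 'T': code = in("CSZ", next) ? "8" : "2"; break;
                case 'F': case 'V': case 'W': code = "3"; break;
                case 'G': case 'K': case 'Q': code = "4"; break;
                case 'C':
                    if (i == 0) code = in("AHKLOQRUX", next) ? "4" : "8";
                    else if (in("SZ", prev)) code = "8";           // e.g. "sch"
                    else code = in("AHKOQUX", next) ? "4" : "8";   // e.g. "ch", "ck"
                    break;
                case 'X': code = in("CKQ", prev) ? "8" : "48"; break;
                case 'L': code = "5"; break;
                case 'M': case 'N': code = "6"; break;
                case 'R': code = "7"; break;
                default: code = "8";                               // S and Z
            }
            codes.append(code);
        }

        // Step 2: collapse runs of equal digits. Step 3: drop every 0 except a leading one.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < codes.length(); i++) {
            char d = codes.charAt(i);
            if (i > 0 && d == codes.charAt(i - 1)) continue;
            if (d == '0' && i > 0) continue;
            out.append(d);
        }
        return out.toString();
    }

    /** Two words match when their codes are identical. */
    static boolean sameCode(String a, String b) {
        return encode(a).equals(encode(b));
    }

    public static void main(String[] args) {
        System.out.println("Schule/Sule:  " + sameCode("Schule", "Sule"));
        System.out.println("School/Skool: " + sameCode("School", "Skool"));
    }
}
```

With these rules, "Schule" and "Sule" both encode to 85, while "Skool"
encodes to 845 because the "k" contributes its own digit, which is why
the School/Skool pair does not match.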


I have no good test data at the moment, and I'm not familiar enough with
Metaphone to make general statements about the differences between
Cologne phonetic's and Metaphone's results.

If there are still questions about the algorithm, one can study the
comments, the code itself and the test cases attached at
https://issues.apache.org/jira/browse/CODEC-106 and compare them to
Metaphone.

Cheers,
Falk

