Posted to dev@commons.apache.org by Steinar Cook <st...@balder.no> on 2006/10/23 22:35:13 UTC

[codec]Implementing support for additional non-english vowels in double metaphone

I have made some modifications to  
org.apache.commons.codec.language.DoubleMetaphone in order to support  
the three additional Norwegian and Danish vowels.  The current  
implementation at Jakarta does not provide any methods to specify the  
language of the input text.
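
For illustration: Double Metaphone treats vowels specially (a leading vowel is encoded as 'A', other vowels are skipped), so a character outside the recognized vowel set is not handled as a vowel. Below is a minimal sketch of a vowel test extended with the Scandinavian letters. The class name ScandinavianVowels is hypothetical and not part of commons-codec, and including the Swedish letters Ä and Ö is an assumption.

```java
// Hypothetical helper for illustration only; not part of commons-codec.
final class ScandinavianVowels {
    // Basic Latin vowels, then Æ, Ø, Å (Norwegian/Danish),
    // then Ä, Ö (Swedish; included here as an assumption).
    private static final String VOWELS =
            "AEIOUY\u00C6\u00D8\u00C5\u00C4\u00D6";

    private ScandinavianVowels() {
    }

    // Returns true if ch, compared case-insensitively, is in the vowel set above.
    static boolean isVowel(char ch) {
        return VOWELS.indexOf(Character.toUpperCase(ch)) != -1;
    }
}
```

With this set, isVowel('\u00F8') (lower-case ø) is true, while isVowel('k') is false.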

Is it all right to modify DoubleMetaphone to support the Scandinavian  
vowels (Swedish, Danish and Norwegian) and possibly other languages  
or have I completely misunderstood the idea behind the double  
metaphone algorithm? That is, should double metaphone detect various  
language constructs automatically or is it perhaps a better idea to  
have a factory which returns a double metaphone implementation  
appropriate for the language?
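
The factory alternative could look roughly like the sketch below. Everything in it is an assumption made for illustration: PhoneticEncoder and PhoneticEncoderFactory are hypothetical names rather than commons-codec types, the language codes are ISO 639-1 style strings, and folding Æ/Ä/Å to A and Ø/Ö to O is a placeholder rule, not a linguistically validated mapping.

```java
import java.util.Locale;

// Hypothetical stand-in for a phonetic encoder; not the commons-codec API.
interface PhoneticEncoder {
    String encode(String input);
}

// Hypothetical factory: wraps a base encoder with a language-specific
// pre-processing step when the language is Scandinavian.
final class PhoneticEncoderFactory {
    private PhoneticEncoderFactory() {
    }

    static PhoneticEncoder forLanguage(String languageCode, PhoneticEncoder base) {
        String lang = languageCode.toLowerCase(Locale.ROOT);
        boolean scandinavian = lang.equals("no") || lang.equals("nb")
                || lang.equals("nn") || lang.equals("da") || lang.equals("sv");
        if (scandinavian) {
            // Fold the extra vowels first, then delegate to the base encoder.
            return input -> base.encode(foldScandinavianVowels(input));
        }
        return base;
    }

    // Upper-cases the input and replaces the extra vowels with ASCII vowels.
    // The chosen replacements are an illustrative assumption.
    static String foldScandinavianVowels(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (char ch : input.toUpperCase(Locale.ROOT).toCharArray()) {
            switch (ch) {
                case '\u00C6': // Æ
                case '\u00C4': // Ä
                case '\u00C5': // Å
                    out.append('A');
                    break;
                case '\u00D8': // Ø
                case '\u00D6': // Ö
                    out.append('O');
                    break;
                default:
                    out.append(ch);
            }
        }
        return out.toString();
    }
}
```

In this sketch the base encoder is passed in, so an existing DoubleMetaphone instance could be wrapped (for example, s -> dm.doubleMetaphone(s)) without changing the class itself. Detecting the language automatically inside one class would avoid the extra parameter, but it puts every language's rules on the same code path.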

Any suggestions?

I would like to contribute any changes back to Jakarta commons-codec,  
of course.


Steinar Cook
steinar@balder.no




---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


RE: [codec]Implementing support for additional non-english vowels in double metaphone

Posted by Gary Gregory <gg...@seagullsoftware.com>.
Hello Steinar,

The current DoubleMetaphone implementation (both released and in SVN)
allows for Spanish and Germanic characters, so adding support for other
languages in the same class seems to be in the spirit of the current
implementation.

I would also say that language-specific implementations sound like a
reasonable idea. I wonder whether the current implementation has
performance issues because it attempts to work for all languages. That
seems like a bigger topic, though, and might be worth discussing
separately if the list is interested.

So I would say: create a JIRA ticket [1], then go ahead and submit
patches [2] for the code *and* unit tests, based on the SVN code [3].

Thank you,
Gary

[1] https://issues.apache.org/jira/browse/CODEC
[2] http://jakarta.apache.org/commons/patches.html
[3] http://jakarta.apache.org/commons/codec/cvs-usage.html


