Posted to issues@commons.apache.org by "Gary D. Gregory (JIRA)" <ji...@apache.org> on 2011/03/30 04:48:05 UTC
[jira] [Closed] (CODEC-107) Enhance documentation for Language Encoders
[ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary D. Gregory closed CODEC-107.
---------------------------------
Resolution: Won't Fix
> Enhance documentation for Language Encoders
> -------------------------------------------
>
> Key: CODEC-107
> URL: https://issues.apache.org/jira/browse/CODEC-107
> Project: Commons Codec
> Issue Type: Improvement
> Affects Versions: 1.4
> Reporter: Marc Pompl
> Priority: Minor
> Fix For: 1.5
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The current user guide (http://commons.apache.org/codec/userguide.html) lists only four Language Encoders, but there are currently five, and CODEC-106 implements a sixth.
> It would be a good idea to complete the documentation.
> Additionally, I suggest extending the user guide to show a simple performance measurement:
> _SNIP_
> org.apache.commons.codec.language.Metaphone encodings per msec: 327
> org.apache.commons.codec.language.DoubleMetaphone encodings per msec: 224
> org.apache.commons.codec.language.Soundex encodings per msec: 904
> org.apache.commons.codec.language.RefinedSoundex encodings per msec: 637
> org.apache.commons.codec.language.Caverphone encodings per msec: 5
> org.apache.commons.codec.language.ColognePhonetic encodings per msec: 289
> So Soundex is the fastest encoder in this run, and Caverphone is far slower than any other algorithm. The remaining encoders fall within roughly a factor of three of each other (224 to 637 encodings per msec).
> Checked with the following code:
> {code:java}
> import org.apache.commons.codec.Encoder;
> import org.apache.commons.codec.language.*;
>
> private static final int REPEATS = 1000000;
>
> public void checkSpeed() throws Exception {
>     checkSpeedEncoding(new Metaphone(), "easgasg", REPEATS);
>     checkSpeedEncoding(new DoubleMetaphone(), "easgasg", REPEATS);
>     checkSpeedEncoding(new Soundex(), "easgasg", REPEATS);
>     checkSpeedEncoding(new RefinedSoundex(), "easgasg", REPEATS);
>     // Caverphone runs with 100000 iterations instead of REPEATS.
>     checkSpeedEncoding(new Caverphone(), "Carlene", 100000);
>     checkSpeedEncoding(new ColognePhonetic(), "Schmitt", REPEATS);
> }
>
> // Encodes the same input 'repeats' times and prints the throughput.
> private void checkSpeedEncoding(Encoder encoder, String toBeEncoded, int repeats) throws Exception {
>     long start = System.currentTimeMillis();
>     for (int i = 0; i < repeats; i++) {
>         encoder.encode(toBeEncoded);
>     }
>     // Clamp to 1 ms so a very fast run cannot divide by zero.
>     long duration = Math.max(1L, System.currentTimeMillis() - start);
>     System.out.println(encoder.getClass().getName() + " encodings per msec: " + (repeats / duration));
> }
> {code}
> _SNAP_
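
A note on the measurement quoted above: a single timed loop around System.currentTimeMillis() also counts JIT warm-up, and its millisecond resolution is coarse for short runs. The sketch below shows one possible way to make such a timing a little more robust. It is not part of CODEC-107. The class name EncoderTiming, the method encodingsPerMilli, and the stand-in encoder are hypothetical, chosen so that the example runs without commons-codec on the classpath.

```java
import java.util.function.UnaryOperator;

public class EncoderTiming {

    // Written once per measurement so the JIT cannot discard the encode calls.
    static volatile int sink;

    /**
     * Runs a warm-up pass, then times 'repeats' calls with System.nanoTime()
     * and returns the throughput in encodings per millisecond.
     */
    static double encodingsPerMilli(UnaryOperator<String> encoder, String input,
                                    int warmup, int repeats) {
        int consumed = 0;
        // Warm-up: give the JIT a chance to compile the hot path before timing.
        for (int i = 0; i < warmup; i++) {
            consumed += encoder.apply(input).length();
        }
        long start = System.nanoTime();
        for (int i = 0; i < repeats; i++) {
            consumed += encoder.apply(input).length();
        }
        // Clamp to 1 ns so the division below is always defined.
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        sink = consumed;
        // 1 ms = 1,000,000 ns.
        return repeats * 1_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a phonetic encoder; not a real algorithm.
        UnaryOperator<String> standIn = s -> s.toUpperCase();
        double rate = encodingsPerMilli(standIn, "easgasg", 100_000, 1_000_000);
        System.out.printf("stand-in encodings per msec: %.1f%n", rate);
    }
}
```

To time a real encoder, a method reference such as new Soundex()::encode could presumably be passed in place of the stand-in (untested here). For serious comparisons, a dedicated harness such as JMH is the usual choice.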
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] [Closed] (CODEC-107) Enhance documentation for Language Encoders
Posted by Gary Gregory <ga...@gmail.com>.
On Tue, Mar 29, 2011 at 11:20 PM, sebb <se...@gmail.com> wrote:
> On 30 March 2011 03:48, Gary D. Gregory (JIRA) <ji...@apache.org> wrote:
> >
> > [ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> >
> > Gary D. Gregory closed CODEC-107.
> > ---------------------------------
> >
> > Resolution: Won't Fix
>
> In that case, the "Fix for" version should be removed, no?
>
Good point. I guess there is no way to say "we considered this for 1.5 and
decided against it". Well, I can say that in a comment, of course :)
Gary
--
Thank you,
Gary
http://garygregory.wordpress.com/
http://garygregory.com/
http://people.apache.org/~ggregory/
http://twitter.com/GaryGregory
Re: [jira] [Closed] (CODEC-107) Enhance documentation for Language Encoders
Posted by sebb <se...@gmail.com>.
On 30 March 2011 03:48, Gary D. Gregory (JIRA) <ji...@apache.org> wrote:
>
> [ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Gary D. Gregory closed CODEC-107.
> ---------------------------------
>
> Resolution: Won't Fix
In that case, the "Fix for" version should be removed, no?
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org