Posted to issues@commons.apache.org by "Marc Pompl (JIRA)" <ji...@apache.org> on 2010/12/11 01:27:01 UTC

[jira] Created: (CODEC-107) Enhance documentation for Language Encoders

Enhance documentation for Language Encoders
-------------------------------------------

                 Key: CODEC-107
                 URL: https://issues.apache.org/jira/browse/CODEC-107
             Project: Commons Codec
          Issue Type: Improvement
    Affects Versions: 1.4
            Reporter: Marc Pompl
            Priority: Minor
             Fix For: 1.5


The current user guide (http://commons.apache.org/codec/userguide.html) lists only four Language Encoders, but there are currently five. CODEC-106 implements a sixth one.
It would be a good idea to complete the documentation.

Additionally, I suggest extending the wiki (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PhoneticFilterFactory) to show a simple performance measurement:

_SNAP_

Metaphone encodings per sec: 32258
DoubleMetaphone encodings per sec: 31250
Soundex encodings per sec: 35714
RefinedSoundex encodings per sec: 34482
Caverphone encodings per sec: 5813
ColognePhonetic encodings per sec: 33333

So, Caverphone is much slower than any other algorithm. All the others perform at nearly the same speed.

Checked with the following code:

{code:java}
  public void checkSpeed() throws Exception {
      checkSpeedEncoding("Metaphone", "easgasg", "ESKS");
      checkSpeedEncoding("DoubleMetaphone", "easgasg", "ASKS");
      checkSpeedEncoding("Soundex", "easgasg", "E220");
      checkSpeedEncoding("RefinedSoundex", "easgasg", "E034034");
      checkSpeedEncoding("Caverphone", "Carlene", "KLN1111111");
      checkSpeedEncoding("ColognePhonetic", "Schmitt", "862");
  }

  // Times REPEATS assertions for a single encoder and prints its throughput.
  private void checkSpeedEncoding(String encoder, String toBeEncoded, String estimated) throws Exception {
      long start = System.currentTimeMillis();
      for (int i = 0; i < REPEATS; i++) {
          assertAlgorithm(encoder, "false", toBeEncoded,
                  new String[] { estimated });
      }
      // Clamp to 1 ms so the division below cannot fail on a very short run.
      long duration = Math.max(1L, System.currentTimeMillis() - start);
      // Multiply before dividing: duration/1000 would truncate to whole seconds.
      System.out.println(encoder + " encodings per sec: " + (REPEATS * 1000L / duration));
  }
{code}

_SNAP_
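
The throughput figures depend on how the elapsed time is divided. With integer arithmetic, dividing the duration by 1000 first drops fractions of a second (and yields zero for runs shorter than one second), whereas multiplying the repeat count by 1000 first keeps them. The following is only a minimal sketch of that difference. The class name, the method names, the repeat count and the 31900 ms duration are hypothetical and chosen for illustration; they are not measured values.

```java
public class ThroughputMath {

    // Divides by whole seconds: the duration is truncated before the division.
    // Throws ArithmeticException when durationMillis is below 1000.
    static long truncatedPerSec(long repeats, long durationMillis) {
        return repeats / (durationMillis / 1000);
    }

    // Scales the repeat count first, so sub-second precision is kept.
    static long scaledPerSec(long repeats, long durationMillis) {
        return repeats * 1000L / durationMillis;
    }

    public static void main(String[] args) {
        long repeats = 1000000L;       // assumed loop count
        long durationMillis = 31900L;  // hypothetical elapsed time
        System.out.println(truncatedPerSec(repeats, durationMillis)); // prints 32258
        System.out.println(scaledPerSec(repeats, durationMillis));    // prints 31347
    }
}
```

In this hypothetical run the two expressions differ by roughly 3 percent, which is the same order of magnitude as the gaps between most of the encoders listed above.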

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] [Closed] (CODEC-107) Enhance documentation for Language Encoders

Posted by Gary Gregory <ga...@gmail.com>.
On Tue, Mar 29, 2011 at 11:20 PM, sebb <se...@gmail.com> wrote:

> On 30 March 2011 03:48, Gary D. Gregory (JIRA) <ji...@apache.org> wrote:
> >
> >     [
> https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
> >
> > Gary D. Gregory closed CODEC-107.
> > ---------------------------------
> >
> >    Resolution: Won't Fix
>
> In that case, the "Fix for" version should be removed, no?
>

Good point. I guess there is no way to say "we considered this for 1.5 and
decided against it". Well, I can say that in a comment of course :)

Gary




-- 
Thank you,
Gary

http://garygregory.wordpress.com/
http://garygregory.com/
http://people.apache.org/~ggregory/
http://twitter.com/GaryGregory

Re: [jira] [Closed] (CODEC-107) Enhance documentation for Language Encoders

Posted by sebb <se...@gmail.com>.
On 30 March 2011 03:48, Gary D. Gregory (JIRA) <ji...@apache.org> wrote:
>
>     [ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Gary D. Gregory closed CODEC-107.
> ---------------------------------
>
>    Resolution: Won't Fix

In that case, the "Fix for" version should be removed, no?

>
>> Enhance documentation for Language Encoders
>> -------------------------------------------
>>
>>                 Key: CODEC-107
>>                 URL: https://issues.apache.org/jira/browse/CODEC-107
>>             Project: Commons Codec
>>          Issue Type: Improvement
>>    Affects Versions: 1.4
>>            Reporter: Marc Pompl
>>            Priority: Minor
>>             Fix For: 1.5
>>
>>   Original Estimate: 1h
>>  Remaining Estimate: 1h
>>
>> The current user guide (http://commons.apache.org/codec/userguide.html) lists only four Language Encoders, but there are currently five. CODEC-106 implements a sixth one.
>> It would be a good idea to complete the documentation.
>> Additionally, I suggest extending the user guide to show a simple performance measurement:
>> _SNIP_
>> org.apache.commons.codec.language.Metaphone encodings per msec: 327
>> org.apache.commons.codec.language.DoubleMetaphone encodings per msec: 224
>> org.apache.commons.codec.language.Soundex encodings per msec: 904
>> org.apache.commons.codec.language.RefinedSoundex encodings per msec: 637
>> org.apache.commons.codec.language.Caverphone encodings per msec: 5
>> org.apache.commons.codec.language.ColognePhonetic encodings per msec: 289
>> So, Soundex is the fastest encoder. Caverphone is much slower than any other algorithm. The remaining encoders fall between 224 and 637 encodings per msec.
>> Checked with the following code:
>> {code:java}
>>   private static final int REPEATS = 1000000;
>>   public void checkSpeed() throws Exception {
>>         checkSpeedEncoding(new Metaphone(), "easgasg", REPEATS);
>>         checkSpeedEncoding(new DoubleMetaphone(), "easgasg", REPEATS);
>>         checkSpeedEncoding(new Soundex(), "easgasg", REPEATS);
>>         checkSpeedEncoding(new RefinedSoundex(), "easgasg", REPEATS);
>>         checkSpeedEncoding(new Caverphone(), "Carlene", 100000);
>>         checkSpeedEncoding(new ColognePhonetic(), "Schmitt", REPEATS);
>>   }
>>
>>   private void checkSpeedEncoding(Encoder encoder, String toBeEncoded, int repeats) throws Exception {
>>         long start = System.currentTimeMillis();
>>         for ( int i=0; i<repeats; i++) {
>>                   encoder.encode(toBeEncoded);
>>         }
>>         long duration = System.currentTimeMillis()-start;
>>         System.out.println(encoder.getClass().getName() + " encodings per msec: "+(repeats/duration));
>>   }
>> {code}
>> _SNAP_
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[jira] [Commented] (CODEC-107) Enhance documentation for Language Encoders

Posted by "Marc Pompl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018163#comment-13018163 ] 

Marc Pompl commented on CODEC-107:
----------------------------------

Hi Gary,

I am glad to see the documentation has been improved. Thanks! :-)

(Just confused that this JIRA has been marked as "won't be fixed".)


[jira] [Commented] (CODEC-107) Enhance documentation for Language Encoders

Posted by "Gary D. Gregory (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018329#comment-13018329 ] 

Gary D. Gregory commented on CODEC-107:
---------------------------------------

I marked it as 'won't fix' because we did not include performance measurement in 1.5. 


[jira] Updated: (CODEC-107) Enhance documentation for Language Encoders

Posted by "Marc Pompl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marc Pompl updated CODEC-107:
-----------------------------

    Description: 
The current user guide (http://commons.apache.org/codec/userguide.html) lists only four Language Encoders, but there are currently five. CODEC-106 implements a sixth one.
It would be a good idea to complete the documentation.

Additionally, I suggest extending the user guide to show a simple performance measurement:

_SNAP_

Metaphone encodings per sec: 32258
DoubleMetaphone encodings per sec: 31250
Soundex encodings per sec: 35714
RefinedSoundex encodings per sec: 34482
Caverphone encodings per sec: 5813
ColognePhonetic encodings per sec: 33333

So, Caverphone is much slower than any other algorithm. All the others perform at nearly the same speed.

Checked with the following code:

{code:java}
  public void checkSpeed() throws Exception {
      checkSpeedEncoding("Metaphone", "easgasg", "ESKS");
      checkSpeedEncoding("DoubleMetaphone", "easgasg", "ASKS");
      checkSpeedEncoding("Soundex", "easgasg", "E220");
      checkSpeedEncoding("RefinedSoundex", "easgasg", "E034034");
      checkSpeedEncoding("Caverphone", "Carlene", "KLN1111111");
      checkSpeedEncoding("ColognePhonetic", "Schmitt", "862");
  }

  // Times REPEATS assertions for a single encoder and prints its throughput.
  private void checkSpeedEncoding(String encoder, String toBeEncoded, String estimated) throws Exception {
      long start = System.currentTimeMillis();
      for (int i = 0; i < REPEATS; i++) {
          assertAlgorithm(encoder, "false", toBeEncoded,
                  new String[] { estimated });
      }
      // Clamp to 1 ms so the division below cannot fail on a very short run.
      long duration = Math.max(1L, System.currentTimeMillis() - start);
      // Multiply before dividing: duration/1000 would truncate to whole seconds.
      System.out.println(encoder + " encodings per sec: " + (REPEATS * 1000L / duration));
  }
{code}

_SNAP_

  was:
The current user guide (http://commons.apache.org/codec/userguide.html) lists only four Language Encoders, but there are currently five. CODEC-106 implements a sixth one.
It would be a good idea to complete the documentation.

Additionally, I suggest extending the wiki (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PhoneticFilterFactory) to show a simple performance measurement:

_SNAP_

Metaphone encodings per sec: 32258
DoubleMetaphone encodings per sec: 31250
Soundex encodings per sec: 35714
RefinedSoundex encodings per sec: 34482
Caverphone encodings per sec: 5813
ColognePhonetic encodings per sec: 33333

So, Caverphone is much slower than any other algorithm. All the others perform at nearly the same speed.

Checked with the following code:

{code:java}
  public void checkSpeed() throws Exception {
	  checkSpeedEncoding("Metaphone", "easgasg", "ESKS");
	  checkSpeedEncoding("DoubleMetaphone", "easgasg", "ASKS");
	  checkSpeedEncoding("Soundex", "easgasg", "E220");
	  checkSpeedEncoding("RefinedSoundex", "easgasg", "E034034");
	  checkSpeedEncoding("Caverphone", "Carlene", "KLN1111111");
	  checkSpeedEncoding("ColognePhonetic", "Schmitt", "862");
  }
  
  private void checkSpeedEncoding(String encoder, String toBeEncoded, String estimated) throws Exception {
	  long start = System.currentTimeMillis();
	  for ( int i=0; i<REPEATS; i++) {
		    assertAlgorithm(encoder, "false", toBeEncoded,
		            new String[] { estimated });
	  }
	  long duration = System.currentTimeMillis()-start;
	  System.out.println(encoder + " encodings per sec: "+(REPEATS/(duration/1000)));
  }
{code}

_SNAP_




[jira] Commented: (CODEC-107) Enhance documentation for Language Encoders

Posted by "Marc Pompl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990851#comment-12990851 ] 

Marc Pompl commented on CODEC-107:
----------------------------------

Do you mean I should provide a patch for the documentation? Where do I find its source? Is it somewhere in the repository? I would give it a try.

Regarding your assumption about performance comparisons, I agree with you if, and only if, you are already committed to the "best encoder for your encoding needs".
If you have to treat speed (or, let's say, responsiveness) as a key business value, then you sometimes have to trade off speed against accuracy. As you know, your encoders are used in search engines like SOLR in a basic manner. The critical aspect is the indexing of searchable data. If your business case involves many frequently changing data sets, a slow encoder could really hurt search performance.
So, in my opinion, it would be nice to have a hint in the documentation of how fast each encoder performs overall. Otherwise, every application performance engineer has to write a tiny test scenario, as I did.
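
A tiny test scenario of this kind could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's benchmark: the class and method names are hypothetical, the encoder is passed as a plain function so the sketch runs without Commons Codec on the classpath, and the upper-casing stand-in is not a real phonetic encoder. With the library available, one could pass a lambda such as s -> soundex.encode(s) instead. The warm-up loop and System.nanoTime() are additions meant to reduce JIT and clock-resolution effects; they do not turn this into a rigorous measurement.

```java
import java.util.function.Function;

public class EncoderBenchmark {

    // Written once per run so the encoder calls have an observable effect.
    static volatile int sink;

    // Returns encodings per millisecond for the timed loop only.
    static double encodingsPerMsec(Function<String, String> encoder, String input,
                                   int warmup, int repeats) {
        int checksum = 0;
        // Untimed warm-up so JIT compilation is less likely to skew the timed loop.
        for (int i = 0; i < warmup; i++) {
            checksum += encoder.apply(input).length();
        }
        long start = System.nanoTime();
        for (int i = 0; i < repeats; i++) {
            checksum += encoder.apply(input).length();
        }
        // Clamp to 1 ns so the division below can never be by zero.
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        sink = checksum;
        return repeats / (elapsedNanos / 1_000_000.0);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in encoder, used only to keep the sketch self-contained.
        Function<String, String> standIn = s -> s.toUpperCase();
        double rate = encodingsPerMsec(standIn, "easgasg", 10000, 100000);
        System.out.println("stand-in encodings per msec: " + rate);
    }
}
```

Returning a floating-point rate avoids the truncation that integer division introduces for slow encoders, where only a handful of encodings complete per millisecond.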



[jira] [Closed] (CODEC-107) Enhance documentation for Language Encoders

Posted by "Gary D. Gregory (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary D. Gregory closed CODEC-107.
---------------------------------

    Resolution: Won't Fix


[jira] Commented: (CODEC-107) Enhance documentation for Language Encoders

Posted by "Gary Gregory (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990896#comment-12990896 ] 

Gary Gregory commented on CODEC-107:
------------------------------------

Hi Marc,

Thank you for pitching in! :)

You can start by downloading the project from SVN. The user's guide is in {{src/site/xdoc/userguide.xml}}. The directory {{src/site/xdoc/}} contains other documentation as well.

Javadoc package-level documentation is in {{src/java/org/apache/commons/codec/language/package.html}}

Finally, each class Javadoc can also be improved if needed.

If performance numbers are documented, the report must (IMO) include a description of the hardware and software environment.
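
One way to capture such an environment description alongside the numbers is to print it from the same JVM that runs the measurement. The following is a minimal sketch, not something the project provides: the class and method names are hypothetical, and it reports only what the standard system properties and Runtime expose, so details such as the CPU model would still have to be added by hand.

```java
public class EnvironmentReport {

    // Collects basic JVM, OS and hardware facts from standard system properties.
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        return "java.version=" + System.getProperty("java.version") + "\n"
             + "java.vm.name=" + System.getProperty("java.vm.name") + "\n"
             + "os.name=" + System.getProperty("os.name") + "\n"
             + "os.arch=" + System.getProperty("os.arch") + "\n"
             + "processors=" + rt.availableProcessors() + "\n"
             + "maxMemoryMB=" + (rt.maxMemory() / (1024 * 1024));
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```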


[jira] Updated: (CODEC-107) Enhance documentation for Language Encoders

Posted by "Marc Pompl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marc Pompl updated CODEC-107:
-----------------------------

    Description: 
The current user guide (http://commons.apache.org/codec/userguide.html) lists only four Language Encoders, but there are currently five. CODEC-106 implements a sixth one.
It would be a good idea to complete the documentation.

Additionally, I suggest extending the user guide to show a simple performance measurement:

_SNAP_

org.apache.commons.codec.language.Metaphone encodings per msec: 327
org.apache.commons.codec.language.DoubleMetaphone encodings per msec: 224
org.apache.commons.codec.language.Soundex encodings per msec: 904
org.apache.commons.codec.language.RefinedSoundex encodings per msec: 637
org.apache.commons.codec.language.Caverphone encodings per msec: 5
org.apache.commons.codec.language.ColognePhonetic encodings per msec: 289

So, Soundex is the fastest encoder. Caverphone is much slower than any other algorithm. The remaining encoders fall between 224 and 637 encodings per msec.

Checked with the following code:

{code:java}
  public void checkSpeed() throws Exception {
	  checkSpeedEncoding(new Metaphone(), "easgasg", REPEATS);
	  checkSpeedEncoding(new DoubleMetaphone(), "easgasg", REPEATS);
	  checkSpeedEncoding(new Soundex(), "easgasg", REPEATS);
	  checkSpeedEncoding(new RefinedSoundex(), "easgasg", REPEATS);
	  checkSpeedEncoding(new Caverphone(), "Carlene", 100000);
	  checkSpeedEncoding(new ColognePhonetic(), "Schmitt", REPEATS);
  }
  
  private void checkSpeedEncoding(Encoder encoder, String toBeEncoded, int repeats) throws Exception {
	  long start = System.currentTimeMillis();
	  for ( int i=0; i<repeats; i++) {
		    encoder.encode(toBeEncoded);
	  }
	  long duration = System.currentTimeMillis()-start;
	  System.out.println(encoder.getClass().getName() + " encodings per msec: "+(repeats/duration));
  }
{code}

_SNAP_

  was:
The current userguide (http://commons.apache.org/codec/userguide.html) just lists four Language Encoders, but there are five at the moment. CODEC-106 implements a sixth one.
It would be a good idea to complete the documentation.

Additionally, I suggest extending the userguide to show a simple performance measurement:

_SNAP_

Metaphone encodings per sec: 32258
DoubleMetaphone encodings per sec: 31250
Soundex encodings per sec: 35714
RefinedSoundex encodings per sec: 34482
Caverphone encodings per sec: 5813
ColognePhonetic encodings per sec: 33333

So Caverphone is much slower than any other algorithm. All others show nearly the same performance.

Checked with the following code:

{code:java}
  public void checkSpeed() throws Exception {
	  checkSpeedEncoding("Metaphone", "easgasg", "ESKS");
	  checkSpeedEncoding("DoubleMetaphone", "easgasg", "ASKS");
	  checkSpeedEncoding("Soundex", "easgasg", "E220");
	  checkSpeedEncoding("RefinedSoundex", "easgasg", "E034034");
	  checkSpeedEncoding("Caverphone", "Carlene", "KLN1111111");
	  checkSpeedEncoding("ColognePhonetic", "Schmitt", "862");
  }
  
  private void checkSpeedEncoding(String encoder, String toBeEncoded, String estimated) throws Exception {
	  long start = System.currentTimeMillis();
	  for ( int i=0; i<REPEATS; i++) {
		    assertAlgorithm(encoder, "false", toBeEncoded,
		            new String[] { estimated });
	  }
	  long duration = System.currentTimeMillis()-start;
	  System.out.println(encoder + " encodings per sec: "+(REPEATS/(duration/1000)));
  }
{code}

_SNAP_


> Enhance documentation for Language Encoders
> -------------------------------------------
>
>                 Key: CODEC-107
>                 URL: https://issues.apache.org/jira/browse/CODEC-107
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Marc Pompl
>            Priority: Minor
>             Fix For: 1.5
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CODEC-107) Enhance documentation for Language Encoders

Posted by "Marc Pompl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marc Pompl updated CODEC-107:
-----------------------------

    Description: 
The current userguide (http://commons.apache.org/codec/userguide.html) just lists four Language Encoders, but there are five at the moment. CODEC-106 implements a sixth one.
It would be a good idea to complete the documentation.

Additionally, I suggest extending the userguide to show a simple performance measurement:

_SNIP_

org.apache.commons.codec.language.Metaphone encodings per msec: 327
org.apache.commons.codec.language.DoubleMetaphone encodings per msec: 224
org.apache.commons.codec.language.Soundex encodings per msec: 904
org.apache.commons.codec.language.RefinedSoundex encodings per msec: 637
org.apache.commons.codec.language.Caverphone encodings per msec: 5
org.apache.commons.codec.language.ColognePhonetic encodings per msec: 289

So Soundex is the fastest encoder, and Caverphone is much slower than any other algorithm. The others range from 224 to 637 encodings per msec.

Checked with the following code:

{code:java}
  private static final int REPEATS = 1000000;

  public void checkSpeed() throws Exception {
	  checkSpeedEncoding(new Metaphone(), "easgasg", REPEATS);
	  checkSpeedEncoding(new DoubleMetaphone(), "easgasg", REPEATS);
	  checkSpeedEncoding(new Soundex(), "easgasg", REPEATS);
	  checkSpeedEncoding(new RefinedSoundex(), "easgasg", REPEATS);
	  checkSpeedEncoding(new Caverphone(), "Carlene", 100000);
	  checkSpeedEncoding(new ColognePhonetic(), "Schmitt", REPEATS);
  }
  
  private void checkSpeedEncoding(Encoder encoder, String toBeEncoded, int repeats) throws Exception {
	  long start = System.currentTimeMillis();
	  for ( int i=0; i<repeats; i++) {
		    encoder.encode(toBeEncoded);
	  }
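	  // Clamp to 1 ms so that a very short run cannot cause a division by zero.
	  long duration = Math.max(1L, System.currentTimeMillis()-start);
	  // Integer division: the result is truncated to whole encodings per msec.
	  System.out.println(encoder.getClass().getName() + " encodings per msec: "+(repeats/duration));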
  }
{code}

_SNAP_
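
As a sketch only: the measurement above times a single cold run with System.currentTimeMillis() and integer division. A variant with an untimed warm-up, System.nanoTime(), and a floating-point rate might look like the following. The class name EncoderTiming, the method encodingsPerMsec, and the stand-in encoder are assumptions for illustration and are not part of Commons Codec. With Commons Codec on the classpath, a real encoder could be passed instead of the placeholder.

```java
import java.util.function.UnaryOperator;

/** Rough timing sketch (hypothetical helper); not a rigorous benchmark. */
class EncoderTiming {

    // Accumulates result lengths so the JIT cannot discard the encode calls.
    private static long sink;

    /**
     * Returns encodings per millisecond for the given encoder function.
     * warmups are untimed iterations; repeats are the timed iterations.
     */
    public static double encodingsPerMsec(UnaryOperator<String> encoder, String input,
                                          int warmups, int repeats) {
        for (int i = 0; i < warmups; i++) {
            sink += encoder.apply(input).length();
        }
        long start = System.nanoTime();
        for (int i = 0; i < repeats; i++) {
            sink += encoder.apply(input).length();
        }
        // Clamp to 1 ns so the division below is always well defined.
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return repeats / (elapsedNanos / 1_000_000.0);
    }

    public static void main(String[] args) {
        // Placeholder standing in for a phonetic encoder (assumption, not a real codec).
        UnaryOperator<String> standIn = s -> s.toUpperCase();
        double rate = encodingsPerMsec(standIn, "Schmitt", 10_000, 100_000);
        System.out.printf("stand-in encodings per msec: %.1f%n", rate);
    }
}
```

Using the same number of timed iterations for every encoder keeps the runs comparable. The absolute numbers still depend on the JVM and the machine.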

  was:
The current userguide (http://commons.apache.org/codec/userguide.html) just lists four Language Encoders, but there are five at the moment. CODEC-106 implements a sixth one.
It would be a good idea to complete the documentation.

Additionally, I suggest extending the userguide to show a simple performance measurement:

_SNAP_

org.apache.commons.codec.language.Metaphone encodings per msec: 327
org.apache.commons.codec.language.DoubleMetaphone encodings per msec: 224
org.apache.commons.codec.language.Soundex encodings per msec: 904
org.apache.commons.codec.language.RefinedSoundex encodings per msec: 637
org.apache.commons.codec.language.Caverphone encodings per msec: 5
org.apache.commons.codec.language.ColognePhonetic encodings per msec: 289

So Soundex is the fastest encoder, and Caverphone is much slower than any other algorithm. The others range from 224 to 637 encodings per msec.

Checked with the following code:

{code:java}
  public void checkSpeed() throws Exception {
	  checkSpeedEncoding(new Metaphone(), "easgasg", REPEATS);
	  checkSpeedEncoding(new DoubleMetaphone(), "easgasg", REPEATS);
	  checkSpeedEncoding(new Soundex(), "easgasg", REPEATS);
	  checkSpeedEncoding(new RefinedSoundex(), "easgasg", REPEATS);
	  checkSpeedEncoding(new Caverphone(), "Carlene", 100000);
	  checkSpeedEncoding(new ColognePhonetic(), "Schmitt", REPEATS);
  }
  
  private void checkSpeedEncoding(Encoder encoder, String toBeEncoded, int repeats) throws Exception {
	  long start = System.currentTimeMillis();
	  for ( int i=0; i<repeats; i++) {
		    encoder.encode(toBeEncoded);
	  }
	  long duration = System.currentTimeMillis()-start;
	  System.out.println(encoder.getClass().getName() + " encodings per msec: "+(repeats/duration));
  }
{code}

_SNAP_


> Enhance documentation for Language Encoders
> -------------------------------------------
>
>                 Key: CODEC-107
>                 URL: https://issues.apache.org/jira/browse/CODEC-107
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Marc Pompl
>            Priority: Minor
>             Fix For: 1.5
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h



[jira] [Updated] (CODEC-107) Enhance documentation for Language Encoders

Posted by "Gary D. Gregory (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary D. Gregory updated CODEC-107:
----------------------------------

    Fix Version/s:     (was: 1.5)

We considered this for 1.5 and decided against it.

> Enhance documentation for Language Encoders
> -------------------------------------------
>
>                 Key: CODEC-107
>                 URL: https://issues.apache.org/jira/browse/CODEC-107
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Marc Pompl
>            Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h


[jira] Commented: (CODEC-107) Enhance documentation for Language Encoders

Posted by "Gary Gregory (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985467#action_12985467 ] 

Gary Gregory commented on CODEC-107:
------------------------------------

Feel free to provide a patch. 

Personally, I do not see the point of providing performance comparisons of language encoders. So I would not include that part of the docs (my opinion.)

> Enhance documentation for Language Encoders
> -------------------------------------------
>
>                 Key: CODEC-107
>                 URL: https://issues.apache.org/jira/browse/CODEC-107
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Marc Pompl
>            Priority: Minor
>             Fix For: 1.5
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
