Posted to dev@lucene.apache.org by "Steven Rowe (JIRA)" <ji...@apache.org> on 2009/06/28 07:06:47 UTC

[jira] Commented: (LUCENE-1719) Add javadoc notes about ICUCollationKeyFilter's speed advantage over CollationKeyFilter

    [ https://issues.apache.org/jira/browse/LUCENE-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724923#action_12724923 ] 

Steven Rowe commented on LUCENE-1719:
-------------------------------------

I also tested ICU4J version 4.2 (released 6 weeks ago), and the timings were nearly identical to those from ICU4J version 4.0 (the one that's in contrib/collation/lib/).

The timings given in the table in the issue description were not produced with the "-server" option to the JVM.  I separately tested all combinations with the "-server" option: there was no difference for the 32-bit JVMs, and the 64-bit JVMs were roughly 3-4% faster.  My impression (I didn't actually calculate it) is that although the best times of 5 runs improved for the 64-bit JVMs with the "-server" option, the average times were slightly worse.  In any case, the performance improvement of the ICU4J implementation over the java.text.Collator implementation was basically unaffected by the "-server" JVM option.
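
For anyone who wants to compare best-of-5 against average times themselves, here is a minimal sketch of that kind of measurement. It is not the harness I used for the numbers in the issue: it only exercises java.text.Collator at the default strength through the plain JDK API, with no Lucene classes, and the class name, method names, and tiny word list are made up for illustration.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Hypothetical sketch: times java.text.Collator key generation over a word
// list and reports both the best and the mean time across several runs.
class CollationTimingSketch {

    // Written to on every pass so the key generation cannot be optimized away.
    static volatile long sink;

    /** Times one pass of collation-key generation over the words, in nanoseconds. */
    static long timeOnePass(Collator collator, String[] words) {
        long start = System.nanoTime();
        long total = 0;
        for (String word : words) {
            CollationKey key = collator.getCollationKey(word);
            total += key.toByteArray().length;
        }
        sink = total;
        return System.nanoTime() - start;
    }

    /** Returns {best, mean} elapsed nanoseconds over the given number of runs. */
    static long[] bestAndMean(Collator collator, String[] words, int runs) {
        long best = Long.MAX_VALUE;
        long sum = 0;
        for (int i = 0; i < runs; i++) {
            long elapsed = timeOnePass(collator, words);
            best = Math.min(best, elapsed);
            sum += elapsed;
        }
        return new long[] { best, sum / runs };
    }

    public static void main(String[] args) {
        // Stand-in word list; the real test used 90k-word lists per language.
        String[] words = { "apple", "Zebra", "ecole", "strasse" };
        Collator collator = Collator.getInstance(Locale.ENGLISH); // default strength
        long[] result = bestAndMean(collator, words, 5);
        System.out.println("best ns: " + result[0] + ", mean ns: " + result[1]);
    }
}
```

With a list this small the numbers are dominated by JIT warm-up and timer noise, so treat the sketch as a template for the measurement rather than as a benchmark.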


> Add javadoc notes about ICUCollationKeyFilter's speed advantage over CollationKeyFilter
> ---------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1719
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.4.1
>            Reporter: Steven Rowe
>            Priority: Trivial
>             Fix For: 2.9
>
>         Attachments: LUCENE-1719.patch
>
>
> contrib/collation's ICUCollationKeyFilter, which uses ICU4J collation, is faster than CollationKeyFilter, the JVM-provided java.text.Collator implementation in the same package.  The javadocs of these classes should be modified to add a note to this effect.
> My curiosity was piqued by [Robert Muir's comment|https://issues.apache.org/jira/browse/LUCENE-1581?focusedCommentId=12720300#action_12720300] on LUCENE-1581, in which he states that ICUCollationKeyFilter is up to 30x faster than CollationKeyFilter.
> I timed the operation of these two classes with Sun JVM versions 1.4.2/32-bit, 1.5.0/32- and 64-bit, and 1.6.0/64-bit, using 90k-word lists for 4 languages (taken from the corresponding Debian wordlist packages and truncated to the first 90k words after a fixed random shuffling), with Collators at the default strength, on a Windows Vista 64-bit machine.  I used an analysis pipeline consisting of WhitespaceTokenizer chained to the collation key filter, so to isolate the time taken by the collation key filters, I also timed WhitespaceTokenizer operating alone for each combination.  The rightmost column represents the performance advantage of the ICU4J implementation (ICU) over the java.text.Collator implementation (JVM), after discounting the WhitespaceTokenizer time (WST): (JVM-WST) / (ICU-WST). The best times out of 5 runs for each combination, in milliseconds, are as follows:
> ||Sun JVM||Language||java.text||ICU4J||WhitespaceTokenizer||ICU4J Improvement||
> |1.4.2_17 (32 bit)|English|522|212|13|2.6x|
> |1.4.2_17 (32 bit)|French|716|243|14|3.1x|
> |1.4.2_17 (32 bit)|German|669|264|16|2.6x|
> |1.4.2_17 (32 bit)|Ukrainian|931|474|25|2.0x|
> |1.5.0_15 (32 bit)|English|604|176|16|3.7x|
> |1.5.0_15 (32 bit)|French|817|209|17|4.2x|
> |1.5.0_15 (32 bit)|German|799|225|20|3.8x|
> |1.5.0_15 (32 bit)|Ukrainian|1029|436|26|2.4x|
> |1.5.0_15 (64 bit)|English|431|89|10|5.3x|
> |1.5.0_15 (64 bit)|French|562|112|11|5.5x|
> |1.5.0_15 (64 bit)|German|567|116|13|5.4x|
> |1.5.0_15 (64 bit)|Ukrainian|734|281|21|2.7x|
> |1.6.0_13 (64 bit)|English|162|81|9|2.1x|
> |1.6.0_13 (64 bit)|French|192|92|10|2.2x|
> |1.6.0_13 (64 bit)|German|204|99|14|2.2x|
> |1.6.0_13 (64 bit)|Ukrainian|273|202|21|1.4x|
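
As a check on the rightmost column, the figures are consistent with dividing the java.text time by the ICU4J time after subtracting the WhitespaceTokenizer time from both. A small sketch of that arithmetic follows; the class and method names are invented for illustration, and the inputs are taken from the first and last rows of the table.

```java
import java.util.Locale;

// Hypothetical helper: recomputes the "ICU4J Improvement" column from the
// three timing columns, discounting the tokenizer-only time from both sides.
class Icu4jImprovementSketch {

    /** Returns (jvm - wst) / (icu - wst) for times given in milliseconds. */
    static double improvement(long jvmMillis, long icuMillis, long wstMillis) {
        return (double) (jvmMillis - wstMillis) / (icuMillis - wstMillis);
    }

    public static void main(String[] args) {
        // 1.4.2_17 (32 bit), English: java.text=522, ICU4J=212, WhitespaceTokenizer=13
        System.out.printf(Locale.ROOT, "%.1fx%n", improvement(522, 212, 13)); // prints 2.6x
        // 1.6.0_13 (64 bit), Ukrainian: java.text=273, ICU4J=202, WhitespaceTokenizer=21
        System.out.printf(Locale.ROOT, "%.1fx%n", improvement(273, 202, 21)); // prints 1.4x
    }
}
```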
