Posted to dev@lucene.apache.org by Ian Boston <ie...@tfd.co.uk> on 2007/04/29 11:55:08 UTC

A Faster ISOLatin1AccentFilter

Hi,

We've been using this filter in a project and found it a bit slow, so 
we've rewritten it. In tests on a 5M string with no accents, the version 
in trunk took ~200ms, while this version takes ~12ms. If there are 
accents, it's not quite as good, at about 60ms. (Tested on a MacBook Pro.)

Code is at
https://saffron.caret.cam.ac.uk/svn/projects/darwincorresp/trunk/darwin-analyzer/src/main/java/uk/ac/cam/caret/darwin/lucene/ISOLatin1AccentFilter.java
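
The linked file is not reproduced here. As a rough illustration only, one way a filter like this can be much cheaper on accent-free input is to scan for the first character that needs replacing and return the input untouched if there is none. The sketch below is an assumption about the general technique, not the code behind the link: the class name AccentFoldSketch, both method names, and the (deliberately incomplete) mapping table are all hypothetical.

```java
// Hypothetical sketch of a "scan first, copy only if needed" accent folder.
// This is NOT the code from the linked repository or from Lucene trunk.
final class AccentFoldSketch {

    // Returns the replacement for a Latin-1 accented character, or null when
    // the character should pass through unchanged. Only a few mappings are
    // listed; a real filter would cover the whole U+00C0..U+00FF range.
    static String replacementFor(char c) {
        if (c < '\u00C0' || c > '\u00FF') {
            return null; // outside the accented Latin-1 block
        }
        switch (c) {
            case '\u00C0': case '\u00C1': case '\u00C2': return "A";
            case '\u00C6': return "AE";
            case '\u00C7': return "C";
            case '\u00E0': case '\u00E1': case '\u00E2': return "a";
            case '\u00E6': return "ae";
            case '\u00E7': return "c";
            case '\u00E8': case '\u00E9': case '\u00EA': case '\u00EB': return "e";
            case '\u00F1': return "n";
            case '\u00FC': return "u";
            default: return null;
        }
    }

    static String fold(String input) {
        final int length = input.length();

        // Fast path: advance to the first character that has a replacement.
        int i = 0;
        while (i < length && replacementFor(input.charAt(i)) == null) {
            i++;
        }
        if (i == length) {
            return input; // nothing to replace, so no buffer is allocated
        }

        // Slow path: copy the clean prefix once, then fold the remainder.
        StringBuilder out = new StringBuilder(length + 8);
        out.append(input, 0, i);
        for (; i < length; i++) {
            char c = input.charAt(i);
            String replacement = replacementFor(c);
            if (replacement == null) {
                out.append(c);
            } else {
                out.append(replacement);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("plain ascii"));
        System.out.println(fold("caf\u00E9 fa\u00E7ade"));
    }
}
```

With this shape, accent-free input costs one pass and no allocation, while input containing accents pays for a copy as well, which is at least consistent with the gap between the two timings above.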

BTW, just replacing StringBuffer with StringBuilder got the time down to 
about 80ms for both cases.
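
For context on that swap: StringBuffer's methods are synchronized, whereas StringBuilder (added in Java 5) offers the same API without the locking, so it is a drop-in change when the buffer is only touched by one thread. The sketch below is a naive comparison, not the test used for the numbers above. The class and method names are hypothetical, and reading "5M" as 5,000,000 characters is an assumption. There is no JIT warm-up, so the printed timings are only indicative; a harness such as JMH would be the better tool for real measurements.

```java
import java.util.Arrays;

// Hypothetical, naive comparison of per-character appends.
// Not the benchmark used in this thread.
final class BuilderVsBufferSketch {

    // Appends every character through a synchronized StringBuffer.
    static String copyWithBuffer(String input) {
        StringBuffer out = new StringBuffer(input.length());
        for (int i = 0; i < input.length(); i++) {
            out.append(input.charAt(i));
        }
        return out.toString();
    }

    // Same loop using the unsynchronized StringBuilder.
    static String copyWithBuilder(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            out.append(input.charAt(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: "5M" is taken to mean 5,000,000 characters.
        char[] chars = new char[5_000_000];
        Arrays.fill(chars, 'a');
        String input = new String(chars);

        long t0 = System.nanoTime();
        String viaBuffer = copyWithBuffer(input);
        long t1 = System.nanoTime();
        String viaBuilder = copyWithBuilder(input);
        long t2 = System.nanoTime();

        System.out.println("StringBuffer:  " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("StringBuilder: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("same output:   " + viaBuffer.equals(viaBuilder));
    }
}
```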

Ian

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: A Faster ISOLatin1AccentFilter

Posted by Ian Boston <ie...@tfd.co.uk>.
Done,
https://issues.apache.org/jira/browse/LUCENE-871

Ian

Yonik Seeley wrote:
> Thanks Ian, nice improvement!
> Could you open a Lucene JIRA issue and provide a patch?
> (that helps with other things such as IP tracking)
> 
> -Yonik




Re: A Faster ISOLatin1AccentFilter

Posted by Yonik Seeley <yo...@apache.org>.
Thanks Ian, nice improvement!
Could you open a Lucene JIRA issue and provide a patch?
(that helps with other things such as IP tracking)

-Yonik

