You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by ma...@yahoo.co.uk on 2004/04/01 10:29:59 UTC

Re: Performance of hit highlighting and finding term positions for

730 msecs is the correct number for 10 * 16k docs with StandardTokenizer! 
The 11ms per doc figure in my post was for highlighlighting using a lower-case-filter-only analyzer.
5ms of this figure was the cost of the lower-case-filter-only analyzer.

73 msecs is the cost of JUST StandardTokenizer (no highlighting)
StandardAnalyzer uses StandardTokenizer so is probably used in a lot of apps. It tries to keep certain text eg email addresses as one term.
I can live without it and I suspect most apps can too. I haven't looked into why its slow but I notice it does make use of Vectors.
I think a lot of people's highlighter performance issues may extend from this.

Cheers
Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org