Posted to dev@lucenenet.apache.org by Ravi Patel <rp...@live.com> on 2010/05/18 15:33:34 UTC

lucene performance questions

 

I have a bunch of fields that hold single values, such as "date", "id", and "flagged".

 

I've noticed that if I index them as Tokenized, my queries are much faster than if they are Untokenized.


In my query, I'm using a BooleanQuery or RangeFilter/Query and querying/sorting/filtering based on these values.

Example uses:

SortField minuteSort = new SortField("date", SortField.STRING, reverse);

filter = new RangeFilter("id", lowerId, upperId, false, false);

booleanQuery.Add(new TermQuery(new Term("flagged", "true")), BooleanClause.Occur.MUST_NOT);

 

Two Questions:

1.  Is there a cost at search-time in making fields Tokenized that don't need to be?  I assume there's a cost at Index time, but I'm not too worried about the Index cost.

2.  Should fields that are used in my 3 example lines above be Tokenized?  If not, why am I seeing a huge performance difference when they are UnTokenized?  I'm really not running any queries that require some sort of analysis on these fields other than that they are indexed as-is.

RE: lucene performance questions

Posted by Digy <di...@gmail.com>.
Whether you tokenize them or not, there shouldn't be any performance change
(ignoring the parsing of a few words of the user's query).
Is this some kind of XY problem
(http://dictionary.babylon.com/xy%20problem/)?

DIGY





Re: lucene performance questions

Posted by Robert Jordan <ro...@gmx.net>.
On 18.05.2010 15:33, Ravi Patel wrote:
> Two Questions:
>
> 1.  Is there a cost at search-time in making fields Tokenized that
> don't need to be?  I assume there's a cost at Index time, but I'm not
> too worried about the Index cost.

Which analyzer are you using at index time?

>
> 2.  Should fields that are used in my 3 example lines above be
> Tokenized?  If not, why am I seeing a huge performance difference
> when they are UnTokenized?  I'm really not running any queries that
> require some sort of analysis on these fields other than that they
> are indexed as-is.

They should not be tokenized/analyzed, but this is highly dependent
on the analyzer.

In general, an ID-like field is never tokenized, unless the analyzer
is a bijective function.

Robert
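
To illustrate Robert's point about bijective analyzers, here is a minimal sketch that does not use Lucene.Net at all. LetterLowercaseTokens is a hypothetical toy stand-in for an analyzer that lowercases text and splits on non-letters, and the ID values are made up. It only shows why a tokenized ID-like field can stop matching a TermQuery or RangeFilter built from the raw value: two different IDs may collapse into the same indexed term.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Toy model of tokenized vs. untokenized indexing. This is NOT Lucene.Net
// code; the class and method names are assumptions made for illustration.
public static class TokenizeSketch
{
    // Hypothetical analyzer: lowercases letters and splits on anything else.
    public static List<string> LetterLowercaseTokens(string value)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in value)
        {
            if (char.IsLetter(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                // A non-letter ends the current token.
                tokens.Add(current.ToString());
                current.Length = 0;
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }

    // Untokenized indexing keeps the whole value as one term.
    public static List<string> SingleTerm(string value)
    {
        return new List<string> { value };
    }

    public static void Main()
    {
        string id = "AB-1042"; // made-up ID value
        Console.WriteLine(string.Join(",", LetterLowercaseTokens(id).ToArray())); // prints "ab"
        Console.WriteLine(string.Join(",", SingleTerm(id).ToArray()));            // prints "AB-1042"
    }
}
```

With this toy analyzer, "AB-1042" and "AB-2000" both end up as the single term "ab", so the mapping from ID to indexed terms is not one-to-one. A real analyzer may behave differently, which is why the answer depends on which analyzer is used at index time.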