Posted to java-user@lucene.apache.org by Eleanor Joslin <ej...@decisionsoft.com> on 2008/02/01 00:52:16 UTC

Using a QueryParser with an untokenized field?

In my Lucene index there's a field that contains the local names of XML 
elements, one name per document.  Users can enter arbitrary queries for 
this field, so I'm using a QueryParser.

From reading around it looks as if the field needs to be tokenized, but 
since the field's content is always a single term, is this really 
necessary?  What difference does it make?  I know that the QueryParser has 
to use a token stream, but for this field, tokenizing ought to be a no-op. 
Or am I missing something?

Thanks,

Eleanor Joslin

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Using a QueryParser with an untokenized field?

Posted by Chris Hostetter <ho...@fucit.org>.

: Thank you, this was exactly what I needed.  So "tokenizing" really denotes a
: more general process that can involve normalizing the case or whatever else
: can be done with a filter.  This is where I was confused.

When constructing a Document/Field, "TOKENIZED" really refers to the 
broader sense of "Analysis" (i.e., should the Analyzer be used on the value).



-Hoss



Re: Using a QueryParser with an untokenized field?

Posted by Eleanor Joslin <ej...@decisionsoft.com>.

Thank you, this was exactly what I needed.  So "tokenizing" really 
denotes a more general process that can involve normalizing the case or 
whatever else can be done with a filter.  This is where I was confused.

Eleanor

Jan Peter Stotz wrote:
> Hi Eleanor.
> 
>> In my Lucene index there's a field that contains the local names of 
>> XML elements, one name per document.  Users can enter arbitrary 
>> queries for this field, so I'm using a QueryParser.
> 
>> From reading around it looks as if the field needs to be tokenized, 
>> but since the field's content is always a single term, is this really 
>> necessary?  
> 
> You are right, your field is effectively already a single token, but from 
> what I know the main difference is that untokenized fields do not pass 
> through your selected analyzer when being added to the index. If your 
> analyzer, for example, incorporates the LowerCaseFilter, a tokenized field 
> will be converted to lower case before it is indexed. Using the same 
> analyzer for your QueryParser then allows case-insensitive queries.
> 
> If you add the field untokenized and your Analyzer (at query time) 
> incorporates the LowerCaseFilter, you will be unable to find elements 
> whose names contain uppercase characters.
> 
> Jan
> 


-- 
Eleanor Joslin, Software Development   DecisionSoft Ltd.
Telephone: +44-1865-203192             http://www.decisionsoft.com


Re: Using a QueryParser with an untokenized field?

Posted by Jan Peter Stotz <jp...@gmx.de>.

Hi Eleanor.

> In my Lucene index there's a field that contains the local names of XML 
> elements, one name per document.  Users can enter arbitrary queries for 
> this field, so I'm using a QueryParser.

> From reading around it looks as if the field needs to be tokenized, but 
> since the field's content is always a single term, is this really 
> necessary?  

You are right, your field is effectively already a single token, but from 
what I know the main difference is that untokenized fields do not pass 
through your selected analyzer when being added to the index. If your 
analyzer, for example, incorporates the LowerCaseFilter, a tokenized field 
will be converted to lower case before it is indexed. Using the same 
analyzer for your QueryParser then allows case-insensitive queries.

If you add the field untokenized and your Analyzer (at query time) 
incorporates the LowerCaseFilter, you will be unable to find elements 
whose names contain uppercase characters.
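
The mismatch can be sketched in plain Java without touching Lucene at all. 
This is only a simulation under stated assumptions: the class and its 
methods (UntokenizedFieldDemo, analyze, index, matches) are hypothetical 
stand-ins, not Lucene API, and the element names are made up. analyze() 
plays the role of an Analyzer with a LowerCaseFilter, and matches() plays 
the role of a QueryParser that analyzes a plain term query.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class UntokenizedFieldDemo {

    // Hypothetical stand-in for an Analyzer whose chain includes a LowerCaseFilter.
    public static String analyze(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Builds the set of indexed terms for the field.
    // A "tokenized" value goes through analyze(); an untokenized value is stored verbatim.
    public static Set<String> index(boolean tokenized, String... values) {
        Set<String> terms = new HashSet<>();
        for (String value : values) {
            terms.add(tokenized ? analyze(value) : value);
        }
        return terms;
    }

    // Hypothetical stand-in for a QueryParser handling a plain term query:
    // the user's input is analyzed before it is looked up in the index.
    public static boolean matches(Set<String> terms, String userQuery) {
        return terms.contains(analyze(userQuery));
    }

    public static void main(String[] args) {
        Set<String> untokenized = index(false, "PurchaseOrder", "item");
        Set<String> tokenized = index(true, "PurchaseOrder", "item");

        // Indexed as "PurchaseOrder", queried as "purchaseorder": no match.
        System.out.println(matches(untokenized, "PurchaseOrder")); // false
        // An all-lowercase name survives the mismatch by accident.
        System.out.println(matches(untokenized, "item"));          // true
        // Same analysis on both sides: case-insensitive match.
        System.out.println(matches(tokenized, "PURCHASEORDER"));   // true
    }
}
```

Either side can be changed to resolve this; what matters is that the 
index-time and query-time treatment of the field agree.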

Jan
