Posted to java-user@lucene.apache.org by Eleanor Joslin <ej...@decisionsoft.com> on 2008/02/01 00:52:16 UTC
Using a QueryParser with an untokenized field?
In my Lucene index there's a field that contains the local names of XML
elements, one name per document. Users can enter arbitrary queries for
this field, so I'm using a QueryParser.
From reading around it looks as if the field needs to be tokenized, but
since the field's content is always a single term, is this really
necessary? What difference does it make? I know that the QueryParser has
to use a token stream, but for this field, tokenizing ought to be a no-op.
Or am I missing something?
Thanks,
Eleanor Joslin
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Using a QueryParser with an untokenized field?
Posted by Chris Hostetter <ho...@fucit.org>.
: Thank you, this was exactly what I needed. So "tokenizing" really denotes a
: more general process that can involve normalizing the case or whatever else
: can be done with a filter. This is where I was confused.
When constructing a Document/Field, "TOKENIZED" really refers to the
broader sense of "Analysis" (i.e., should the Analyzer be used on the value).
-Hoss
Re: Using a QueryParser with an untokenized field?
Posted by Eleanor Joslin <ej...@decisionsoft.com>.
Thank you, this was exactly what I needed. So "tokenizing" really
denotes a more general process that can involve normalizing the case or
whatever else can be done with a filter. This is where I was confused.
Eleanor
Jan Peter Stotz wrote:
> Hi Eleanor.
>
>> In my Lucene index there's a field that contains the local names of
>> XML elements, one name per document. Users can enter arbitrary
>> queries for this field, so I'm using a QueryParser.
>
>> From reading around it looks as if the field needs to be tokenized,
>> but since the field's content is always a single term, is this really
>> necessary?
>
> You are right, your field is already tokenized, but as far as I know the
> main difference is that untokenized fields do not pass through your
> selected analyzer when being added to the index. If your analyzer, for
> example, incorporates the LowerCaseFilter, the field will be converted to
> lower case before it is indexed. Using the same analyzer for your
> QueryParser then allows you to perform case-insensitive queries.
>
> If you add the field untokenized and your Analyzer (at query time)
> incorporates the LowerCaseFilter, you will be unable to find elements that
> contain uppercase characters.
>
> Jan
--
Eleanor Joslin, Software Development DecisionSoft Ltd.
Telephone: +44-1865-203192 http://www.decisionsoft.com
Re: Using a QueryParser with an untokenized field?
Posted by Jan Peter Stotz <jp...@gmx.de>.
Hi Eleanor.
> In my Lucene index there's a field that contains the local names of XML
> elements, one name per document. Users can enter arbitrary queries for
> this field, so I'm using a QueryParser.
> From reading around it looks as if the field needs to be tokenized, but
> since the field's content is always a single term, is this really
> necessary?
You are right, your field is already tokenized, but as far as I know the
main difference is that untokenized fields do not pass through your
selected analyzer when being added to the index. If your analyzer, for
example, incorporates the LowerCaseFilter, the field will be converted to
lower case before it is indexed. Using the same analyzer for your
QueryParser then allows you to perform case-insensitive queries.
If you add the field untokenized and your Analyzer (at query time)
incorporates the LowerCaseFilter, you will be unable to find elements that
contain uppercase characters.
Jan
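The mismatch described above can be sketched in plain Java, without Lucene
on the classpath. This is only a rough model, not Lucene's API: the class
AnalysisMismatchSketch and its methods analyze, index and search are
hypothetical stand-ins. analyze plays the role of an Analyzer whose chain
ends in a LowerCaseFilter, index plays the role of adding a field either
tokenized (analyzed) or untokenized (stored verbatim), and search plays the
role of a QueryParser that runs the user's input through the same analyzer
for an ordinary term query.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class AnalysisMismatchSketch {

    // Hypothetical stand-in for an Analyzer that lower-cases its input.
    public static String analyze(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Hypothetical stand-in for indexing one single-term field.
    // tokenized == true : the value passes through the analyzer first.
    // tokenized == false: the value is indexed exactly as given.
    public static Set<String> index(boolean tokenized, String... values) {
        Set<String> terms = new HashSet<>();
        for (String value : values) {
            terms.add(tokenized ? analyze(value) : value);
        }
        return terms;
    }

    // Hypothetical stand-in for a QueryParser term query: the user's input
    // is analyzed before it is looked up in the index.
    public static boolean search(Set<String> terms, String userQuery) {
        return terms.contains(analyze(userQuery));
    }

    public static void main(String[] args) {
        Set<String> analyzed = index(true, "lineItem", "invoice");
        Set<String> verbatim = index(false, "lineItem", "invoice");

        // Indexed and queried through the same analyzer: case no longer matters.
        System.out.println(search(analyzed, "LineItem")); // true

        // Indexed verbatim, queried through the analyzer: "lineItem" is in the
        // index but the query term becomes "lineitem", so nothing matches.
        System.out.println(search(verbatim, "lineItem")); // false

        // An all-lower-case name still matches, which can hide the problem.
        System.out.println(search(verbatim, "invoice"));  // true
    }
}
```

The point of the sketch is that the index-time and query-time treatment of
the field have to agree: either analyze on both sides, or on neither.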