You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Ian Lea <ia...@gmail.com> on 2009/08/03 11:21:07 UTC

Re: question about indexing/searching using standardanalyzer for KEYWORD field that contains alphanumeric data

Hi

Storing documentkey as TEXT will be causing it to be passed through
StandardAnalyzer which will be downcasing it, and the index will be
holding "l2222fahbhmf" rather than "L2222FAHBHMF".  When you changed
it to KEYWORD it will have been stored as is so the
updateDocument(term, doc) call will have worked but searching will
have failed because StandardAnalyzer will have downcased it. Numeric
keys will have worked everywhere because they don't get downcased.

See "Why is it important to use the same analyzer type during indexing
and search?" in the FAQ.

The best solution is probably to store it as KEYWORD and use
PerFieldAnalyzerWrapper to specify KeywordAnalyzer for documentkey.
The javadocs have an example showing what you need.

TEXT and KEYWORD haven't been around in lucene for a while.  You might
like to consider upgrading.  Good practice anyway to mention what
version you are using when asking questions.

--
Ian.

On Mon, Aug 3, 2009 at 3:49 AM, Leonard
Gestrin<Le...@markettools.com> wrote:
>
> Hello,
> I have question about KEYWORD type and searching/updating.  I am getting strange behavior that I can't quite comprehend.
> My index is created using standard analyzer, which used for writing and searching. It has three fields
>
> userpin - alphanumeric field which is stored as TEXT
> documentkey  - alphanumeric field which is stored as TEXT
> contents - text of document which is stored as TEXT
>
> When I try to update document I am creating Term to find document by documentKey and I am using
>
>  org.apache.lucene.index.IndexWriter.updateDocument(term, pDocument);
>
> to do the update.  Lucene fails to find the document by the term and I am getting duplicate documents in the index.
> When I changed index to define documentKey as KEYWORD the updates started to work fine.
> However, search for documentKey using StandardAnalyzer stopped working.
>
> It appears that lucene is using keywordAnalyzer for searching for the term during update, even though the indexer is open with StandardAnalyzer.
>
> The sample values that are stored in documentKeys are: "L2222FAHBHMF", "L2222FAHBHAS".
> I noticed if documentKey is numeric value, both KeywordAnalyzer and StandardAnalyzer can find the documents by it without any problem thus reader can find and indexer can update without any problems. With alphanumeric I cant get both to work.
> Any help is appreciated.
> Thanks
> Leonard
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: question about indexing/searching using standardanalyzer for KEYWORD field that contains alphanumeric data

Posted by Leonard Gestrin <Le...@markettools.com>.

Hi Ian
Thank you for reply.
I have recently upgraded the application to lucene 2.4.1

I did not realize that during update operation standard analyzer was not invoked on the term same way as it's done for searching even though indexer is open using it. I am a newbie on lucene (I inherited project) - I will do some reading as you suggested.
Thanks
Leonard

-----Original Message-----
From: Ian Lea [mailto:ian.lea@gmail.com] 
Sent: Monday, August 03, 2009 2:21 AM
To: java-user@lucene.apache.org
Subject: Re: question about indexing/searching using standardanalyzer for KEYWORD field that contains alphanumeric data

Hi

Storing documentkey as TEXT will be causing it to be passed through
StandardAnalyzer which will be downcasing it, and the index will be
holding "l2222fahbhmf" rather than "L2222FAHBHMF".  When you changed
it to KEYWORD it will have been stored as is so the
updateDocument(term, doc) call will have worked but searching will
have failed because StandardAnalyzer will have downcased it. Numeric
keys will have worked everywhere because they don't get downcased.

See "Why is it important to use the same analyzer type during indexing
and search?" in the FAQ.

The best solution is probably to store it as KEYWORD and use
PerFieldAnalyzerWrapper to specify KeywordAnalyzer for documentkey.
The javadocs have an example showing what you need.

TEXT and KEYWORD haven't been around in lucene for a while.  You might
like to consider upgrading.  Good practice anyway to mention what
version you are using when asking questions.

--
Ian.

On Mon, Aug 3, 2009 at 3:49 AM, Leonard
Gestrin<Le...@markettools.com> wrote:
>
> Hello,
> I have question about KEYWORD type and searching/updating.  I am getting strange behavior that I can't quite comprehend.
> My index is created using standard analyzer, which used for writing and searching. It has three fields
>
> userpin - alphanumeric field which is stored as TEXT
> documentkey  - alphanumeric field which is stored as TEXT
> contents - text of document which is stored as TEXT
>
> When I try to update document I am creating Term to find document by documentKey and I am using
>
>  org.apache.lucene.index.IndexWriter.updateDocument(term, pDocument);
>
> to do the update.  Lucene fails to find the document by the term and I am getting duplicate documents in the index.
> When I changed index to define documentKey as KEYWORD the updates started to work fine.
> However, search for documentKey using StandardAnalyzer stopped working.
>
> It appears that lucene is using keywordAnalyzer for searching for the term during update, even though the indexer is open with StandardAnalyzer.
>
> The sample values that are stored in documentKeys are: "L2222FAHBHMF", "L2222FAHBHAS".
> I noticed if documentKey is numeric value, both KeywordAnalyzer and StandardAnalyzer can find the documents by it without any problem thus reader can find and indexer can update without any problems. With alphanumeric I cant get both to work.
> Any help is appreciated.
> Thanks
> Leonard
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org