
Posted to java-user@lucene.apache.org by Constantine Vetoshev <ge...@gmail.com> on 2010/03/25 20:46:44 UTC

Fields with Field.Store.NO and Field.Index.ANALYZED not being indexed

I have a strange problem with Field.Store.NO and Field.Index.ANALYZED
fields with Lucene 3.0.1.

I'm testing my app with twenty test documents. Each has about ten
fields. All fields except one, "Content", are set as
Field.Store.YES. The "Content" field is set as Field.Store.NO and
Field.Index.ANALYZED. Using Luke, I discovered that this "Content" field
is not persisted to the disk, except on one document (neither the first
nor the last in the list). This always happens for exactly the same
document. When I examine the Document object before writing it, it has
the "Content" field I expect.

When I change the "Content" field from Field.Store.NO to
Field.Store.YES, everything starts working. Every document has the
"Content" field exactly as I expect, and searches produce the hits I
expect to see. I really don't want to save the full "Content" data in
the Lucene index, though. I'm baffled why Field.Store.NO results in
nothing being written to the index even with Field.Index.ANALYZED.

Suggestions?

-- 
Regards,
Constantine Vetoshev


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Fields with Field.Store.NO and Field.Index.ANALYZED not being indexed

Posted by Erick Erickson <er...@gmail.com>.

"if you do not have access to the original contents" is the key in Uwe's
comment. You do not need a separate field at all; it all depends upon
your situation. There's no problem in indexing AND storing a field.

HTH
Erick

On Sun, Aug 29, 2010 at 11:33 PM, Constantine Vetoshev
<ge...@gmail.com> wrote:

> "Uwe Schindler" <uw...@thetaphi.de> writes:
> > You cannot retrieve non-stored fields. They are analyzed and tokenized
> > during indexing and this is a one-way transformation. If you update
> > documents you have to reindex the contents. If you do not have access to the
> > original contents anymore, you may consider adding a stored-only "raw
> > document" field, that contains everything to rebuild the indexed fields. In
> > our installation, we have a stored field containing the JSON/XML source
> > document to do this.
>
> Thanks, that helps.
>
> Since it seems that I have to keep the raw data around, is there any
> reason not to just make that data's field both stored and analyzed? I'm
> just wondering why you use a separate stored-only field: a Lucene
> limitation or some app-specific reason?
>
> --
> Regards,
> Constantine Vetoshev

Re: Fields with Field.Store.NO and Field.Index.ANALYZED not being indexed

Posted by Constantine Vetoshev <ge...@gmail.com>.

"Uwe Schindler" <uw...@thetaphi.de> writes:
> You cannot retrieve non-stored fields. They are analyzed and tokenized
> during indexing and this is a one-way transformation. If you update
> documents you have to reindex the contents. If you do not have access to the
> original contents anymore, you may consider adding a stored-only "raw
> document" field, that contains everything to rebuild the indexed fields. In
> our installation, we have a stored field containing the JSON/XML source
> document to do this.

Thanks, that helps.

Since it seems that I have to keep the raw data around, is there any
reason not to just make that data's field both stored and analyzed? I'm
just wondering why you use a separate stored-only field: a Lucene
limitation or some app-specific reason?

--
Regards,
Constantine Vetoshev



Re: Fields with Field.Store.NO and Field.Index.ANALYZED not being indexed

Posted by Erick Erickson <er...@gmail.com>.

Adding to Uwe's comment, you may be operating under a false
assumption. Lucene has no capability to update fields in a document.
Period. This is one of the most frequently requested changes, but
the nature of an inverted index makes this...er...tricky. Updates
are really a document delete followed by a document add. And as
a bonus, the new document won't even have the same internal
Lucene doc id as the one it replaces.

So if you're reading a document from the index and writing it back,
non-stored fields are not part of the updated document and your results
will be...uhmmmm....not what you expect...

Best
Erick
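
To make the delete-then-add behaviour described above concrete, here is a
minimal sketch in plain Java. It is a toy model, not Lucene code: ToyIndex,
ToyField, add, update, load and matches are all hypothetical names made up
for illustration, and the "analysis" is nothing more than lowercasing and
splitting on whitespace. The load method stands in for IndexSearcher.doc(int)
only in the sense that it hands back stored values and nothing else.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy model of an index with stored and indexed-only fields. Not Lucene's API. */
class ToyIndex {
    /** A field value plus a flag loosely mirroring Field.Store.YES / Field.Store.NO. */
    static final class ToyField {
        final String value;
        final boolean stored;

        ToyField(String value, boolean stored) {
            this.value = value;
            this.stored = stored;
        }
    }

    // document key -> original values of the stored fields only
    private final Map<String, Map<String, String>> storedValues = new HashMap<>();
    // "field:token" -> keys of the documents containing that token
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Every field is tokenized into postings; only stored fields keep their text. */
    void add(String key, Map<String, ToyField> doc) {
        Map<String, String> kept = new HashMap<>();
        for (Map.Entry<String, ToyField> e : doc.entrySet()) {
            ToyField f = e.getValue();
            for (String token : f.value.toLowerCase(Locale.ROOT).split("\\s+")) {
                postings.computeIfAbsent(e.getKey() + ":" + token, k -> new HashSet<>()).add(key);
            }
            if (f.stored) {
                kept.put(e.getKey(), f.value);
            }
        }
        storedValues.put(key, kept);
    }

    /** An update is a delete followed by an add; nothing from the old document survives. */
    void update(String key, Map<String, ToyField> doc) {
        storedValues.remove(key);
        for (Set<String> keys : postings.values()) {
            keys.remove(key);
        }
        add(key, doc);
    }

    /** Returns stored fields only; tokens cannot be turned back into field text. */
    Map<String, ToyField> load(String key) {
        Map<String, ToyField> doc = new HashMap<>();
        Map<String, String> kept = storedValues.get(key);
        if (kept != null) {
            for (Map.Entry<String, String> e : kept.entrySet()) {
                doc.put(e.getKey(), new ToyField(e.getValue(), true));
            }
        }
        return doc;
    }

    boolean matches(String field, String token, String key) {
        Set<String> keys = postings.get(field + ":" + token);
        return keys != null && keys.contains(key);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, ToyField> doc = new HashMap<>();
        doc.put("Title", new ToyField("Release notes", true));
        doc.put("Content", new ToyField("Lucene indexing basics", false));
        index.add("doc-1", doc);
        System.out.println(index.matches("Content", "lucene", "doc-1"));

        // Read the document back, change one stored field, and write it again.
        Map<String, ToyField> loaded = index.load("doc-1");
        loaded.put("Title", new ToyField("Updated notes", true));
        index.update("doc-1", loaded);
        System.out.println(index.matches("Content", "lucene", "doc-1"));
    }
}
```

Running main should print true and then false: the "Content" tokens are
searchable after the first add, but the document read back from the index
never contained "Content", so the update silently drops it.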

On Sun, Aug 29, 2010 at 1:48 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> You cannot retrieve non-stored fields. They are analyzed and tokenized
> during indexing and this is a one-way transformation. If you update
> documents you have to reindex the contents. If you do not have access to the
> original contents anymore, you may consider adding a stored-only "raw
> document" field, that contains everything to rebuild the indexed fields. In
> our installation, we have a stored field containing the JSON/XML source
> document to do this.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de

RE: Fields with Field.Store.NO and Field.Index.ANALYZED not being indexed

Posted by Uwe Schindler <uw...@thetaphi.de>.

You cannot retrieve non-stored fields. They are analyzed and tokenized
during indexing and this is a one-way transformation. If you update
documents you have to reindex the contents. If you do not have access to the
original contents anymore, you may consider adding a stored-only "raw
document" field, that contains everything to rebuild the indexed fields. In
our installation, we have a stored field containing the JSON/XML source
document to do this.
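
A minimal sketch of the "raw document" idea, again as a toy model in plain
Java rather than Lucene code. RawSourceIndex, index, updateField and matches
are hypothetical names, and the raw source is assumed to be a simple
field-to-text map standing in for the JSON/XML source mentioned above. The
only thing kept verbatim is the raw source; every update re-reads it, changes
one value, and rebuilds all postings from it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy model: keep the raw source so indexed fields can be rebuilt on update. */
class RawSourceIndex {
    // document key -> raw source (stands in for a stored-only JSON/XML field)
    private final Map<String, Map<String, String>> rawSources = new HashMap<>();
    // "field:token" -> keys of the documents containing that token
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Replaces any previous version of the document and re-derives all postings. */
    void index(String key, Map<String, String> source) {
        remove(key);
        rawSources.put(key, new HashMap<>(source));
        for (Map.Entry<String, String> e : source.entrySet()) {
            for (String token : e.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                postings.computeIfAbsent(e.getKey() + ":" + token, k -> new HashSet<>()).add(key);
            }
        }
    }

    /** Changes one field by rebuilding the whole document from its raw source. */
    void updateField(String key, String field, String newValue) {
        Map<String, String> stored = rawSources.get(key);
        if (stored == null) {
            throw new IllegalArgumentException("unknown document: " + key);
        }
        Map<String, String> source = new HashMap<>(stored);
        source.put(field, newValue);
        index(key, source);
    }

    private void remove(String key) {
        rawSources.remove(key);
        for (Set<String> keys : postings.values()) {
            keys.remove(key);
        }
    }

    boolean matches(String field, String token, String key) {
        Set<String> keys = postings.get(field + ":" + token);
        return keys != null && keys.contains(key);
    }

    public static void main(String[] args) {
        RawSourceIndex idx = new RawSourceIndex();
        Map<String, String> source = new HashMap<>();
        source.put("Status", "draft");
        source.put("Content", "Lucene indexing basics");
        idx.index("doc-1", source);

        idx.updateField("doc-1", "Status", "published");
        // The untouched "Content" field is still searchable after the update.
        System.out.println(idx.matches("Content", "lucene", "doc-1"));
    }
}
```

The trade-off is index size: the raw source is kept in full, which is the
cost of being able to rebuild fields that were indexed but not stored.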

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Constantine Vetoshev [mailto:gepardcv@gmail.com]
> Sent: Sunday, August 29, 2010 10:38 PM
> To: java-user@lucene.apache.org
> Subject: Re: Fields with Field.Store.NO and Field.Index.ANALYZED not being
> indexed
> 
> Thanks Erick.
> 
> I finally had time to go back and look at this problem. I discovered that the
> analyzed fields work fine for searching until I use
> IndexWriter.updateDocument().
>
> The way my application runs, it has to update documents several times to
> update one specific field. The update code queries out Document objects using
> a unique identifier, and updates the field. The problem is in Document objects
> returned by the query. The querying code runs a search, and eventually calls
> IndexSearcher.doc(int). According to the API documentation, that method only
> returns Document objects with stored fields from the underlying index.
>
> I tried calling IndexSearcher.doc(int i, FieldSelector fieldSelector) with
> fieldSelector set to null: the documentation states that this returns Document
> objects with all fields, but that also only seems to return stored fields.
>
> So my question becomes: how can I update a document which contains
> non-stored analyzed fields without clobbering the analyzed-only fields?
> Note that I do not need to update the analyzed-only fields. I have found nothing
> helpful in the documentation.
> 
> --
> Regards,
> Constantine Vetoshev

Re: Fields with Field.Store.NO and Field.Index.ANALYZED not being indexed

Posted by Constantine Vetoshev <ge...@gmail.com>.

Thanks Erick.

I finally had time to go back and look at this problem. I discovered
that the analyzed fields work fine for searching until I use
IndexWriter.updateDocument().

The way my application runs, it has to update documents several times to
update one specific field. The update code queries out Document objects
using a unique identifier, and updates the field. The problem is in
Document objects returned by the query. The querying code runs a search,
and eventually calls IndexSearcher.doc(int). According to the API
documentation, that method only returns Document objects with stored
fields from the underlying index.

I tried calling IndexSearcher.doc(int i, FieldSelector fieldSelector)
with fieldSelector set to null: the documentation states that this
returns Document objects with all fields, but that also only seems to
return stored fields.

So my question becomes: how can I update a document which contains
non-stored analyzed fields without clobbering the analyzed-only fields?
Note that I do not need to update the analyzed-only fields. I have found
nothing helpful in the documentation.

--
Regards,
Constantine Vetoshev

Erick Erickson <er...@gmail.com> writes:

> I would be extraordinarily surprised if this was in Lucene, this is so
> basic to how it works that the howls would be heard world-round <G>.
>
> So I'm guessing it's in your code. Could you show it to us? Or, better
> yet, create a small, self-contained test case that illustrates your problem?
>
> Also, what analyzer(s) are you using? And what do your docs look like?
>
> Best
> Erick

Re: Fields with Field.Store.NO and Field.Index.ANALYZED not being indexed

Posted by Erick Erickson <er...@gmail.com>.

I would be extraordinarily surprised if this was in Lucene, this is so
basic to how it works that the howls would be heard world-round <G>.

So I'm guessing it's in your code. Could you show it to us? Or, better
yet, create a small, self-contained test case that illustrates your problem?

Also, what analyzer(s) are you using? And what do your docs look like?

Best
Erick

On Thu, Mar 25, 2010 at 3:46 PM, Constantine Vetoshev <ge...@gmail.com> wrote:

> I have a strange problem with Field.Store.NO and Field.Index.ANALYZED
> fields with Lucene 3.0.1.
>
> I'm testing my app with twenty test documents. Each has about ten
> fields. All fields except one, "Content", are set as
> Field.Store.YES. The "Content" field is set as Field.Store.NO and
> Field.Index.ANALYZED. Using Luke, I discovered that this "Content" field
> is not persisted to the disk, except on one document (neither the first
> nor the last in the list). This always happens for exactly the same
> document. When I examine the Document object before writing it, it has
> the "Content" field I expect.
>
> When I change the "Content" field from Field.Store.NO to
> Field.Store.YES, everything starts working. Every document has the
> "Content" field exactly as I expect, and searches produce the hits I
> expect to see. I really don't want to save the full "Content" data in
> the Lucene index, though. I'm baffled why Field.Store.NO results in
> nothing being written to the index even with Field.Index.ANALYZED.
>
> Suggestions?
>
> --
> Regards,
> Constantine Vetoshev