Posted to java-user@lucene.apache.org by Benson Margulies <bi...@gmail.com> on 2012/03/06 14:35:38 UTC

Problem with updating a document or TermQuery with current trunk

I've posted a self-contained test case for a mystery to GitHub.

git://github.com/bimargulies/lucene-4-update-case.git

The code can be seen at
https://github.com/bimargulies/lucene-4-update-case/blob/master/src/test/java/org/apache/lucene/BadFieldTokenizedFlagTest.java.

I write a doc to an index, close the index, then reopen and do a
delete/add on the doc to add a field. If I iterate the docs in the
index, all looks well, but when I try to query for the doc, it isn't
found.

To be a bit more specific, the doc has a field "field1" which is a
StringField.TYPE_STORED, and it is a query on that field which comes
up empty.

I expect to learn that I've missed something obvious, and I offer
thanks and apologies in advance.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Problem with updating a document or TermQuery with current trunk

Posted by Benson Margulies <bi...@gmail.com>.

On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <bi...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
>> I think the issue is that your analyzer is standardanalyzer, yet field
>> text value is "value-1"
>
> Robert,
>
> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>
> I'll push another copy that shows that it works fine when the doc is
> first added, and gets bad after the 'update', when the field acquires
> the 'tokenized' boolean mysteriously.

I pushed a new copy that runs the query successfully before the
'delete/add' sequence, and then fails afterwards.

>
> --benson
>
>
>>
>> So standardanalyzer will tokenize this into two terms: "value" and "1"
>>
>> But later, you proceed to do TermQueries on "value-1". This term won't
>> exist... TermQuery etc that take Term don't analyze any text.
>>
>> Instead usually higher-level things like QueryParsers analyze text into Terms.
>>
>> On Tue, Mar 6, 2012 at 8:35 AM, Benson Margulies <bi...@gmail.com> wrote:
>>> I've posted a self-contained test case to github of a mystery.
>>>
>>> git://github.com/bimargulies/lucene-4-update-case.git
>>>
>>> The code can be seen at
>>> https://github.com/bimargulies/lucene-4-update-case/blob/master/src/test/java/org/apache/lucene/BadFieldTokenizedFlagTest.java.
>>>
>>> I write a doc to an index, close the index, then reopen and do a
>>> delete/add on the doc to add a field. If I iterate the docs in the
>>> index, all looks well, but when I try to query for the doc, it isn't
>>> found.
>>>
>>> To be a bit more specific, the doc has a field "field1" which is a
>>> StringField.TYPE_STORED, and it is a query on that field which comes
>>> up empty.
>>>
>>> I expect to learn that I've missed something obvious, and I offer
>>> thanks and apologies in advance.

Re: Problem with updating a document or TermQuery with current trunk

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, Mar 6, 2012 at 10:06 AM, Benson Margulies <bi...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 10:04 AM, Robert Muir <rc...@gmail.com> wrote:
>> Thanks Benson: looks like the problem revolves around indexing
>> Document/Fields you get back from IR.document... this has always been
>> 'lossy', but I think this is a real API trap.
>>
>> Please keep testing :)
>
> Got a suggestion for sneaking around this in the meantime?

I just put a comment on the issue: you have to build a new Document
rather than re-index a Document loaded from IR.document.
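
To make the trap and the workaround concrete, here is a toy model in plain
Java. It is only a sketch: ToyLossyRoundTrip, ToyField, and all of their
methods are hypothetical names invented for illustration, not Lucene
classes. The assumption that a loaded field comes back marked as tokenized
is taken from the behavior described in this thread, not from Lucene's
source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy model only; none of these types are Lucene classes. */
class ToyLossyRoundTrip {
    /** A field value plus an index-time option that the stored copy does not keep. */
    static final class ToyField {
        final String name;
        final String value;
        final boolean tokenized;

        ToyField(String name, String value, boolean tokenized) {
            this.name = name;
            this.value = value;
            this.tokenized = tokenized;
        }
    }

    private final List<Map<String, String>> stored = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> deleted = new HashSet<>();

    /** Indexes a document and returns its id. */
    int add(List<ToyField> doc) {
        int docId = stored.size();
        Map<String, String> values = new LinkedHashMap<>();
        for (ToyField f : doc) {
            values.put(f.name, f.value); // only the value is stored, not the flag
            String[] terms = f.tokenized
                    ? f.value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")
                    : new String[] {f.value};
            for (String term : terms) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(f.name + ":" + term, k -> new HashSet<>()).add(docId);
                }
            }
        }
        stored.add(values);
        return docId;
    }

    void delete(int docId) {
        deleted.add(docId);
    }

    /** Rebuilds fields from stored values; the original flag is gone (assumed default: tokenized). */
    List<ToyField> load(int docId) {
        List<ToyField> fields = new ArrayList<>();
        for (Map.Entry<String, String> e : stored.get(docId).entrySet()) {
            fields.add(new ToyField(e.getKey(), e.getValue(), true));
        }
        return fields;
    }

    /** Exact term lookup with no analysis; true if any live document has the term. */
    boolean found(String field, String term) {
        for (int docId : postings.getOrDefault(field + ":" + term, Collections.emptySet())) {
            if (!deleted.contains(docId)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyLossyRoundTrip index = new ToyLossyRoundTrip();
        int id = index.add(Arrays.asList(new ToyField("field1", "value-1", false)));
        System.out.println(index.found("field1", "value-1")); // true

        // The trap: delete, then re-add the loaded copy with one extra field.
        List<ToyField> reloaded = new ArrayList<>(index.load(id));
        reloaded.add(new ToyField("field2", "extra", false));
        index.delete(id);
        int id2 = index.add(reloaded);
        System.out.println(index.found("field1", "value-1")); // false

        // The workaround: build a fresh document and restate each field's type.
        index.delete(id2);
        index.add(Arrays.asList(
                new ToyField("field1", "value-1", false),
                new ToyField("field2", "extra", false)));
        System.out.println(index.found("field1", "value-1")); // true
    }
}
```

The point of the sketch is that the stored copy carries values but not the
index-time options, so the caller has to supply those options again when
rebuilding the document.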

Mike McCandless

http://blog.mikemccandless.com

Re: Problem with updating a document or TermQuery with current trunk

Posted by Benson Margulies <bi...@gmail.com>.

On Tue, Mar 6, 2012 at 10:04 AM, Robert Muir <rc...@gmail.com> wrote:
> Thanks Benson: looks like the problem revolves around indexing
> Document/Fields you get back from IR.document... this has always been
> 'lossy', but I think this is a real API trap.
>
> Please keep testing :)

Got a suggestion for sneaking around this in the meantime?

>
> On Tue, Mar 6, 2012 at 9:58 AM, Benson Margulies <bi...@gmail.com> wrote:
>> On Tue, Mar 6, 2012 at 9:47 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
>>> String field is analyzed, but with KeywordTokenizer, so all should be fine.
>>
>> I filed LUCENE-3854.
>>
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>>
>>>> -----Original Message-----
>>>> From: Michael McCandless [mailto:lucene@mikemccandless.com]
>>>> Sent: Tuesday, March 06, 2012 3:42 PM
>>>> To: java-user@lucene.apache.org
>>>> Subject: Re: Problem with updating a document or TermQuery with current
>>>> trunk
>>>>
>>>> Hmm something is up here... I'll dig.  Seems like we are somehow analyzing
>>>> StringField when we shouldn't...
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>> On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir <rc...@gmail.com> wrote:
>>>> > On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <bi...@gmail.com> wrote:
>>>> >> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
>>>> >>> I think the issue is that your analyzer is standardanalyzer, yet
>>>> >>> field text value is "value-1"
>>>> >>
>>>> >> Robert,
>>>> >>
>>>> >> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>>>> >>
>>>> >
>>>> > thanks Benson, you are right!

Re: Problem with updating a document or TermQuery with current trunk

Posted by Robert Muir <rc...@gmail.com>.

Thanks Benson: looks like the problem revolves around indexing
Document/Fields you get back from IR.document... this has always been
'lossy', but I think this is a real API trap.

Please keep testing :)

On Tue, Mar 6, 2012 at 9:58 AM, Benson Margulies <bi...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 9:47 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> String field is analyzed, but with KeywordTokenizer, so all should be fine.
>
> I filed LUCENE-3854.
>
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>>> -----Original Message-----
>>> From: Michael McCandless [mailto:lucene@mikemccandless.com]
>>> Sent: Tuesday, March 06, 2012 3:42 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Problem with updating a document or TermQuery with current
>>> trunk
>>>
>>> Hmm something is up here... I'll dig.  Seems like we are somehow analyzing
>>> StringField when we shouldn't...
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir <rc...@gmail.com> wrote:
>>> > On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <bi...@gmail.com> wrote:
>>> >> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
>>> >>> I think the issue is that your analyzer is standardanalyzer, yet
>>> >>> field text value is "value-1"
>>> >>
>>> >> Robert,
>>> >>
>>> >> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>>> >>
>>> >
>>> > thanks Benson, you are right!

-- 
lucidimagination.com

Re: Problem with updating a document or TermQuery with current trunk

Posted by Benson Margulies <bi...@gmail.com>.

On Tue, Mar 6, 2012 at 9:47 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> String field is analyzed, but with KeywordTokenizer, so all should be fine.

I filed LUCENE-3854.

>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Michael McCandless [mailto:lucene@mikemccandless.com]
>> Sent: Tuesday, March 06, 2012 3:42 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Problem with updating a document or TermQuery with current
>> trunk
>>
>> Hmm something is up here... I'll dig.  Seems like we are somehow analyzing
>> StringField when we shouldn't...
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir <rc...@gmail.com> wrote:
>> > On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <bi...@gmail.com> wrote:
>> >> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
>> >>> I think the issue is that your analyzer is standardanalyzer, yet
>> >>> field text value is "value-1"
>> >>
>> >> Robert,
>> >>
>> >> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>> >>
>> >
>> > thanks Benson, you are right!

RE: Problem with updating a document or TermQuery with current trunk

Posted by Uwe Schindler <uw...@thetaphi.de>.

String field is analyzed, but with KeywordTokenizer, so all should be fine.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Tuesday, March 06, 2012 3:42 PM
> To: java-user@lucene.apache.org
> Subject: Re: Problem with updating a document or TermQuery with current
> trunk
> 
> Hmm something is up here... I'll dig.  Seems like we are somehow analyzing
> StringField when we shouldn't...
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir <rc...@gmail.com> wrote:
> > On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <bi...@gmail.com> wrote:
> >> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
> >>> I think the issue is that your analyzer is standardanalyzer, yet
> >>> field text value is "value-1"
> >>
> >> Robert,
> >>
> >> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
> >>
> >
> > thanks Benson, you are right!

Re: Problem with updating a document or TermQuery with current trunk

Posted by Michael McCandless <lu...@mikemccandless.com>.

Hmm something is up here... I'll dig.  Seems like we are somehow
analyzing StringField when we shouldn't...

Mike McCandless

http://blog.mikemccandless.com

On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir <rc...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <bi...@gmail.com> wrote:
>> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
>>> I think the issue is that your analyzer is standardanalyzer, yet field
>>> text value is "value-1"
>>
>> Robert,
>>
>> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>>
>
> thanks Benson, you are right!

Re: Problem with updating a document or TermQuery with current trunk

Posted by Benson Margulies <bi...@gmail.com>.

On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir <rc...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <bi...@gmail.com> wrote:
>> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
>>> I think the issue is that your analyzer is standardanalyzer, yet field
>>> text value is "value-1"
>>
>> Robert,
>>
>> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>>
>
> thanks Benson, you are right!

So, should I attach this to a JIRA?

Re: Problem with updating a document or TermQuery with current trunk

Posted by Robert Muir <rc...@gmail.com>.

On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <bi...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
>> I think the issue is that your analyzer is standardanalyzer, yet field
>> text value is "value-1"
>
> Robert,
>
> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>

thanks Benson, you are right!

-- 
lucidimagination.com

Re: Problem with updating a document or TermQuery with current trunk

Posted by Benson Margulies <bi...@gmail.com>.

On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
> I think the issue is that your analyzer is standardanalyzer, yet field
> text value is "value-1"

Robert,

Why is this field analyzed at all? It's built with StringField.TYPE_STORED.

I'll push another copy that shows that it works fine when the doc is
first added, and gets bad after the 'update', when the field acquires
the 'tokenized' boolean mysteriously.

--benson


>
> So standardanalyzer will tokenize this into two terms: "value" and "1"
>
> But later, you proceed to do TermQueries on "value-1". This term won't
> exist... TermQuery etc that take Term don't analyze any text.
>
> Instead usually higher-level things like QueryParsers analyze text into Terms.
>
> On Tue, Mar 6, 2012 at 8:35 AM, Benson Margulies <bi...@gmail.com> wrote:
>> I've posted a self-contained test case to github of a mystery.
>>
>> git://github.com/bimargulies/lucene-4-update-case.git
>>
>> The code can be seen at
>> https://github.com/bimargulies/lucene-4-update-case/blob/master/src/test/java/org/apache/lucene/BadFieldTokenizedFlagTest.java.
>>
>> I write a doc to an index, close the index, then reopen and do a
>> delete/add on the doc to add a field. If I iterate the docs in the
>> index, all looks well, but when I try to query for the doc, it isn't
>> found.
>>
>> To be a bit more specific, the doc has a field "field1" which is a
>> StringField.TYPE_STORED, and it is a query on that field which comes
>> up empty.
>>
>> I expect to learn that I've missed something obvious, and I offer
>> thanks and apologies in advance.

Re: Problem with updating a document or TermQuery with current trunk

Posted by Robert Muir <rc...@gmail.com>.

I think the issue is that your analyzer is StandardAnalyzer, yet the
field's text value is "value-1".

So StandardAnalyzer will tokenize this into two terms: "value" and "1".

But later, you proceed to do TermQueries on "value-1". This term won't
exist... TermQuery and other queries that take a Term don't analyze any text.

Instead, higher-level things like QueryParsers usually analyze text into Terms.
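
The general mechanism described above (an analyzed field versus an
unanalyzed term lookup) can be sketched with a toy inverted index in plain
Java. This is an illustration under stated assumptions, not Lucene code:
ToyTermLookup and its methods are hypothetical names, and the analyze
method is a crude stand-in that merely lowercases and splits on
non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: this is not Lucene's API, just the idea behind the mismatch. */
class ToyTermLookup {
    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Indexes the value after analysis, one posting per resulting term. */
    void addTokenized(int docId, String value) {
        for (String term : analyze(value)) {
            addTerm(docId, term);
        }
    }

    /** Indexes the whole value as a single term, with no analysis. */
    void addVerbatim(int docId, String value) {
        addTerm(docId, value);
    }

    private void addTerm(int docId, String term) {
        postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
    }

    /** Exact lookup with no analysis, similar in spirit to a term query. */
    Set<Integer> termLookup(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyTermLookup tokenized = new ToyTermLookup();
        tokenized.addTokenized(0, "value-1");
        System.out.println(analyze("value-1"));              // [value, 1]
        System.out.println(tokenized.termLookup("value-1")); // []
        System.out.println(tokenized.termLookup("value"));   // [0]

        ToyTermLookup verbatim = new ToyTermLookup();
        verbatim.addVerbatim(0, "value-1");
        System.out.println(verbatim.termLookup("value-1"));  // [0]
    }
}
```

In this model the exact lookup succeeds only when the indexed term and the
query term were produced the same way, which is why a query layer that
analyzes its input matches analyzed fields while a raw term lookup does not.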

On Tue, Mar 6, 2012 at 8:35 AM, Benson Margulies <bi...@gmail.com> wrote:
> I've posted a self-contained test case to github of a mystery.
>
> git://github.com/bimargulies/lucene-4-update-case.git
>
> The code can be seen at
> https://github.com/bimargulies/lucene-4-update-case/blob/master/src/test/java/org/apache/lucene/BadFieldTokenizedFlagTest.java.
>
> I write a doc to an index, close the index, then reopen and do a
> delete/add on the doc to add a field. If I iterate the docs in the
> index, all looks well, but when I try to query for the doc, it isn't
> found.
>
> To be a bit more specific, the doc has a field "field1" which is a
> StringField.TYPE_STORED, and it is a query on that field which comes
> up empty.
>
> I expect to learn that I've missed something obvious, and I offer
> thanks and apologies in advance.



-- 
lucidimagination.com
