Posted to java-user@lucene.apache.org by Benson Margulies <bi...@gmail.com> on 2012/03/06 14:35:38 UTC
Problem with updating a document or TermQuery with current trunk
I've posted a self-contained test case to GitHub of a mystery.
git://github.com/bimargulies/lucene-4-update-case.git
The code can be seen at
https://github.com/bimargulies/lucene-4-update-case/blob/master/src/test/java/org/apache/lucene/BadFieldTokenizedFlagTest.java.
I write a doc to an index, close the index, then reopen and do a
delete/add on the doc to add a field. If I iterate the docs in the
index, all looks well, but when I try to query for the doc, it isn't
found.
To be a bit more specific, the doc has a field "field1" which is a
StringField.TYPE_STORED, and it is a query on that field which comes
up empty.
I expect to learn that I've missed something obvious, and I offer
thanks and apologies in advance.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Problem with updating a document or TermQuery with current trunk
Posted by Benson Margulies <bi...@gmail.com>.
On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <bi...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
>> I think the issue is that your analyzer is standardanalyzer, yet field
>> text value is "value-1"
>
> Robert,
>
> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>
> I'll push another copy that shows that it works fine when the doc is
> first added, and gets bad after the 'update', when the field acquires
> the 'tokenized' boolean mysteriously.
I pushed a new copy that runs the query successfully before the
'delete/add' sequence, and then fails afterwards.
>
> --benson
>
>
>>
>> So standardanalyzer will tokenize this into two terms: "value" and "1"
>>
>> But later, you proceed to do TermQueries on "value-1". This term won't
>> exist... TermQuery etc that take Term don't analyze any text.
>>
>> Instead usually higher-level things like QueryParsers analyze text into Terms.
>>
Re: Problem with updating a document or TermQuery with current trunk
Posted by Michael McCandless <lu...@mikemccandless.com>.
On Tue, Mar 6, 2012 at 10:06 AM, Benson Margulies <bi...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 10:04 AM, Robert Muir <rc...@gmail.com> wrote:
>> Thanks Benson: looks like the problem revolves around indexing
>> Document/Fields you get back from IR.document... this has always been
>> 'lossy', but I think this is a real API trap.
>>
>> Please keep testing :)
>
> Got a suggestion for sneaking around this in the meantime?
I just put a comment on the issue (LUCENE-3854): you have to build a new Document
rather than re-index a Document loaded from IndexReader.document().
Mike McCandless
http://blog.mikemccandless.com
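Mike's workaround, sketched as a toy in plain Java (every name below is hypothetical, and none of it is Lucene's API): take the stored values, then build a fresh document whose field types come from the application's own schema instead of reusing the loaded fields. In Lucene terms this would mean creating new fields with the intended type (for example StringField.TYPE_STORED, as in the test case) rather than re-adding the ones returned by IndexReader.document().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the workaround, NOT Lucene's API; all names are hypothetical.
// Rather than re-adding loaded fields, rebuild every field from a schema
// that the application owns.
class ToyRebuild {
    static final class ToyField {
        final String name;
        final String value;
        final boolean tokenized;

        ToyField(String name, String value, boolean tokenized) {
            this.name = name;
            this.value = value;
            this.tokenized = tokenized;
        }
    }

    // Hypothetical schema: field name -> is the field tokenized?
    static final Map<String, Boolean> SCHEMA = new LinkedHashMap<>();
    static {
        SCHEMA.put("field1", false); // exact-match key, like StringField
        SCHEMA.put("field2", true);  // hypothetical analyzed text field
    }

    // Every field gets its type from the schema, never from a loaded field.
    static ToyField newField(String name, String value) {
        Boolean tokenized = SCHEMA.get(name);
        if (tokenized == null) {
            throw new IllegalArgumentException("no declared type for field " + name);
        }
        return new ToyField(name, value, tokenized);
    }

    // storedValues: name -> value pairs read back from the index.
    static List<ToyField> rebuild(Map<String, String> storedValues, String addName, String addValue) {
        List<ToyField> doc = new ArrayList<>();
        for (Map.Entry<String, String> e : storedValues.entrySet()) {
            doc.add(newField(e.getKey(), e.getValue()));
        }
        doc.add(newField(addName, addValue)); // the field the update adds
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("field1", "value-1");
        for (ToyField f : rebuild(stored, "field2", "some new text")) {
            System.out.println(f.name + " tokenized=" + f.tokenized);
        }
    }
}
```

Failing loudly on a field with no declared type is a design choice of the sketch: it keeps a re-added field from silently picking up some default type, which is the kind of surprise reported earlier in this thread.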
Re: Problem with updating a document or TermQuery with current trunk
Posted by Benson Margulies <bi...@gmail.com>.
On Tue, Mar 6, 2012 at 10:04 AM, Robert Muir <rc...@gmail.com> wrote:
> Thanks Benson: looks like the problem revolves around indexing
> Document/Fields you get back from IR.document... this has always been
> 'lossy', but I think this is a real API trap.
>
> Please keep testing :)
Got a suggestion for sneaking around this in the meantime?
Re: Problem with updating a document or TermQuery with current trunk
Posted by Robert Muir <rc...@gmail.com>.
Thanks Benson: looks like the problem revolves around indexing
Document/Fields you get back from IndexReader.document()... this has always been
'lossy', but I think this is a real API trap.
Please keep testing :)
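A toy model of the trap Robert describes, under one loudly stated assumption: the stored form of a field keeps only its name and value, so a field rebuilt from it falls back to a default type that is tokenized. That matches the symptom reported in this thread (the field "acquires the 'tokenized' boolean" after the delete/add), but the classes below are hypothetical plain Java, not Lucene's Document or Field, and say nothing about how Lucene implements stored fields.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the lossy round trip, NOT Lucene's Document/Field classes.
// ASSUMPTION (hypothetical): the stored form keeps only name and value,
// and a field rebuilt from it gets a default type that is tokenized.
class ToyRoundTrip {
    static final class ToyField {
        final String name;
        final String value;
        final boolean tokenized;

        ToyField(String name, String value, boolean tokenized) {
            this.name = name;
            this.value = value;
            this.tokenized = tokenized;
        }
    }

    // The stored part of a field: just name and value, no type flags.
    static final class StoredValue {
        final String name;
        final String value;

        StoredValue(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    static StoredValue store(ToyField f) {
        return new StoredValue(f.name, f.value); // the tokenized flag is dropped here
    }

    static ToyField load(StoredValue s) {
        return new ToyField(s.name, s.value, true); // default type: tokenized
    }

    // Terms the field would contribute when (re)indexed.
    static List<String> terms(ToyField f) {
        List<String> out = new ArrayList<>();
        if (!f.tokenized) {
            out.add(f.value); // whole value as one term
            return out;
        }
        for (String t : f.value.split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ToyField original = new ToyField("field1", "value-1", false);
        ToyField reloaded = load(store(original));
        System.out.println(terms(original)); // [value-1]
        System.out.println(terms(reloaded)); // [value, 1]
    }
}
```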
On Tue, Mar 6, 2012 at 9:58 AM, Benson Margulies <bi...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 9:47 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> String field is analyzed, but with KeywordTokenizer, so all should be fine.
>
> I filed LUCENE-3854.
>
--
lucidimagination.com
Re: Problem with updating a document or TermQuery with current trunk
Posted by Benson Margulies <bi...@gmail.com>.
On Tue, Mar 6, 2012 at 9:47 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> String field is analyzed, but with KeywordTokenizer, so all should be fine.
I filed LUCENE-3854.
RE: Problem with updating a document or TermQuery with current trunk
Posted by Uwe Schindler <uw...@thetaphi.de>.
StringField is analyzed, but with KeywordTokenizer (which emits the whole value as a single token), so all should be fine.
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
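Uwe's point can be sketched with a toy per-field index in plain Java (hypothetical names, not Lucene's API): each field is indexed either as a single keyword token, which is what KeywordTokenizer produces, or split into pieces. An exact field/term lookup, standing in for TermQuery, finds "value-1" in the keyword field but not in the split one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy per-field index, NOT Lucene's API; all names are hypothetical.
class ToyFieldIndex {
    // "field:term" -> ids of the docs that contain that term in that field
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Keyword-style: the whole value becomes exactly one token.
    static List<String> keywordTokens(String value) {
        List<String> tokens = new ArrayList<>();
        tokens.add(value);
        return tokens;
    }

    // Tokenized: split on runs of non-alphanumeric characters.
    static List<String> splitTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    void add(int docId, String field, String value, boolean tokenized) {
        List<String> tokens = tokenized ? splitTokens(value) : keywordTokens(value);
        for (String term : tokens) {
            postings.computeIfAbsent(field + ":" + term, k -> new HashSet<>()).add(docId);
        }
    }

    // Exact lookup of one field/term pair, like TermQuery; returns a copy.
    Set<Integer> termQuery(String field, String term) {
        return new HashSet<>(postings.getOrDefault(field + ":" + term, new HashSet<>()));
    }

    public static void main(String[] args) {
        ToyFieldIndex index = new ToyFieldIndex();
        index.add(0, "field1", "value-1", false); // untokenized, like StringField
        index.add(0, "field2", "value-1", true);  // tokenized text field
        System.out.println(index.termQuery("field1", "value-1")); // [0]
        System.out.println(index.termQuery("field2", "value-1")); // []
    }
}
```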
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Tuesday, March 06, 2012 3:42 PM
> To: java-user@lucene.apache.org
> Subject: Re: Problem with updating a document or TermQuery with current
> trunk
>
> Hmm something is up here... I'll dig. Seems like we are somehow analyzing
> StringField when we shouldn't...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
Re: Problem with updating a document or TermQuery with current trunk
Posted by Michael McCandless <lu...@mikemccandless.com>.
Hmm, something is up here... I'll dig. Seems like we are somehow
analyzing StringField when we shouldn't...
Mike McCandless
http://blog.mikemccandless.com
On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir <rc...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <bi...@gmail.com> wrote:
>> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
>>> I think the issue is that your analyzer is standardanalyzer, yet field
>>> text value is "value-1"
>>
>> Robert,
>>
>> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>>
>
> thanks Benson, you are right!
Re: Problem with updating a document or TermQuery with current trunk
Posted by Benson Margulies <bi...@gmail.com>.
On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir <rc...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <bi...@gmail.com> wrote:
>> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
>>> I think the issue is that your analyzer is standardanalyzer, yet field
>>> text value is "value-1"
>>
>> Robert,
>>
>> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>>
>
> thanks Benson, you are right!
So, should I attach this to a JIRA?
Re: Problem with updating a document or TermQuery with current trunk
Posted by Robert Muir <rc...@gmail.com>.
On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <bi...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
>> I think the issue is that your analyzer is standardanalyzer, yet field
>> text value is "value-1"
>
> Robert,
>
> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>
Thanks Benson, you are right!
--
lucidimagination.com
Re: Problem with updating a document or TermQuery with current trunk
Posted by Benson Margulies <bi...@gmail.com>.
On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <rc...@gmail.com> wrote:
> I think the issue is that your analyzer is standardanalyzer, yet field
> text value is "value-1"
Robert,
Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
I'll push another copy that shows that it works fine when the doc is
first added, and breaks after the 'update', when the field mysteriously
acquires the 'tokenized' flag.
--benson
>
> So standardanalyzer will tokenize this into two terms: "value" and "1"
>
> But later, you proceed to do TermQueries on "value-1". This term won't
> exist... TermQuery etc that take Term don't analyze any text.
>
> Instead usually higher-level things like QueryParsers analyze text into Terms.
Re: Problem with updating a document or TermQuery with current trunk
Posted by Robert Muir <rc...@gmail.com>.
I think the issue is that your analyzer is StandardAnalyzer, yet the field
text value is "value-1".
So StandardAnalyzer will tokenize this into two terms: "value" and "1".
But later, you proceed to do TermQueries on "value-1". This term won't
exist... TermQuery and other queries that take a Term don't analyze any text.
Instead, higher-level things like QueryParsers usually analyze text into Terms.
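To illustrate Robert's explanation, here is a toy model in plain Java (no Lucene classes; every name below is hypothetical). A simplified analyzer that lowercases and splits on non-alphanumeric characters stands in for StandardAnalyzer; the real StandardTokenizer follows Unicode word-break rules, but for "value-1" Robert says it also yields "value" and "1". An exact-term lookup plays the role of TermQuery, and analyzing the query text first plays the role of a query parser.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index, NOT Lucene's API; all names are hypothetical.
class ToyTermLookup {
    // term -> ids of the docs that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in for StandardAnalyzer: lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index-time analysis: only the analyzed terms reach the index.
    void index(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    // Like TermQuery: the term is looked up exactly as given, never analyzed.
    Set<Integer> termQuery(String term) {
        return new HashSet<>(postings.getOrDefault(term, new HashSet<>()));
    }

    // Like a query parser (simplified): analyze the text, require every term.
    Set<Integer> analyzedQuery(String text) {
        Set<Integer> hits = null;
        for (String term : analyze(text)) {
            Set<Integer> docs = termQuery(term);
            if (hits == null) {
                hits = docs;
            } else {
                hits.retainAll(docs);
            }
        }
        return hits == null ? new HashSet<>() : hits;
    }

    public static void main(String[] args) {
        ToyTermLookup index = new ToyTermLookup();
        index.index(0, "value-1");
        System.out.println(analyze("value-1"));             // [value, 1]
        System.out.println(index.termQuery("value-1"));     // []
        System.out.println(index.analyzedQuery("value-1")); // [0]
    }
}
```

Requiring every analyzed query term to match is an assumption of the sketch; a real query parser may build a different query from the same terms.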
--
lucidimagination.com