
Posted to java-user@lucene.apache.org by Hans Lund <ha...@gmail.com> on 2016/10/04 14:40:37 UTC

merge problems

After upgrading to 6.2 we are having problems during merges (after running
for a while).

When the problem occurs, it's always complaining about the same field - and
throws:

java.lang.IllegalArgumentException: field="id" did not index point values
    at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.getBKDReader(Lucene60PointsReader.java:126)
    at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.size(Lucene60PointsReader.java:224)
    at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:169)
    at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:173)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4312)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3889)


To figure out where we messed up - I have added some ugly logging to
Document:

public final void add(IndexableField field) {
    if ("id".equals(field.name())
            && field.fieldType().pointDimensionCount() != 0) {
        System.err.println("Point value detected");
        for (IndexableField i : fields) {
            System.err.println(i);
        }
    }
    fields.add(field);
}

In the hope of intercepting the document we messed up.

But to my surprise, toString on the suspected field (which contains a URN)
just says:

indexed,omitNorms,indexOptions=DOCS<id:urn:wiki:doc:YEL:57028#1-1>

So, any hints as to why field.fieldType().pointDimensionCount() != 0,
and any suggestions as to what might cause this?

Regards
Hans Lund

Re: merge problems

Posted by Michael McCandless <lu...@mikemccandless.com>.

OK I have a small test case showing the issue!

I opened https://issues.apache.org/jira/browse/LUCENE-7491

Thanks for reporting this, Hans.

Mike McCandless

http://blog.mikemccandless.com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: merge problems

Posted by Hans Lund <ha...@gmail.com>.

Hmm, you're right - when it revealed a bug in our indexing code I stopped
wondering ;-) But now I have tried to create small tests to show the behavior,
so far without success. I'm pretty sure that I can reproduce it by
re-introducing our indexing bug; unfortunately it only occurs after some hours
of parsing and indexing Wikipedia dumps, but from there I'll try simplifying
towards a test that reproduces the setup.

The setup we use is quite straightforward: MMapDirectory and an NRT setup. The
only tailored functionality is our own IndexDeletionPolicy, which uses an added
timestamp in the commit user data to keep a number of snapshots while
honoring a max retention period. Not that I suspect it to be the cause, but
if field infos from another snapshot are used in the merge, that could
cause problems.

Hans Lund


Re: merge problems

Posted by Michael McCandless <lu...@mikemccandless.com>.

Hmm, that should be "OK" from Lucene's standpoint.

I mean, it should not result in strange merge exceptions later on.

I think there's a bug somewhere in Lucene's efforts to pretend it's
fully schema-less ... I'll try to reproduce this.

Mike McCandless

http://blog.mikemccandless.com
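
Until the Lucene-side problem is sorted out, an application that wants to catch this kind of accidental mixing early could track, per field name, whether it has been used with point values. The following is only a minimal sketch of that idea in plain Java: FieldSchemaGuard and its check method are hypothetical names, not part of Lucene, and the exact-match rule on the dimension count is an assumption about what the application wants to enforce.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical application-side guard; this is NOT a Lucene class. */
public class FieldSchemaGuard {

    // First point dimension count seen for each field name (0 = no point values).
    private final Map<String, Integer> pointDimsByField = new HashMap<>();

    /** Records the first usage of a field name and rejects later conflicting usages. */
    public void check(String fieldName, int pointDimensionCount) {
        Integer previous = pointDimsByField.putIfAbsent(fieldName, pointDimensionCount);
        if (previous != null && previous != pointDimensionCount) {
            throw new IllegalStateException("field=\"" + fieldName + "\" was first used with "
                    + previous + " point dimension(s), now with " + pointDimensionCount);
        }
    }

    public static void main(String[] args) {
        FieldSchemaGuard guard = new FieldSchemaGuard();
        guard.check("id", 0); // the URN identifier, indexed without point values
        try {
            guard.check("id", 1); // a dynamic stats count that happens to be named "id"
        } catch (IllegalStateException e) {
            System.err.println("Rejected: " + e.getMessage());
        }
    }
}
```

Calling check with the field name and its point dimension count before each document is handed to the writer would surface the conflicting document at indexing time rather than hours later during a merge.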



Re: merge problems

Posted by Hans Lund <ha...@gmail.com>.

Turned out to be much simpler - we had added a new 'dynamic' field to
a stats doc: a count of articles based on the identified language code. Having
a set of test documents in German, English and Swedish, no one had suspected
the obvious: that the language detection categorized a single document as
being Indonesian, making the stats count id:1.

I realized that the debug output I added printed everything other than
the interesting field (it iterated over the already added fields, not the
field causing the error later on) ;-)
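
The logging mistake described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: DebugDocumentSketch and SimpleField are hypothetical stand-ins, not Lucene's Document or IndexableField. The point is that the hook should print the incoming field, which has not yet been added to the list of fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the patched Document; these are NOT Lucene classes. */
public class DebugDocumentSketch {

    /** Minimal field model: a name, a point dimension count and a value. */
    static final class SimpleField {
        final String name;
        final int pointDimensionCount; // 0 means "no point values"
        final String value;

        SimpleField(String name, int pointDimensionCount, String value) {
            this.name = name;
            this.pointDimensionCount = pointDimensionCount;
            this.value = value;
        }

        @Override
        public String toString() {
            return name + ":" + value + " (pointDimensionCount=" + pointDimensionCount + ")";
        }
    }

    private final List<SimpleField> fields = new ArrayList<>();
    private final List<SimpleField> flagged = new ArrayList<>();

    public void add(SimpleField field) {
        if ("id".equals(field.name) && field.pointDimensionCount != 0) {
            // Log the incoming field itself; it is not in `fields` yet.
            System.err.println("Point value detected: " + field);
            flagged.add(field);
        }
        fields.add(field);
    }

    /** Fields that triggered the debug condition, kept so they can be inspected later. */
    public List<SimpleField> flagged() {
        return flagged;
    }

    public static void main(String[] args) {
        DebugDocumentSketch doc = new DebugDocumentSketch();
        doc.add(new SimpleField("id", 0, "urn:wiki:doc:YEL:57028#1-1"));
        doc.add(new SimpleField("id", 1, "1")); // this is the field that gets reported
    }
}
```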






Re: merge problems

Posted by Adrien Grand <jp...@gmail.com>.

It looks like the field infos of your index went out of sync with the data
about points stored in the files.

Can you run CheckIndex on your index (potentially with the `-fast` option
so that it only verifies checksums)? It could be that one of these two
parts of the index got corrupted.

Since you were able to modify the way add(IndexableField) is implemented,
I'm wondering if you are running a fork of Lucene? If yes, maybe you made
some changes that triggered this bug?

Otherwise, is your application:
 - using IndexWriter.addIndexes?
 - customizing merging in some way, e.g. by wrapping the merge readers?
