Posted to java-user@lucene.apache.org by Trejkaz <tr...@trypticon.org> on 2017/12/10 22:19:48 UTC

UnsupportedOperationException from Outputs.merge, during addIndexes

Hi all.

I have an addIndexes call which in my over-weekend run threw an
UnsupportedOperationException from deep inside Lucene's code.

I'm wondering under what conditions this is expected to occur. The
source postings it's writing might be corrupt in some way, and if I
figure out what way it's corrupt, I can try to work around it.

I'm sure it's reproducible so I've put a breakpoint on the spot
already, but it takes a long time to get there.

Version in use is Lucene 6.6.0.

TX


java.lang.UnsupportedOperationException
    at org.apache.lucene.util.fst.Outputs.merge(Outputs.java:97)
    at org.apache.lucene.util.fst.Builder.add(Builder.java:459)
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$PendingBlock.append(BlockTreeTermsWriter.java:503)
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$PendingBlock.compileIndex(BlockTreeTermsWriter.java:475)
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:635)
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:907)
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:871)
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:344)
    at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:164)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:216)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:101)
    at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2824)

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: UnsupportedOperationException from Outputs.merge, during addIndexes

Posted by Trejkaz <tr...@trypticon.org>.
On Mon, Dec 11, 2017 at 10:59 PM, Adrien Grand <jp...@gmail.com> wrote:
> This means the FST builder is fed twice with the same key, so it tries to
> merge their outputs. This should not happen since the terms dictionary
> deduplicates terms.
>
> Do you get additional errors if you enable assertions? What are the codec
> readers that you pass to addIndexes? Could they contain duplicate terms?

This hint is a good lead. I'll start by checking all our own reader
implementations to see whether any of them could return the same term
more than once. The index I have been given might be broken somehow
too, but we're also migrating the data by creating "fake" codec
readers for things like postings, so it could be literally anywhere at
this point.
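
For illustration, the condition being checked for can be sketched in
plain Java without any Lucene dependency. This is only a sketch under
assumptions: the class TermOrderCheck and its methods are hypothetical
names, and the byte arrays stand in for the terms a reader would hand
back for one field (with a real reader they would come from iterating
that field's TermsEnum). Lucene expects terms in strictly increasing
unsigned byte order, so a term equal to its predecessor is a duplicate.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: checks that a sequence of terms
// is strictly increasing in unsigned byte order.
class TermOrderCheck {

    // Compares two byte arrays byte by byte, treating each byte as unsigned.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // On a shared prefix, the shorter array sorts first.
        return a.length - b.length;
    }

    // Returns the index of the first term that is not strictly greater than
    // its predecessor (a duplicate or an out-of-order term), or -1 if none.
    public static int firstOutOfOrder(List<byte[]> terms) {
        for (int i = 1; i < terms.size(); i++) {
            if (compareUnsigned(terms.get(i - 1), terms.get(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "banana" appears twice, which is the situation described above.
        List<byte[]> terms = Arrays.asList(
                bytes("apple"), bytes("banana"), bytes("banana"), bytes("cherry"));
        System.out.println(firstOutOfOrder(terms)); // prints 2
    }
}
```

Running a check like this per field, per reader, before calling
addIndexes would at least narrow down which reader is returning the
repeated term.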

I still haven't found out which field is responsible, though: the most
suspicious field didn't trigger it when migrated by itself, and my
overnight attempt died for other reasons. :)

TX



Re: UnsupportedOperationException from Outputs.merge, during addIndexes

Posted by Adrien Grand <jp...@gmail.com>.
This means the FST builder is fed twice with the same key, so it tries to
merge their outputs. This should not happen since the terms dictionary
deduplicates terms.

Do you get additional errors if you enable assertions? What are the codec
readers that you pass to addIndexes? Could they contain duplicate terms?

On Sun, Dec 10, 2017 at 23:19, Trejkaz <tr...@trypticon.org> wrote:

> Hi all.
>
> I have an addIndexes call which in my over-weekend run threw an
> UnsupportedOperationException from deep inside Lucene's code.
>
> I'm wondering under what conditions this is expected to occur. The
> source postings it's writing might be corrupt in some way, and if I
> figure out what way it's corrupt, I can try to work around it.
>
> I'm sure it's reproducible so I've put a breakpoint on the spot
> already, but it takes a long time to get there.
>
> Version in use is Lucene 6.6.0.
>
> TX
>
>
> java.lang.UnsupportedOperationException
>     at org.apache.lucene.util.fst.Outputs.merge(Outputs.java:97)
>     at org.apache.lucene.util.fst.Builder.add(Builder.java:459)
>     at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$PendingBlock.append(BlockTreeTermsWriter.java:503)
>     at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$PendingBlock.compileIndex(BlockTreeTermsWriter.java:475)
>     at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:635)
>     at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:907)
>     at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:871)
>     at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:344)
>     at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
>     at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:164)
>     at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:216)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:101)
>     at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2824)