You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Jason Rutherglen <ja...@gmail.com> on 2009/03/24 19:01:53 UTC

Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

While indexing using
contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.  The
asserion error is from TermsHashPerField.comparePostings(RawPostingList p1,
RawPostingList p2).  A Payload is added to the document representing a UID.
Only 1-2 out of 1 million documents indexed generates this error.

java.lang.AssertionError
problem adding
doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
Washington.JPG|right|250px|thumb|The Croatian embassy]] The '''Croatian
Embassy in Washington''' is the [[embassy]] of [[Croatia]] in [[Washington,
D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts Avenue
(Washington, DC)|Massachusetts Avenue]], [[Washington DC
(northwest)|Northwest]] near [[Dupont Circle]].  Previously the building had
been home to the [[Austrian Embassy in Washington|Austrian embassy]], but
they left for larger quarters and sold the structure to Croatia in 1993.
The purchase and renovation of the building was largely paid for by the
[[Croatian-American]] community.  In front of the embassy is a large
sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
==External link== *[http://www.croatiaemb.org/ Official site]
[[Category:Embassies in Washington|Croatia]] [[Category:Foreign relations of
Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of Croatia
in Washington> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
indexed,tokenized<_I...@e7b3cf>
indexed<id:667162>> ex: java.lang.AssertionError
    at
org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
    at
org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
    at
org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
    at
org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
    at
org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
    at
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
    at
org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
    at
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
    at
org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
    at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
    at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)

Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

Posted by Michael McCandless <lu...@mikemccandless.com>.
I'm unable to reproduce this.

Jason have you tried on other computers (to rule out eg bad RAM/IO)?

Mike

On Wed, Mar 25, 2009 at 6:39 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
> LuceneError when executed should reproduce the failure.  The
> contrib/benchmark libraries are required.  MultiThreadDocAdd is a
> multithreaded indexing utility class.
>
> On Wed, Mar 25, 2009 at 1:06 PM, Jason Rutherglen
> <ja...@gmail.com> wrote:
>>
>> Each document is being created in a single thread, and the fields of the
>> document are not being updated elsewhere.  I haven't posted the full code
>> yet as it needs to cleaned up.  Thanks Mike!
>>
>> On Tue, Mar 24, 2009 at 2:43 PM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>>
>>> It looks like you are reusing a Field (the f.setValue(...) calls); are
>>> you sure you're not changing a Document/Field while another thread is
>>> adding it to the index?
>>>
>>> If you can post the full code, then I can try to run it on my
>>> wikipedia dump locally.
>>>
>>> Mike
>>>
>>> Jason Rutherglen <ja...@gmail.com> wrote:
>>> > Mike,
>>> >
>>> > It only happens when at least 1 million documents are indexed in a
>>> > multithreaded fashion.  Maybe I should post the code?  I will try
>>> > indexing
>>> > without the payload field, I assume it won't fail because I indexed
>>> > wikipedia before with no issues.
>>> >
>>> > Thanks!
>>> >
>>> > Jason
>>> >
>>> > On Tue, Mar 24, 2009 at 12:25 PM, Michael McCandless <
>>> > lucene@mikemccandless.com> wrote:
>>> >
>>> >> Hmmmm.
>>> >>
>>> >> Jason is this easily/compactly repeated?  EG, try to index the N docs
>>> >> before that one.
>>> >>
>>> >> If you remove the SinglePayloadTokenStream field, does the exception
>>> >> still happen?
>>> >>
>>> >> Mike
>>> >>
>>> >> Jason Rutherglen <ja...@gmail.com> wrote:
>>> >> > While indexing using
>>> >> > contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.
>>> >> >  The
>>> >> > asserion error is from
>>> >> > TermsHashPerField.comparePostings(RawPostingList
>>> >> p1,
>>> >> > RawPostingList p2).  A Payload is added to the document representing
>>> >> > a
>>> >> UID.
>>> >> > Only 1-2 out of 1 million documents indexed generates this error.
>>> >> >
>>> >> > java.lang.AssertionError
>>> >> > problem adding
>>> >> >
>>> >> > doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
>>> >> > Washington.JPG|right|250px|thumb|The Croatian embassy]] The
>>> >> > '''Croatian
>>> >> > Embassy in Washington''' is the [[embassy]] of [[Croatia]] in
>>> >> [[Washington,
>>> >> > D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts
>>> >> > Avenue
>>> >> > (Washington, DC)|Massachusetts Avenue]], [[Washington DC
>>> >> > (northwest)|Northwest]] near [[Dupont Circle]].  Previously the
>>> >> > building
>>> >> had
>>> >> > been home to the [[Austrian Embassy in Washington|Austrian
>>> >> > embassy]], but
>>> >> > they left for larger quarters and sold the structure to Croatia in
>>> >> > 1993.
>>> >> > The purchase and renovation of the building was largely paid for by
>>> >> > the
>>> >> > [[Croatian-American]] community.  In front of the embassy is a large
>>> >> > sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
>>> >> > ==External link== *[http://www.croatiaemb.org/ Official site]
>>> >> > [[Category:Embassies in Washington|Croatia]] [[Category:Foreign
>>> >> > relations
>>> >> of
>>> >> > Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of
>>> >> Croatia
>>> >> > in Washington>
>>> >> > stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
>>> >> > 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
>>> >> >
>>> >>
>>> >> indexed,tokenized<_ID:proj.zoie.api.ZoieIndexReader$SinglePayloadTokenStream@e7b3cf
>>> >> >
>>> >> > indexed<id:667162>> ex: java.lang.AssertionError
>>> >> >    at
>>> >> >
>>> >>
>>> >> org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
>>> >> >    at
>>> >> >
>>> >>
>>> >> org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
>>> >> >    at
>>> >> >
>>> >>
>>> >> org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
>>> >> >    at
>>> >> >
>>> >>
>>> >> org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
>>> >> >    at
>>> >> >
>>> >>
>>> >> org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
>>> >> >    at
>>> >> >
>>> >>
>>> >> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
>>> >> >    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
>>> >> >    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
>>> >> >    at
>>> >> >
>>> >>
>>> >> org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
>>> >> >    at
>>> >> >
>>> >>
>>> >> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
>>> >> >    at
>>> >> >
>>> >> > org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
>>> >> >    at
>>> >> > org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
>>> >> >    at
>>> >> > org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
>>> >> >    at
>>> >> >
>>> >> > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
>>> >> >    at
>>> >> >
>>> >> > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
>>> >> >
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> >>
>>> >>
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

Posted by Michael McCandless <lu...@mikemccandless.com>.
Another thing is to limit the max # merge threads CMS will run at
once.  It defaults to 3 now.

Mike

On Thu, Mar 26, 2009 at 2:08 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
> I used the NoMergePolicy to build the index as I noticed the indexing is
> faster, meaning the system simply creates large multi-megabyte segments in
> the ram buffer, flushes them out and doesn't worry about merging which
> causes massive disk trashing.  I am pondering some benchmarks to find the
> optimal merge policy for realtime search, I'm not sure it's always necessary
> to merge according to the Log system.
>
> For example, a merge policy that caps the size of each segment at 250
> megabytes, and does no merging could be interesting for realtime where many
> deletes are coming in and the segments with enough deletes need to merged
> away in 1-2 hours.  Meaning optimizing may not be best as it requires later
> large merges.  Also an interleaving system that does not perform merges if a
> flush is occurring could useful for minimizing disk trash.
>
> On Wed, Mar 25, 2009 at 3:39 PM, Jason Rutherglen <
> jason.rutherglen@gmail.com> wrote:
>
>> LuceneError when executed should reproduce the failure.  The
>> contrib/benchmark libraries are required.  MultiThreadDocAdd is a
>> multithreaded indexing utility class.
>>
>> On Wed, Mar 25, 2009 at 1:06 PM, Jason Rutherglen <
>> jason.rutherglen@gmail.com> wrote:
>>
>>> Each document is being created in a single thread, and the fields of the
>>> document are not being updated elsewhere.  I haven't posted the full code
>>> yet as it needs to cleaned up.  Thanks Mike!
>>>
>>>
>>> On Tue, Mar 24, 2009 at 2:43 PM, Michael McCandless <
>>> lucene@mikemccandless.com> wrote:
>>>
>>>> It looks like you are reusing a Field (the f.setValue(...) calls); are
>>>> you sure you're not changing a Document/Field while another thread is
>>>> adding it to the index?
>>>>
>>>> If you can post the full code, then I can try to run it on my
>>>> wikipedia dump locally.
>>>>
>>>> Mike
>>>>
>>>> Jason Rutherglen <ja...@gmail.com> wrote:
>>>> > Mike,
>>>> >
>>>> > It only happens when at least 1 million documents are indexed in a
>>>> > multithreaded fashion.  Maybe I should post the code?  I will try
>>>> indexing
>>>> > without the payload field, I assume it won't fail because I indexed
>>>> > wikipedia before with no issues.
>>>> >
>>>> > Thanks!
>>>> >
>>>> > Jason
>>>> >
>>>> > On Tue, Mar 24, 2009 at 12:25 PM, Michael McCandless <
>>>> > lucene@mikemccandless.com> wrote:
>>>> >
>>>> >> Hmmmm.
>>>> >>
>>>> >> Jason is this easily/compactly repeated?  EG, try to index the N docs
>>>> >> before that one.
>>>> >>
>>>> >> If you remove the SinglePayloadTokenStream field, does the exception
>>>> >> still happen?
>>>> >>
>>>> >> Mike
>>>> >>
>>>> >> Jason Rutherglen <ja...@gmail.com> wrote:
>>>> >> > While indexing using
>>>> >> > contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.
>>>>  The
>>>> >> > asserion error is from
>>>> TermsHashPerField.comparePostings(RawPostingList
>>>> >> p1,
>>>> >> > RawPostingList p2).  A Payload is added to the document representing
>>>> a
>>>> >> UID.
>>>> >> > Only 1-2 out of 1 million documents indexed generates this error.
>>>> >> >
>>>> >> > java.lang.AssertionError
>>>> >> > problem adding
>>>> >> >
>>>> doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
>>>> >> > Washington.JPG|right|250px|thumb|The Croatian embassy]] The
>>>> '''Croatian
>>>> >> > Embassy in Washington''' is the [[embassy]] of [[Croatia]] in
>>>> >> [[Washington,
>>>> >> > D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts
>>>> Avenue
>>>> >> > (Washington, DC)|Massachusetts Avenue]], [[Washington DC
>>>> >> > (northwest)|Northwest]] near [[Dupont Circle]].  Previously the
>>>> building
>>>> >> had
>>>> >> > been home to the [[Austrian Embassy in Washington|Austrian
>>>> embassy]], but
>>>> >> > they left for larger quarters and sold the structure to Croatia in
>>>> 1993.
>>>> >> > The purchase and renovation of the building was largely paid for by
>>>> the
>>>> >> > [[Croatian-American]] community.  In front of the embassy is a large
>>>> >> > sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
>>>> >> > ==External link== *[http://www.croatiaemb.org/ Official site]
>>>> >> > [[Category:Embassies in Washington|Croatia]] [[Category:Foreign
>>>> relations
>>>> >> of
>>>> >> > Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of
>>>> >> Croatia
>>>> >> > in Washington>
>>>> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
>>>> >> > 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
>>>> >> >
>>>> >>
>>>> indexed,tokenized<_ID:proj.zoie.api.ZoieIndexReader$SinglePayloadTokenStream@e7b3cf
>>>> >> >
>>>> >> > indexed<id:667162>> ex: java.lang.AssertionError
>>>> >> >    at
>>>> >> >
>>>> >>
>>>> org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
>>>> >> >    at
>>>> >> >
>>>> >>
>>>> org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
>>>> >> >    at
>>>> >> >
>>>> >>
>>>> org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
>>>> >> >    at
>>>> >> >
>>>> >>
>>>> org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
>>>> >> >    at
>>>> >> >
>>>> >>
>>>> org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
>>>> >> >    at
>>>> >> >
>>>> >>
>>>> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
>>>> >> >    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
>>>> >> >    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
>>>> >> >    at
>>>> >> >
>>>> >>
>>>> org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
>>>> >> >    at
>>>> >> >
>>>> >>
>>>> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
>>>> >> >    at
>>>> >> >
>>>> org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
>>>> >> >    at
>>>> org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
>>>> >> >    at
>>>> org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
>>>> >> >    at
>>>> >> >
>>>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
>>>> >> >    at
>>>> >> >
>>>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
>>>> >> >
>>>> >>
>>>> >> ---------------------------------------------------------------------
>>>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>> >>
>>>> >>
>>>> >
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

Posted by Jason Rutherglen <ja...@gmail.com>.
I used the NoMergePolicy to build the index as I noticed the indexing is
faster, meaning the system simply creates large multi-megabyte segments in
the ram buffer, flushes them out and doesn't worry about merging which
causes massive disk trashing.  I am pondering some benchmarks to find the
optimal merge policy for realtime search, I'm not sure it's always necessary
to merge according to the Log system.

For example, a merge policy that caps the size of each segment at 250
megabytes, and does no merging could be interesting for realtime where many
deletes are coming in and the segments with enough deletes need to merged
away in 1-2 hours.  Meaning optimizing may not be best as it requires later
large merges.  Also an interleaving system that does not perform merges if a
flush is occurring could useful for minimizing disk trash.

On Wed, Mar 25, 2009 at 3:39 PM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> LuceneError when executed should reproduce the failure.  The
> contrib/benchmark libraries are required.  MultiThreadDocAdd is a
> multithreaded indexing utility class.
>
> On Wed, Mar 25, 2009 at 1:06 PM, Jason Rutherglen <
> jason.rutherglen@gmail.com> wrote:
>
>> Each document is being created in a single thread, and the fields of the
>> document are not being updated elsewhere.  I haven't posted the full code
>> yet as it needs to cleaned up.  Thanks Mike!
>>
>>
>> On Tue, Mar 24, 2009 at 2:43 PM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>>> It looks like you are reusing a Field (the f.setValue(...) calls); are
>>> you sure you're not changing a Document/Field while another thread is
>>> adding it to the index?
>>>
>>> If you can post the full code, then I can try to run it on my
>>> wikipedia dump locally.
>>>
>>> Mike
>>>
>>> Jason Rutherglen <ja...@gmail.com> wrote:
>>> > Mike,
>>> >
>>> > It only happens when at least 1 million documents are indexed in a
>>> > multithreaded fashion.  Maybe I should post the code?  I will try
>>> indexing
>>> > without the payload field, I assume it won't fail because I indexed
>>> > wikipedia before with no issues.
>>> >
>>> > Thanks!
>>> >
>>> > Jason
>>> >
>>> > On Tue, Mar 24, 2009 at 12:25 PM, Michael McCandless <
>>> > lucene@mikemccandless.com> wrote:
>>> >
>>> >> Hmmmm.
>>> >>
>>> >> Jason is this easily/compactly repeated?  EG, try to index the N docs
>>> >> before that one.
>>> >>
>>> >> If you remove the SinglePayloadTokenStream field, does the exception
>>> >> still happen?
>>> >>
>>> >> Mike
>>> >>
>>> >> Jason Rutherglen <ja...@gmail.com> wrote:
>>> >> > While indexing using
>>> >> > contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.
>>>  The
>>> >> > asserion error is from
>>> TermsHashPerField.comparePostings(RawPostingList
>>> >> p1,
>>> >> > RawPostingList p2).  A Payload is added to the document representing
>>> a
>>> >> UID.
>>> >> > Only 1-2 out of 1 million documents indexed generates this error.
>>> >> >
>>> >> > java.lang.AssertionError
>>> >> > problem adding
>>> >> >
>>> doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
>>> >> > Washington.JPG|right|250px|thumb|The Croatian embassy]] The
>>> '''Croatian
>>> >> > Embassy in Washington''' is the [[embassy]] of [[Croatia]] in
>>> >> [[Washington,
>>> >> > D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts
>>> Avenue
>>> >> > (Washington, DC)|Massachusetts Avenue]], [[Washington DC
>>> >> > (northwest)|Northwest]] near [[Dupont Circle]].  Previously the
>>> building
>>> >> had
>>> >> > been home to the [[Austrian Embassy in Washington|Austrian
>>> embassy]], but
>>> >> > they left for larger quarters and sold the structure to Croatia in
>>> 1993.
>>> >> > The purchase and renovation of the building was largely paid for by
>>> the
>>> >> > [[Croatian-American]] community.  In front of the embassy is a large
>>> >> > sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
>>> >> > ==External link== *[http://www.croatiaemb.org/ Official site]
>>> >> > [[Category:Embassies in Washington|Croatia]] [[Category:Foreign
>>> relations
>>> >> of
>>> >> > Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of
>>> >> Croatia
>>> >> > in Washington>
>>> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
>>> >> > 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
>>> >> >
>>> >>
>>> indexed,tokenized<_ID:proj.zoie.api.ZoieIndexReader$SinglePayloadTokenStream@e7b3cf
>>> >> >
>>> >> > indexed<id:667162>> ex: java.lang.AssertionError
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
>>> >> >    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
>>> >> >    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
>>> >> >    at
>>> >> >
>>> org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
>>> >> >    at
>>> org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
>>> >> >    at
>>> org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
>>> >> >    at
>>> >> >
>>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
>>> >> >    at
>>> >> >
>>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
>>> >> >
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> >>
>>> >>
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>

Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

Posted by Jason Rutherglen <ja...@gmail.com>.
LuceneError when executed should reproduce the failure.  The
contrib/benchmark libraries are required.  MultiThreadDocAdd is a
multithreaded indexing utility class.

On Wed, Mar 25, 2009 at 1:06 PM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> Each document is being created in a single thread, and the fields of the
> document are not being updated elsewhere.  I haven't posted the full code
> yet as it needs to cleaned up.  Thanks Mike!
>
>
> On Tue, Mar 24, 2009 at 2:43 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> It looks like you are reusing a Field (the f.setValue(...) calls); are
>> you sure you're not changing a Document/Field while another thread is
>> adding it to the index?
>>
>> If you can post the full code, then I can try to run it on my
>> wikipedia dump locally.
>>
>> Mike
>>
>> Jason Rutherglen <ja...@gmail.com> wrote:
>> > Mike,
>> >
>> > It only happens when at least 1 million documents are indexed in a
>> > multithreaded fashion.  Maybe I should post the code?  I will try
>> indexing
>> > without the payload field, I assume it won't fail because I indexed
>> > wikipedia before with no issues.
>> >
>> > Thanks!
>> >
>> > Jason
>> >
>> > On Tue, Mar 24, 2009 at 12:25 PM, Michael McCandless <
>> > lucene@mikemccandless.com> wrote:
>> >
>> >> Hmmmm.
>> >>
>> >> Jason is this easily/compactly repeated?  EG, try to index the N docs
>> >> before that one.
>> >>
>> >> If you remove the SinglePayloadTokenStream field, does the exception
>> >> still happen?
>> >>
>> >> Mike
>> >>
>> >> Jason Rutherglen <ja...@gmail.com> wrote:
>> >> > While indexing using
>> >> > contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.  The
>> >> > asserion error is from
>> TermsHashPerField.comparePostings(RawPostingList
>> >> p1,
>> >> > RawPostingList p2).  A Payload is added to the document representing
>> a
>> >> UID.
>> >> > Only 1-2 out of 1 million documents indexed generates this error.
>> >> >
>> >> > java.lang.AssertionError
>> >> > problem adding
>> >> >
>> doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
>> >> > Washington.JPG|right|250px|thumb|The Croatian embassy]] The
>> '''Croatian
>> >> > Embassy in Washington''' is the [[embassy]] of [[Croatia]] in
>> >> [[Washington,
>> >> > D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts
>> Avenue
>> >> > (Washington, DC)|Massachusetts Avenue]], [[Washington DC
>> >> > (northwest)|Northwest]] near [[Dupont Circle]].  Previously the
>> building
>> >> had
>> >> > been home to the [[Austrian Embassy in Washington|Austrian embassy]],
>> but
>> >> > they left for larger quarters and sold the structure to Croatia in
>> 1993.
>> >> > The purchase and renovation of the building was largely paid for by
>> the
>> >> > [[Croatian-American]] community.  In front of the embassy is a large
>> >> > sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
>> >> > ==External link== *[http://www.croatiaemb.org/ Official site]
>> >> > [[Category:Embassies in Washington|Croatia]] [[Category:Foreign
>> relations
>> >> of
>> >> > Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of
>> >> Croatia
>> >> > in Washington>
>> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
>> >> > 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
>> >> >
>> >>
>> indexed,tokenized<_ID:proj.zoie.api.ZoieIndexReader$SinglePayloadTokenStream@e7b3cf
>> >> >
>> >> > indexed<id:667162>> ex: java.lang.AssertionError
>> >> >    at
>> >> >
>> >>
>> org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
>> >> >    at
>> >> >
>> >>
>> org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
>> >> >    at
>> >> >
>> >>
>> org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
>> >> >    at
>> >> >
>> >>
>> org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
>> >> >    at
>> >> >
>> >>
>> org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
>> >> >    at
>> >> >
>> >>
>> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
>> >> >    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
>> >> >    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
>> >> >    at
>> >> >
>> >>
>> org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
>> >> >    at
>> >> >
>> >>
>> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
>> >> >    at
>> >> >
>> org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
>> >> >    at
>> org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
>> >> >    at
>> org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
>> >> >    at
>> >> >
>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
>> >> >    at
>> >> >
>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
>> >> >
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

Posted by Jason Rutherglen <ja...@gmail.com>.
Each document is being created in a single thread, and the fields of the
document are not being updated elsewhere.  I haven't posted the full code
yet as it needs to cleaned up.  Thanks Mike!

On Tue, Mar 24, 2009 at 2:43 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> It looks like you are reusing a Field (the f.setValue(...) calls); are
> you sure you're not changing a Document/Field while another thread is
> adding it to the index?
>
> If you can post the full code, then I can try to run it on my
> wikipedia dump locally.
>
> Mike
>
> Jason Rutherglen <ja...@gmail.com> wrote:
> > Mike,
> >
> > It only happens when at least 1 million documents are indexed in a
> > multithreaded fashion.  Maybe I should post the code?  I will try
> indexing
> > without the payload field, I assume it won't fail because I indexed
> > wikipedia before with no issues.
> >
> > Thanks!
> >
> > Jason
> >
> > On Tue, Mar 24, 2009 at 12:25 PM, Michael McCandless <
> > lucene@mikemccandless.com> wrote:
> >
> >> Hmmmm.
> >>
> >> Jason is this easily/compactly repeated?  EG, try to index the N docs
> >> before that one.
> >>
> >> If you remove the SinglePayloadTokenStream field, does the exception
> >> still happen?
> >>
> >> Mike
> >>
> >> Jason Rutherglen <ja...@gmail.com> wrote:
> >> > While indexing using
> >> > contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.  The
> >> > asserion error is from
> TermsHashPerField.comparePostings(RawPostingList
> >> p1,
> >> > RawPostingList p2).  A Payload is added to the document representing a
> >> UID.
> >> > Only 1-2 out of 1 million documents indexed generates this error.
> >> >
> >> > java.lang.AssertionError
> >> > problem adding
> >> >
> doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
> >> > Washington.JPG|right|250px|thumb|The Croatian embassy]] The
> '''Croatian
> >> > Embassy in Washington''' is the [[embassy]] of [[Croatia]] in
> >> [[Washington,
> >> > D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts
> Avenue
> >> > (Washington, DC)|Massachusetts Avenue]], [[Washington DC
> >> > (northwest)|Northwest]] near [[Dupont Circle]].  Previously the
> building
> >> had
> >> > been home to the [[Austrian Embassy in Washington|Austrian embassy]],
> but
> >> > they left for larger quarters and sold the structure to Croatia in
> 1993.
> >> > The purchase and renovation of the building was largely paid for by
> the
> >> > [[Croatian-American]] community.  In front of the embassy is a large
> >> > sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
> >> > ==External link== *[http://www.croatiaemb.org/ Official site]
> >> > [[Category:Embassies in Washington|Croatia]] [[Category:Foreign
> relations
> >> of
> >> > Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of
> >> Croatia
> >> > in Washington>
> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
> >> > 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
> >> >
> >>
> indexed,tokenized<_ID:proj.zoie.api.ZoieIndexReader$SinglePayloadTokenStream@e7b3cf
> >> >
> >> > indexed<id:667162>> ex: java.lang.AssertionError
> >> >    at
> >> >
> >>
> org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
> >> >    at
> >> >
> >>
> org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
> >> >    at
> >> >
> >>
> org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
> >> >    at
> >> >
> >>
> org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
> >> >    at
> >> >
> >>
> org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
> >> >    at
> >> >
> >>
> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
> >> >    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
> >> >    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
> >> >    at
> >> >
> >>
> org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
> >> >    at
> >> >
> >>
> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
> >> >    at
> >> >
> org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
> >> >    at
> org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
> >> >    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
> >> >    at
> >> > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
> >> >    at
> >> > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

Posted by Michael McCandless <lu...@mikemccandless.com>.
It looks like you are reusing a Field (the f.setValue(...) calls); are
you sure you're not changing a Document/Field while another thread is
adding it to the index?

If you can post the full code, then I can try to run it on my
wikipedia dump locally.

Mike

Jason Rutherglen <ja...@gmail.com> wrote:
> Mike,
>
> It only happens when at least 1 million documents are indexed in a
> multithreaded fashion.  Maybe I should post the code?  I will try indexing
> without the payload field, I assume it won't fail because I indexed
> wikipedia before with no issues.
>
> Thanks!
>
> Jason
>
> On Tue, Mar 24, 2009 at 12:25 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> Hmmmm.
>>
>> Jason is this easily/compactly repeated?  EG, try to index the N docs
>> before that one.
>>
>> If you remove the SinglePayloadTokenStream field, does the exception
>> still happen?
>>
>> Mike
>>
>> Jason Rutherglen <ja...@gmail.com> wrote:
>> > While indexing using
>> > contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.  The
>> > asserion error is from TermsHashPerField.comparePostings(RawPostingList
>> p1,
>> > RawPostingList p2).  A Payload is added to the document representing a
>> UID.
>> > Only 1-2 out of 1 million documents indexed generates this error.
>> >
>> > java.lang.AssertionError
>> > problem adding
>> > doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
>> > Washington.JPG|right|250px|thumb|The Croatian embassy]] The '''Croatian
>> > Embassy in Washington''' is the [[embassy]] of [[Croatia]] in
>> [[Washington,
>> > D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts Avenue
>> > (Washington, DC)|Massachusetts Avenue]], [[Washington DC
>> > (northwest)|Northwest]] near [[Dupont Circle]].  Previously the building
>> had
>> > been home to the [[Austrian Embassy in Washington|Austrian embassy]], but
>> > they left for larger quarters and sold the structure to Croatia in 1993.
>> > The purchase and renovation of the building was largely paid for by the
>> > [[Croatian-American]] community.  In front of the embassy is a large
>> > sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
>> > ==External link== *[http://www.croatiaemb.org/ Official site]
>> > [[Category:Embassies in Washington|Croatia]] [[Category:Foreign relations
>> of
>> > Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of
>> Croatia
>> > in Washington> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
>> > 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
>> >
>> indexed,tokenized<_ID:proj.zoie.api.ZoieIndexReader$SinglePayloadTokenStream@e7b3cf
>> >
>> > indexed<id:667162>> ex: java.lang.AssertionError
>> >    at
>> >
>> org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
>> >    at
>> >
>> org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
>> >    at
>> >
>> org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
>> >    at
>> >
>> org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
>> >    at
>> >
>> org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
>> >    at
>> >
>> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
>> >    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
>> >    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
>> >    at
>> >
>> org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
>> >    at
>> >
>> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
>> >    at
>> > org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
>> >    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
>> >    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
>> >    at
>> > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
>> >    at
>> > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

Posted by Jason Rutherglen <ja...@gmail.com>.
Mike,

It only happens when at least 1 million documents are indexed in a
multithreaded fashion.  Maybe I should post the code?  I will try indexing
without the payload field, I assume it won't fail because I indexed
wikipedia before with no issues.

Thanks!

Jason

On Tue, Mar 24, 2009 at 12:25 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Hmmmm.
>
> Jason is this easily/compactly repeated?  EG, try to index the N docs
> before that one.
>
> If you remove the SinglePayloadTokenStream field, does the exception
> still happen?
>
> Mike
>
> Jason Rutherglen <ja...@gmail.com> wrote:
> > While indexing using
> > contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.  The
> > asserion error is from TermsHashPerField.comparePostings(RawPostingList
> p1,
> > RawPostingList p2).  A Payload is added to the document representing a
> UID.
> > Only 1-2 out of 1 million documents indexed generates this error.
> >
> > java.lang.AssertionError
> > problem adding
> > doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
> > Washington.JPG|right|250px|thumb|The Croatian embassy]] The '''Croatian
> > Embassy in Washington''' is the [[embassy]] of [[Croatia]] in
> [[Washington,
> > D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts Avenue
> > (Washington, DC)|Massachusetts Avenue]], [[Washington DC
> > (northwest)|Northwest]] near [[Dupont Circle]].  Previously the building
> had
> > been home to the [[Austrian Embassy in Washington|Austrian embassy]], but
> > they left for larger quarters and sold the structure to Croatia in 1993.
> > The purchase and renovation of the building was largely paid for by the
> > [[Croatian-American]] community.  In front of the embassy is a large
> > sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
> > ==External link== *[http://www.croatiaemb.org/ Official site]
> > [[Category:Embassies in Washington|Croatia]] [[Category:Foreign relations
> of
> > Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of
> Croatia
> > in Washington> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
> > 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
> >
> indexed,tokenized<_ID:proj.zoie.api.ZoieIndexReader$SinglePayloadTokenStream@e7b3cf
> >
> > indexed<id:667162>> ex: java.lang.AssertionError
> >    at
> >
> org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
> >    at
> >
> org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
> >    at
> >
> org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
> >    at
> >
> org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
> >    at
> >
> org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
> >    at
> >
> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
> >    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
> >    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
> >    at
> >
> org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
> >    at
> >
> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
> >    at
> > org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
> >    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
> >    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
> >    at
> > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
> >    at
> > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

Posted by Jason Rutherglen <ja...@gmail.com>.
Using StandardAnalyzer.  It's probably the payload field?

This is the code that creates the payload field:

private static class SinglePayloadTokenStream extends TokenStream {
           private Token token = new Token(UID_TERM.text(), 0, 0);
           private byte[] buffer = new byte[4];
           private boolean returnToken = false;

           void setUID(int uid) {
             buffer[0] = (byte) (uid);
             buffer[1] = (byte) (uid >> 8);
             buffer[2] = (byte) (uid >> 16);
             buffer[3] = (byte) (uid >> 24);
             token.setPayload(new Payload(buffer));
             returnToken = true;
           }

           public Token next() throws IOException {
             if (returnToken) {
               returnToken = false;
               return token;
             } else {
               return null;
             }
           }
         }

    public static void fillDocumentID(Document doc,int id)
    {
          SinglePayloadTokenStream singlePayloadTokenStream = new
SinglePayloadTokenStream();
          singlePayloadTokenStream.setUID(id);
           Field f=doc.getField(UID_TERM.field());
           if (f==null)
           {
               f=new Field(UID_TERM.field(), singlePayloadTokenStream);
               doc.add(f);
           }
           else{
               f.setValue(singlePayloadTokenStream);
           }
           f=null;
           f=doc.getField(Indexable.DOCUMENT_ID_FIELD);
           if (f==null)
           {
               f=new
Field(Indexable.DOCUMENT_ID_FIELD,String.valueOf(id),Store.NO,Index.NOT_ANALYZED);
               doc.add(f);
           }
           else
           {
               f.setValue(String.valueOf(id));
           }
      }

On Tue, Mar 24, 2009 at 12:36 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> I was just able to index all of wikipedia, using StandardAnalyzer,
> with assertions enabled, without hitting that exception.  Which
> analyzer are you using (besides your payload field)?
>
> Mike
>
> Michael McCandless <lu...@mikemccandless.com> wrote:
> > Hmmmm.
> >
> > Jason is this easily/compactly repeated?  EG, try to index the N docs
> > before that one.
> >
> > If you remove the SinglePayloadTokenStream field, does the exception
> > still happen?
> >
> > Mike
> >
> > Jason Rutherglen <ja...@gmail.com> wrote:
> >> While indexing using
> >> contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.  The
> >> asserion error is from TermsHashPerField.comparePostings(RawPostingList
> p1,
> >> RawPostingList p2).  A Payload is added to the document representing a
> UID.
> >> Only 1-2 out of 1 million documents indexed generates this error.
> >>
> >> java.lang.AssertionError
> >> problem adding
> >> doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
> >> Washington.JPG|right|250px|thumb|The Croatian embassy]] The '''Croatian
> >> Embassy in Washington''' is the [[embassy]] of [[Croatia]] in
> [[Washington,
> >> D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts Avenue
> >> (Washington, DC)|Massachusetts Avenue]], [[Washington DC
> >> (northwest)|Northwest]] near [[Dupont Circle]].  Previously the building
> had
> >> been home to the [[Austrian Embassy in Washington|Austrian embassy]],
> but
> >> they left for larger quarters and sold the structure to Croatia in 1993.
> >> The purchase and renovation of the building was largely paid for by the
> >> [[Croatian-American]] community.  In front of the embassy is a large
> >> sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
> >> ==External link== *[http://www.croatiaemb.org/ Official site]
> >> [[Category:Embassies in Washington|Croatia]] [[Category:Foreign
> relations of
> >> Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of
> Croatia
> >> in Washington> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
> >> 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
> >>
> indexed,tokenized<_ID:proj.zoie.api.ZoieIndexReader$SinglePayloadTokenStream@e7b3cf
> >
> >> indexed<id:667162>> ex: java.lang.AssertionError
> >>    at
> >>
> org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
> >>    at
> >>
> org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
> >>    at
> >>
> org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
> >>    at
> >>
> org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
> >>    at
> >>
> org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
> >>    at
> >>
> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
> >>    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
> >>    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
> >>    at
> >>
> org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
> >>    at
> >>
> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
> >>    at
> >> org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
> >>    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
> >>    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
> >>    at
> >> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
> >>    at
> >> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

Posted by Michael McCandless <lu...@mikemccandless.com>.
I was just able to index all of wikipedia, using StandardAnalyzer,
with assertions enabled, without hitting that exception.  Which
analyzer are you using (besides your payload field)?

Mike

Michael McCandless <lu...@mikemccandless.com> wrote:
> Hmmmm.
>
> Jason is this easily/compactly repeated?  EG, try to index the N docs
> before that one.
>
> If you remove the SinglePayloadTokenStream field, does the exception
> still happen?
>
> Mike
>
> Jason Rutherglen <ja...@gmail.com> wrote:
>> While indexing using
>> contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.  The
>> asserion error is from TermsHashPerField.comparePostings(RawPostingList p1,
>> RawPostingList p2).  A Payload is added to the document representing a UID.
>> Only 1-2 out of 1 million documents indexed generates this error.
>>
>> java.lang.AssertionError
>> problem adding
>> doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
>> Washington.JPG|right|250px|thumb|The Croatian embassy]] The '''Croatian
>> Embassy in Washington''' is the [[embassy]] of [[Croatia]] in [[Washington,
>> D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts Avenue
>> (Washington, DC)|Massachusetts Avenue]], [[Washington DC
>> (northwest)|Northwest]] near [[Dupont Circle]].  Previously the building had
>> been home to the [[Austrian Embassy in Washington|Austrian embassy]], but
>> they left for larger quarters and sold the structure to Croatia in 1993.
>> The purchase and renovation of the building was largely paid for by the
>> [[Croatian-American]] community.  In front of the embassy is a large
>> sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
>> ==External link== *[http://www.croatiaemb.org/ Official site]
>> [[Category:Embassies in Washington|Croatia]] [[Category:Foreign relations of
>> Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of Croatia
>> in Washington> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
>> 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
>> indexed,tokenized<_I...@e7b3cf>
>> indexed<id:667162>> ex: java.lang.AssertionError
>>    at
>> org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
>>    at
>> org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
>>    at
>> org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
>>    at
>> org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
>>    at
>> org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
>>    at
>> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
>>    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
>>    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
>>    at
>> org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
>>    at
>> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
>>    at
>> org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
>>    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
>>    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
>>    at
>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
>>    at
>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hmmmm.

Jason is this easily/compactly repeated?  EG, try to index the N docs
before that one.

If you remove the SinglePayloadTokenStream field, does the exception
still happen?

Mike

Jason Rutherglen <ja...@gmail.com> wrote:
> While indexing using
> contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.  The
> asserion error is from TermsHashPerField.comparePostings(RawPostingList p1,
> RawPostingList p2).  A Payload is added to the document representing a UID.
> Only 1-2 out of 1 million documents indexed generates this error.
>
> java.lang.AssertionError
> problem adding
> doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
> Washington.JPG|right|250px|thumb|The Croatian embassy]] The '''Croatian
> Embassy in Washington''' is the [[embassy]] of [[Croatia]] in [[Washington,
> D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts Avenue
> (Washington, DC)|Massachusetts Avenue]], [[Washington DC
> (northwest)|Northwest]] near [[Dupont Circle]].  Previously the building had
> been home to the [[Austrian Embassy in Washington|Austrian embassy]], but
> they left for larger quarters and sold the structure to Croatia in 1993.
> The purchase and renovation of the building was largely paid for by the
> [[Croatian-American]] community.  In front of the embassy is a large
> sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
> ==External link== *[http://www.croatiaemb.org/ Official site]
> [[Category:Embassies in Washington|Croatia]] [[Category:Foreign relations of
> Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of Croatia
> in Washington> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
> 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
> indexed,tokenized<_I...@e7b3cf>
> indexed<id:667162>> ex: java.lang.AssertionError
>    at
> org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
>    at
> org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
>    at
> org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
>    at
> org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
>    at
> org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
>    at
> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
>    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
>    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
>    at
> org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
>    at
> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
>    at
> org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
>    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
>    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
>    at
> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
>    at
> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org