Posted to dev@lucene.apache.org by Vince Taluskie <vi...@taluskie.com> on 2003/09/24 21:04:59 UTC

negative number of docs?

Hello,

I'm using Lucene for a legacy records project, and up till now things have
worked very well.  The last round of additions I've made to the largest
index looks like it has hit a limit or a bug.

I'm running across a problem with Lucene v1.2 involving numDocs()
returning a negative number of documents (which causes hits.length()
to throw an exception) after merging several large indexes together.

In this project, there are 11 types of legacy data reports - they are put 
into a fielded format where each row of data becomes a document.  Data 
from multiple divisions is indexed and then merged together into a single 
index for each report type.  

The index sizes are typically about 75M documents, but the largest had 242M
documents before the latest update.  The latest merge of two intermediary
indexes at 195M docs and 96M docs should have put it at 291M documents -
the merge process (which uses the IndexWriter.addIndexes() call) ran
without any errors and the final index size looks correct, but when I
check the number of documents in it, I get a negative number.
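
For concreteness, the merge step boils down to something like the
following - a minimal sketch against the Lucene 1.2 API, where the
source paths are placeholders rather than the real project layout:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class MergeReportIndexes {
        public static void main(String[] args) throws Exception {
            // Open the target index for one report type (true = create).
            IndexWriter writer = new IndexWriter("/rr/tmpindexes/global/SL",
                    new StandardAnalyzer(), true);

            // addIndexes() merges every segment from the source
            // directories into the target and optimizes the result.
            Directory[] sources = {
                FSDirectory.getDirectory("/rr/tmpindexes/part1/SL", false),
                FSDirectory.getDirectory("/rr/tmpindexes/part2/SL", false)
            };
            writer.addIndexes(sources);
            writer.close();
        }
    }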

Before:

Index /rr/all_indexes/SL contains 242582695 documents

After:

Index /rr/tmpindexes/global/SL contains -245430166 documents

Attempts to perform searches on this index cause exceptions when the Hits
object is returned by the IndexSearcher.search() method.  The trace looks
like:

11:53:36,377 ERROR [Engine] StandardWrapperValve[RRSearcher]: Servlet.service() for servlet RRSearcher threw exception
java.lang.NegativeArraySizeException
        at org.apache.lucene.index.SegmentReader.norms(Unknown Source)
        at org.apache.lucene.search.TermQuery.scorer(Unknown Source)
        at org.apache.lucene.search.BooleanQuery.scorer(Unknown Source)
        at org.apache.lucene.search.Query.scorer(Unknown Source)
        at org.apache.lucene.search.IndexSearcher.search(Unknown Source)
        at org.apache.lucene.search.Hits.getMoreDocs(Unknown Source)
        at org.apache.lucene.search.Hits.<init>(Unknown Source)
        at org.apache.lucene.search.Searcher.search(Unknown Source)
        at com.cexp.ta.rec_retention.RRSearcher.doPost(RRSearcher.java:347)
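
The negative count explains the exception directly: when a query runs,
the reader sizes a per-document norms array from the (corrupted,
negative) document count, and Java rejects any negative array length.
A trivial illustration of the mechanism - not the Lucene source:

    public class NegativeSizeDemo {
        public static void main(String[] args) {
            int numDocs = -245430166;  // the corrupted count reported above
            // Allocating an array with a negative length throws
            // java.lang.NegativeArraySizeException, as in the trace.
            byte[] norms = new byte[numDocs];
        }
    }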


I figured I would be fine with the number of documents up to the 2-4B
range - and the data uploads for the project are finished, so the indexes
shouldn't need to get larger after this - but it looks like I've hit a
limit between 242M-291M documents.  Should I file a bug on this?

Recommendations?  I could arbitrarily split the indexes (keeping the
search unified over the pieces, as sketched below) or re-index with 1.3 if
the limit is fixed there - but the simplicity of the unified search is a
real plus.
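
For the record, splitting would not have to give up the unified front
end: a MultiSearcher can fan each query out over the pieces and merge
the hits, so callers still see one logical index.  A sketch with
hypothetical paths, field name, and query:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Searchable;

    public class SplitSearch {
        public static void main(String[] args) throws Exception {
            // Two halves of a split index (hypothetical paths).
            Searchable[] parts = {
                new IndexSearcher("/rr/all_indexes/SL_part1"),
                new IndexSearcher("/rr/all_indexes/SL_part2")
            };
            // Present the pieces to callers as one logical index.
            MultiSearcher searcher = new MultiSearcher(parts);
            Hits hits = searcher.search(
                QueryParser.parse("smith", "contents", new StandardAnalyzer()));
            System.out.println(hits.length() + " matching documents");
        }
    }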

Vince


Re: negative number of docs?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Wow, big index, almost 300M documents.

It looks like we need this small change that Doug had suggested applied
to the code, so I'll do that now.

Otis

--- Vince Taluskie <vi...@taluskie.com> wrote:
> 
> Doug,
> 
> This fix worked like a charm!  Now it's returning correctly:
> 
> 	Index /rr/all_indexes/SL contains 291440746 documents
> 
> Thanks for the great toolkit and excellent support.
> 
> Vince
> 
> On Wed, 24 Sep 2003, Doug Cutting wrote:
> 
> > Vince Taluskie wrote:
> > > Index /rr/tmpindexes/global/SL contains -245430166 documents
> > > 
> > > 11:53:36,377 ERROR [Engine] StandardWrapperValve[RRSearcher]: Servlet.service() for servlet RRSearcher threw exception
> > > java.lang.NegativeArraySizeException
> > >         at org.apache.lucene.index.SegmentReader.norms(Unknown Source)
> > > 
> > > I figured I would be fine with the number of documents up to the 2-4B
> > > range - and the data uploads for the project are finished so the indexes
> > > shouldn't need to get larger after this but it looks like I've hit a
> > > limit between 242M-291M documents.
> > 
> > I think the problem is on line 77 of FieldsReader.java.  Try replacing
> > the line:
> > 
> >      size = (int)indexStream.length() / 8;
> > 
> > with:
> > 
> >      size = (int)(indexStream.length() / 8);
> > 
> > I believe this will fix the problem.  Tell me if it does.
> > 
> > Thanks,
> > 
> > Doug





Re: negative number of docs?

Posted by Vince Taluskie <vi...@taluskie.com>.
Doug,

This fix worked like a charm!  Now it's returning correctly:

	Index /rr/all_indexes/SL contains 291440746 documents

Thanks for the great toolkit and excellent support.

Vince

On Wed, 24 Sep 2003, Doug Cutting wrote:

> Vince Taluskie wrote:
> > Index /rr/tmpindexes/global/SL contains -245430166 documents
> > 
> > 11:53:36,377 ERROR [Engine] StandardWrapperValve[RRSearcher]: Servlet.service() for servlet RRSearcher threw exception
> > java.lang.NegativeArraySizeException
> >         at org.apache.lucene.index.SegmentReader.norms(Unknown Source)
> > 
> > I figured I would be fine with the number of documents up to the 2-4B
> > range - and the data uploads for the project are finished so the indexes
> > shouldn't need to get larger after this but it looks like I've hit a 
> > limit between 242M-291M documents.
> 
> I think the problem is on line 77 of FieldsReader.java.  Try replacing 
> the line:
> 
>      size = (int)indexStream.length() / 8;
> 
> with:
> 
>      size = (int)(indexStream.length() / 8);
> 
> I believe this will fix the problem.  Tell me if it does.
> 
> Thanks,
> 
> Doug


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@work                                      @home
                                          
 vince.taluskie (at) cexp.com               vince (at) taluskie.com
 Corporate Express; Technical Architect     Westminster, CO
 Phone:   303 664 2660                      http://www.taluskie.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~





Re: negative number of docs?

Posted by Doug Cutting <cu...@lucene.com>.
Vince Taluskie wrote:
> Index /rr/tmpindexes/global/SL contains -245430166 documents
> 
> 11:53:36,377 ERROR [Engine] StandardWrapperValve[RRSearcher]: Servlet.service() for servlet RRSearcher threw exception
> java.lang.NegativeArraySizeException
>         at org.apache.lucene.index.SegmentReader.norms(Unknown Source)
> 
> I figured I would be fine with the number of documents up to the 2-4B
> range - and the data uploads for the project are finished so the indexes
> shouldn't need to get larger after this but it looks like I've hit a 
> limit between 242M-291M documents.

I think the problem is on line 77 of FieldsReader.java.  Try replacing 
the line:

     size = (int)indexStream.length() / 8;

with:

     size = (int)(indexStream.length() / 8);

I believe this will fix the problem.  Tell me if it does.

Thanks,

Doug
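
The numbers line up exactly with this cast-precedence bug: the cast
binds tighter than the division, so the buggy form truncates the 64-bit
file length to 32 bits before dividing.  At 291,440,746 documents the
field index file is 291440746 * 8 = 2,331,525,968 bytes - past
Integer.MAX_VALUE - which reproduces the reported count.  A standalone
sketch, not the Lucene source:

    public class CastPrecedenceDemo {
        public static void main(String[] args) {
            long length = 291440746L * 8;    // 2331525968 bytes > Integer.MAX_VALUE

            // Buggy form: (int) applies to length alone, truncating it
            // to 32 bits (-1963441328) before the division by 8.
            int buggy = (int) length / 8;    // -245430166, as reported

            // Fixed form: divide in 64 bits first, then cast the result.
            int fixed = (int) (length / 8);  // 291440746

            System.out.println(buggy + " vs " + fixed);
        }
    }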



