You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Robert Haschart <rh...@virginia.edu> on 2008/04/17 16:57:18 UTC

CorruptIndexException

Greetings all,

We are using Solr to index Marc records to create a better, more user 
friendly library catalog here at the University of Virginia.   To do 
this I have written a program starting from the VuFind Java importer 
written by Wayne Graham (from the College of William & Mary).   After 
making extensive changes to accomodate our different indexing scheme the 
program was working great. 

Subsequently I upgraded to lucene version 2.3.1 to realize the faster 
indexing performance, and although it is faster, is also seems somewhat 
unreliable.   A half a dozen times I have completely wiped out the 
existing index and started afresh, building an new index.  Each time at 
some point in the indexing run I receive the Error message:


Exception in thread "Thread-10" 
org.apache.lucene.index.MergePolicy$MergeException: 
org.apache.lucene.index.CorruptIndexException:
    doc counts differ for segment _3gh: fieldsReader shows 15999 but 
segmentInfo shows 16000
        at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:271)
Caused by: org.apache.lucene.index.CorruptIndexException: doc counts 
differ for segment _3gh: fieldsReader shows 15999 but segmentInfo shows 
16000
        at 
org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:313)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:221)
        at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3099)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)
        at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)


This occurs a dozen or so times within the set of Marc records it is 
processing, and when the next segment starts processing I receive this 
message:


Loading properties from import.properties
Apr 16, 2008 7:03:51 AM org.apache.solr.core.Config setInstanceDir
INFO: Solr home set to '/usr/local/projects/solr/'
Apr 16, 2008 7:03:51 AM org.apache.solr.core.SolrConfig initConfig
INFO: Loaded SolrConfig: solrconfig.xml
Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore <init>
INFO: Opening new SolrCore at /usr/local/projects/solr/, 
dataDir=/usr/local/projects/solr/data
Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
INFO: Reading Solr Schema
Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
INFO: Schema name=solr_int
Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
INFO: default search field is text
Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
INFO: query parser default operator is AND
Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
INFO: unique key field: id
Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore parseListener
INFO: Searching for listeners: //listener[@event="firstSearcher"]
Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore parseListener
INFO: Searching for listeners: //listener[@event="newSearcher"]
Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore initWriters
INFO: adding queryResponseWriter 
xslt=org.apache.solr.request.XSLTResponseWriter
Apr 16, 2008 7:03:52 AM org.apache.solr.request.XSLTResponseWriter init
INFO: xsltCacheLifetimeSeconds=5
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding requestHandler: standard=solr.StandardRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding requestHandler: partitioned=solr.DisMaxRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding requestHandler: dismax=solr.DisMaxRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding requestHandler: catalog=solr.DisMaxRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding requestHandler: music=solr.DisMaxRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding requestHandler: semester_at_sea=solr.DisMaxRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding lazy requestHandler: 
spellchecker=solr.SpellCheckerRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding requestHandler: /update=solr.XmlUpdateRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding lazy requestHandler: /update/csv=solr.CSVRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding requestHandler: 
/admin/luke=org.apache.solr.handler.admin.LukeRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding requestHandler: 
/admin/system=org.apache.solr.handler.admin.SystemInfoHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding requestHandler: 
/admin/plugins=org.apache.solr.handler.admin.PluginInfoHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding requestHandler: 
/admin/threads=org.apache.solr.handler.admin.ThreadDumpHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding requestHandler: 
/admin/properties=org.apache.solr.handler.admin.PropertiesRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers 
initHandlersFromConfig
INFO: adding requestHandler: /debug/dump=solr.DumpRequestHandler
Exception in thread "main" java.lang.RuntimeException: 
org.apache.lucene.index.CorruptIndexException: doc counts differ for 
segment
_3gh: fieldsReader shows 15999 but segmentInfo shows 16000
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:433)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:216)
        at MarcImporter.<init>(Unknown Source)
        at MarcImporter.main(Unknown Source)
Caused by: org.apache.lucene.index.CorruptIndexException: doc counts 
differ for segment _3gh: fieldsReader shows 15999 but segmentIn
fo shows 16000
        at 
org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:313)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
        at 
org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55)
        at 
org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
        at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
        at 
org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
        at 
org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:87)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:424)
        ... 3 more

Does anybody have suggestions as to how to track this problem down?

Thanks in Advance,

Robert Haschart




Re: CorruptIndexException

Posted by Michael McCandless <lu...@mikemccandless.com>.
Robert Haschart <rh...@virginia.edu> wrote:

>  To answer your questions:     I completely deleted the index each time
> before retesting.   and the java command as shown by "ps" does show -Xbatch.
>  The program is running on:
>  > uname -a
>  Linux lab8.betech.virginia.edu 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19
> 07:18:21 EST 2008 i686 i686 i386 GNU/Linux
>  > more /etc/redhat-release
>  Red Hat Enterprise Linux Server release 5.1 (Tikanga)
>
>  after downgrading from the originally reported version of java:   Java(TM)
> SE Runtime Environment (build 1.6.0_05-b13)
>  to this one:
>  > java -version
>  java version "1.6.0_02"
>  Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
>  Java HotSpot(TM) Server VM (build 1.6.0_02-b05, mixed mode)
>
>  the indexing run sucessfully completed processing all 112 record chunks!
> Yea!
>  (with -Xbatch on the command line, I didn't try with the 1.6.0_02 java
> without -Xbatch)

OK, that's good and bad news.  Good in that this still appears to be a
JVM issue (scary, really) since downgrading to 1.6.0_02 resolves it.
Bad in that -Xbatch is not always a viable workaround.  But at least
you have a way forward...

> So at this point it looks like the problem is in my marc-8 to utf-8
> translation code.  I'll look into this possibility further.

OK.  Let me know if this seems to come back to a Lucene issue!

Mike

Re: CorruptIndexException

Posted by Robert Haschart <rh...@virginia.edu>.
Michael,
Following up on this most recent post.  I remembered that the initial 
records were translated into utf-8 prior to indexing, whereas the 
updates records are in the marc-8 encoding internally, and the program 
is written to translate them on the fly as they are read in before 
indexing them.  I just tried pre-translating them, and the entire set of 
updates ran.   So at this point it looks like the problem is in my 
marc-8 to utf-8 translation code.  I'll look into this possibility further.

    Thank again for your help on my earlier problem.
    -Robert Haschart

Robert Haschart wrote:

> Michael,
>
> To answer your questions:     I completely deleted the index each time 
> before retesting.    and the java command as shown by "ps" does show 
> -Xbatch.
> The program is running on:
> > uname -a
> Linux lab8.betech.virginia.edu 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 
> 07:18:21 EST 2008 i686 i686 i386 GNU/Linux
> > more /etc/redhat-release
> Red Hat Enterprise Linux Server release 5.1 (Tikanga)
>
> after downgrading from the originally reported version of java:   
> Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
> to this one:
> > java -version
> java version "1.6.0_02"
> Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
> Java HotSpot(TM) Server VM (build 1.6.0_02-b05, mixed mode)
>
> the indexing run sucessfully completed processing all 112 record 
> chunks!  Yea!
> (with -Xbatch on the command line, I didn't try with the 1.6.0_02 java 
> without -Xbatch)
>
>
> However, I am still seeing a different problem which is what caused me 
> to upgrade to Lucene version 2.3.1 and start experiencing the 
> CorruptIndexException.
>
> Basically we have a set of 112 files dumped from our OPAC in a binary 
> Marc record format, each of which contains about 35000 records.  In 
> addition to those files we have a set of daily updates, consisting of 
> new records that have been added, and edits for existing records, as 
> well as a separate file listing the ids of records to be deleted.
>
> After creating the initial index, I have a script loop through all of 
> the update files, adding in all of the new records and updates, and 
> then processing all of that day's deletes.  Typically at some point in 
> processing the updates, an auto-commit will be triggered.  Eventually 
> for one of these auto-commits (not the same one every time) the commit 
> will never finish.  The behavior I see is that it will write out 
> information about doing a commit (as shown below) and then seeming do 
> nothing ever after, although the CPU % as reported by "ps" for the 
> process sits around 90 to 100 % and stays there for days.  While the 
> program is sitting there doing this, no changes are made to the files 
> in the index.  So its really not clear what it is doing.
> If you have any ideas about this other problem, I would appreciate ant 
> insight you have.
>
> Adding record 10993: u4386758
> Adding record 10994: u4386760
> Adding record 10995: u4386767
> Adding record 10996: u4386768
> Adding record 10997: u4386812
> Adding record 10998: u4386816
> Adding record 10999: u4386850
> Adding record 11000: u4386883
> Adding record 11001: u4387066
> Adding record 11002: u4387074
> Adding record 11003: u4387764
> Apr 20, 2008 1:12:18 PM org.apache.solr.update.DirectUpdateHandler2 
> commit
> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
> Apr 20, 2008 1:12:18 PM org.apache.solr.update.DirectUpdateHandler2 
> doDeletions
> INFO: DirectUpdateHandler2 deleting and removing dups for 11003 ids
> Apr 20, 2008 1:12:32 PM org.apache.solr.search.SolrIndexSearcher <init>
> INFO: Opening Searcher@1c7e2da DirectUpdateHandler2
> Apr 20, 2008 1:12:36 PM org.apache.solr.update.DirectUpdateHandler2 
> doDeletions
> INFO: DirectUpdateHandler2 docs deleted=11003
> Apr 20, 2008 1:12:36 PM org.apache.solr.search.SolrIndexSearcher <init>
> INFO: Opening Searcher@51127a main
> Apr 20, 2008 1:12:37 PM org.apache.solr.update.DirectUpdateHandler2 
> commit
> INFO: end_commit_flush
> Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@51127a main from Searcher@b9e45a main
>        
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 
>
> Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@51127a main
>        
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 
>
> Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@51127a main from Searcher@b9e45a main
>        
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 
>
> Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@51127a main
>        
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 
>
> Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@51127a main from Searcher@b9e45a main
>        
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 
>
> Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@51127a main
>        
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 
>
> Apr 20, 2008 1:12:37 PM org.apache.solr.core.SolrCore registerSearcher
> INFO: Registered new searcher Searcher@51127a main
> Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher close
> INFO: Closing Searcher@b9e45a main
>        
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 
>
>        
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 
>
>        
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 
>
>
>
>



Re: CorruptIndexException

Posted by Robert Haschart <rh...@virginia.edu>.
Michael,

To answer your questions:  
    I completely deleted the index each time before retesting. 
    and the java command as shown by "ps" does show -Xbatch.
The program is running on:
 > uname -a
Linux lab8.betech.virginia.edu 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 
07:18:21 EST 2008 i686 i686 i386 GNU/Linux
 > more /etc/redhat-release
Red Hat Enterprise Linux Server release 5.1 (Tikanga)

after downgrading from the originally reported version of java:   
Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
to this one:
 > java -version
java version "1.6.0_02"
Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
Java HotSpot(TM) Server VM (build 1.6.0_02-b05, mixed mode)

the indexing run sucessfully completed processing all 112 record 
chunks!  Yea!
(with -Xbatch on the command line, I didn't try with the 1.6.0_02 java 
without -Xbatch)


However, I am still seeing a different problem which is what caused me 
to upgrade to Lucene version 2.3.1 and start experiencing the 
CorruptIndexException.

Basically we have a set of 112 files dumped from our OPAC in a binary 
Marc record format, each of which contains about 35000 records.  In 
addition to those files we have a set of daily updates, consisting of 
new records that have been added, and edits for existing records, as 
well as a separate file listing the ids of records to be deleted.

After creating the initial index, I have a script loop through all of 
the update files, adding in all of the new records and updates, and then 
processing all of that day's deletes.  Typically at some point in 
processing the updates, an auto-commit will be triggered.  Eventually 
for one of these auto-commits (not the same one every time) the commit 
will never finish.  The behavior I see is that it will write out 
information about doing a commit (as shown below) and then seeming do 
nothing ever after, although the CPU % as reported by "ps" for the 
process sits around 90 to 100 % and stays there for days.  While the 
program is sitting there doing this, no changes are made to the files in 
the index.  So its really not clear what it is doing. 

If you have any ideas about this other problem, I would appreciate ant 
insight you have.

Adding record 10993: u4386758
Adding record 10994: u4386760
Adding record 10995: u4386767
Adding record 10996: u4386768
Adding record 10997: u4386812
Adding record 10998: u4386816
Adding record 10999: u4386850
Adding record 11000: u4386883
Adding record 11001: u4387066
Adding record 11002: u4387074
Adding record 11003: u4387764
Apr 20, 2008 1:12:18 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Apr 20, 2008 1:12:18 PM org.apache.solr.update.DirectUpdateHandler2 
doDeletions
INFO: DirectUpdateHandler2 deleting and removing dups for 11003 ids
Apr 20, 2008 1:12:32 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@1c7e2da DirectUpdateHandler2
Apr 20, 2008 1:12:36 PM org.apache.solr.update.DirectUpdateHandler2 
doDeletions
INFO: DirectUpdateHandler2 docs deleted=11003
Apr 20, 2008 1:12:36 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@51127a main
Apr 20, 2008 1:12:37 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@51127a main from Searcher@b9e45a main
        
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@51127a main
        
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@51127a main from Searcher@b9e45a main
        
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@51127a main
        
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@51127a main from Searcher@b9e45a main
        
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@51127a main
        
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.core.SolrCore registerSearcher
INFO: Registered new searcher Searcher@51127a main
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing Searcher@b9e45a main
        
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
        
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
        
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}




Michael McCandless wrote:

>
> Hmmm, not good.
>
> One other thing to try would be -Xint, which turns off hotspot  
> compilation entirely.  On that last case that also prevented the issue.
>
> Did you cleanly rebuild your index when you retested?  And you're  
> really certain your JRE is running with -Xbatch?  (You should be able  
> to use "ps" to look in the process listing and verify).
>
> Which OS are you on?
>
> Mike
>
> Robert Haschart wrote:
>
>> Mike,
>>
>> You are right it does sound exactly like that situation.   The java  
>> version is:
>>
>> java version "1.6.0_05"
>> Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
>> Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)
>>
>> Which seems to be the same as the one giving the other poster  problems.
>>
>> I tried running with the -Xbatch flag on for the indexing runs, but  
>> after 23 chunks of Marc records, the exact same problem occurred.
>>
>> I'm going to try downgrading java versions to see whether that helps.
>>
>> -Robert Haschart
>>
>>
>> Michael McCandless wrote:
>>
>>>
>>> Which exact version of the JRE are you using?  Can you try  running  
>>> java with -Xbatch (forces up front compilation)?
>>>
>>> Your situation sounds very similar to this one:
>>>
>>>   http://lucene.markmail.org/message/awkkunr7j24nh4qj
>>>
>>> Mike
>>>
>>> On Apr 17, 2008, at 10:57 AM, Robert Haschart wrote:
>>>
>>>> Greetings all,
>>>>
>>>> We are using Solr to index Marc records to create a better, more   
>>>> user friendly library catalog here at the University of  
>>>> Virginia.    To do this I have written a program starting from  the 
>>>> VuFind Java  importer written by Wayne Graham (from the  College of 
>>>> William &  Mary).   After making extensive changes to  accomodate 
>>>> our different  indexing scheme the program was working  great.
>>>> Subsequently I upgraded to lucene version 2.3.1 to realize the   
>>>> faster indexing performance, and although it is faster, is also   
>>>> seems somewhat unreliable.   A half a dozen times I have  
>>>> completely  wiped out the existing index and started afresh,  
>>>> building an new  index.  Each time at some point in the indexing  
>>>> run I receive the  Error message:
>>>>
>>>>
>>>> Exception in thread "Thread-10"  
>>>> org.apache.lucene.index.MergePolicy $MergeException:  
>>>> org.apache.lucene.index.CorruptIndexException:
>>>>    doc counts differ for segment _3gh: fieldsReader shows 15999  
>>>> but  segmentInfo shows 16000
>>>>        at org.apache.lucene.index.ConcurrentMergeScheduler  
>>>> $MergeThread.run(ConcurrentMergeScheduler.java:271)
>>>> Caused by: org.apache.lucene.index.CorruptIndexException: doc   
>>>> counts differ for segment _3gh: fieldsReader shows 15999 but   
>>>> segmentInfo shows 16000
>>>>        at org.apache.lucene.index.SegmentReader.initialize  
>>>> (SegmentReader.java:313)
>>>>        at org.apache.lucene.index.SegmentReader.get  
>>>> (SegmentReader.java:262)
>>>>        at org.apache.lucene.index.SegmentReader.get  
>>>> (SegmentReader.java:221)
>>>>        at org.apache.lucene.index.IndexWriter.mergeMiddle  
>>>> (IndexWriter.java:3099)
>>>>        at org.apache.lucene.index.IndexWriter.merge  
>>>> (IndexWriter.java:2834)
>>>>        at org.apache.lucene.index.ConcurrentMergeScheduler  
>>>> $MergeThread.run(ConcurrentMergeScheduler.java:240)
>>>>
>>>>
>>>> This occurs a dozen or so times within the set of Marc records  it  
>>>> is processing, and when the next segment starts processing I   
>>>> receive this message:
>>>>
>>>>
>>>> Loading properties from import.properties
>>>> Apr 16, 2008 7:03:51 AM org.apache.solr.core.Config setInstanceDir
>>>> INFO: Solr home set to '/usr/local/projects/solr/'
>>>> Apr 16, 2008 7:03:51 AM org.apache.solr.core.SolrConfig initConfig
>>>> INFO: Loaded SolrConfig: solrconfig.xml
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore <init>
>>>> INFO: Opening new SolrCore at /usr/local/projects/solr/,  dataDir=/ 
>>>> usr/local/projects/solr/data
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema  readConfig
>>>> INFO: Reading Solr Schema
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema  readConfig
>>>> INFO: Schema name=solr_int
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema  readConfig
>>>> INFO: default search field is text
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema  readConfig
>>>> INFO: query parser default operator is AND
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema  readConfig
>>>> INFO: unique key field: id
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore parseListener
>>>> INFO: Searching for listeners: //listener[@event="firstSearcher"]
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore parseListener
>>>> INFO: Searching for listeners: //listener[@event="newSearcher"]
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore initWriters
>>>> INFO: adding queryResponseWriter   
>>>> xslt=org.apache.solr.request.XSLTResponseWriter
>>>> Apr 16, 2008 7:03:52 AM  
>>>> org.apache.solr.request.XSLTResponseWriter  init
>>>> INFO: xsltCacheLifetimeSeconds=5
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding requestHandler: standard=solr.StandardRequestHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding requestHandler: partitioned=solr.DisMaxRequestHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding requestHandler: dismax=solr.DisMaxRequestHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding requestHandler: catalog=solr.DisMaxRequestHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding requestHandler: music=solr.DisMaxRequestHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding requestHandler:  
>>>> semester_at_sea=solr.DisMaxRequestHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding lazy requestHandler:   
>>>> spellchecker=solr.SpellCheckerRequestHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding requestHandler: /update=solr.XmlUpdateRequestHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding lazy requestHandler: /update/csv=solr.CSVRequestHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding requestHandler: /admin/  
>>>> luke=org.apache.solr.handler.admin.LukeRequestHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding requestHandler: /admin/  
>>>> system=org.apache.solr.handler.admin.SystemInfoHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding requestHandler: /admin/  
>>>> plugins=org.apache.solr.handler.admin.PluginInfoHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding requestHandler: /admin/  
>>>> threads=org.apache.solr.handler.admin.ThreadDumpHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding requestHandler: /admin/  
>>>> properties=org.apache.solr.handler.admin.PropertiesRequestHandler
>>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>>> initHandlersFromConfig
>>>> INFO: adding requestHandler: /debug/dump=solr.DumpRequestHandler
>>>> Exception in thread "main" java.lang.RuntimeException:   
>>>> org.apache.lucene.index.CorruptIndexException: doc counts differ   
>>>> for segment
>>>> _3gh: fieldsReader shows 15999 but segmentInfo shows 16000
>>>>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java: 
>>>> 433)
>>>>        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:216)
>>>>        at MarcImporter.<init>(Unknown Source)
>>>>        at MarcImporter.main(Unknown Source)
>>>> Caused by: org.apache.lucene.index.CorruptIndexException: doc   
>>>> counts differ for segment _3gh: fieldsReader shows 15999 but  
>>>> segmentIn
>>>> fo shows 16000
>>>>        at org.apache.lucene.index.SegmentReader.initialize  
>>>> (SegmentReader.java:313)
>>>>        at org.apache.lucene.index.SegmentReader.get  
>>>> (SegmentReader.java:262)
>>>>        at org.apache.lucene.index.SegmentReader.get  
>>>> (SegmentReader.java:197)
>>>>        at org.apache.lucene.index.MultiSegmentReader.<init>  
>>>> (MultiSegmentReader.java:55)
>>>>        at org.apache.lucene.index.DirectoryIndexReader$1.doBody  
>>>> (DirectoryIndexReader.java:75)
>>>>        at org.apache.lucene.index.SegmentInfos 
>>>> $FindSegmentsFile.run (SegmentInfos.java:636)
>>>>        at org.apache.lucene.index.DirectoryIndexReader.open  
>>>> (DirectoryIndexReader.java:63)
>>>>        at org.apache.lucene.index.IndexReader.open 
>>>> (IndexReader.java: 209)
>>>>        at org.apache.lucene.index.IndexReader.open 
>>>> (IndexReader.java: 173)
>>>>        at org.apache.solr.search.SolrIndexSearcher.<init>  
>>>> (SolrIndexSearcher.java:87)
>>>>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java: 
>>>> 424)
>>>>        ... 3 more
>>>>
>>>> Does anybody have suggestions as to how to track this problem down?
>>>>
>>>> Thanks in Advance,
>>>>
>>>> Robert Haschart
>>>>
>>>>
>>>>
>>
>>



Re: CorruptIndexException

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hmmm, not good.

One other thing to try would be -Xint, which turns off hotspot  
compilation entirely.  On that last case that also prevented the issue.

Did you cleanly rebuild your index when you retested?  And you're  
really certain your JRE is running with -Xbatch?  (You should be able  
to use "ps" to look in the process listing and verify).

Which OS are you on?

Mike

Robert Haschart wrote:
> Mike,
>
> You are right it does sound exactly like that situation.   The java  
> version is:
>
> java version "1.6.0_05"
> Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
> Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)
>
> Which seems to be the same as the one giving the other poster  
> problems.
>
> I tried running with the -Xbatch flag on for the indexing runs, but  
> after 23 chunks of Marc records, the exact same problem occurred.
>
> I'm going to try downgrading java versions to see whether that helps.
>
> -Robert Haschart
>
>
> Michael McCandless wrote:
>
>>
>> Which exact version of the JRE are you using?  Can you try  
>> running  java with -Xbatch (forces up front compilation)?
>>
>> Your situation sounds very similar to this one:
>>
>>   http://lucene.markmail.org/message/awkkunr7j24nh4qj
>>
>> Mike
>>
>> On Apr 17, 2008, at 10:57 AM, Robert Haschart wrote:
>>
>>> Greetings all,
>>>
>>> We are using Solr to index Marc records to create a better, more   
>>> user friendly library catalog here at the University of  
>>> Virginia.    To do this I have written a program starting from  
>>> the VuFind Java  importer written by Wayne Graham (from the  
>>> College of William &  Mary).   After making extensive changes to  
>>> accomodate our different  indexing scheme the program was working  
>>> great.
>>> Subsequently I upgraded to lucene version 2.3.1 to realize the   
>>> faster indexing performance, and although it is faster, is also   
>>> seems somewhat unreliable.   A half a dozen times I have  
>>> completely  wiped out the existing index and started afresh,  
>>> building an new  index.  Each time at some point in the indexing  
>>> run I receive the  Error message:
>>>
>>>
>>> Exception in thread "Thread-10"  
>>> org.apache.lucene.index.MergePolicy $MergeException:  
>>> org.apache.lucene.index.CorruptIndexException:
>>>    doc counts differ for segment _3gh: fieldsReader shows 15999  
>>> but  segmentInfo shows 16000
>>>        at org.apache.lucene.index.ConcurrentMergeScheduler  
>>> $MergeThread.run(ConcurrentMergeScheduler.java:271)
>>> Caused by: org.apache.lucene.index.CorruptIndexException: doc   
>>> counts differ for segment _3gh: fieldsReader shows 15999 but   
>>> segmentInfo shows 16000
>>>        at org.apache.lucene.index.SegmentReader.initialize  
>>> (SegmentReader.java:313)
>>>        at org.apache.lucene.index.SegmentReader.get  
>>> (SegmentReader.java:262)
>>>        at org.apache.lucene.index.SegmentReader.get  
>>> (SegmentReader.java:221)
>>>        at org.apache.lucene.index.IndexWriter.mergeMiddle  
>>> (IndexWriter.java:3099)
>>>        at org.apache.lucene.index.IndexWriter.merge  
>>> (IndexWriter.java:2834)
>>>        at org.apache.lucene.index.ConcurrentMergeScheduler  
>>> $MergeThread.run(ConcurrentMergeScheduler.java:240)
>>>
>>>
>>> This occurs a dozen or so times within the set of Marc records  
>>> it  is processing, and when the next segment starts processing I   
>>> receive this message:
>>>
>>>
>>> Loading properties from import.properties
>>> Apr 16, 2008 7:03:51 AM org.apache.solr.core.Config setInstanceDir
>>> INFO: Solr home set to '/usr/local/projects/solr/'
>>> Apr 16, 2008 7:03:51 AM org.apache.solr.core.SolrConfig initConfig
>>> INFO: Loaded SolrConfig: solrconfig.xml
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore <init>
>>> INFO: Opening new SolrCore at /usr/local/projects/solr/,  
>>> dataDir=/ usr/local/projects/solr/data
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema  
>>> readConfig
>>> INFO: Reading Solr Schema
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema  
>>> readConfig
>>> INFO: Schema name=solr_int
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema  
>>> readConfig
>>> INFO: default search field is text
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema  
>>> readConfig
>>> INFO: query parser default operator is AND
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema  
>>> readConfig
>>> INFO: unique key field: id
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore parseListener
>>> INFO: Searching for listeners: //listener[@event="firstSearcher"]
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore parseListener
>>> INFO: Searching for listeners: //listener[@event="newSearcher"]
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore initWriters
>>> INFO: adding queryResponseWriter   
>>> xslt=org.apache.solr.request.XSLTResponseWriter
>>> Apr 16, 2008 7:03:52 AM  
>>> org.apache.solr.request.XSLTResponseWriter  init
>>> INFO: xsltCacheLifetimeSeconds=5
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding requestHandler: standard=solr.StandardRequestHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding requestHandler: partitioned=solr.DisMaxRequestHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding requestHandler: dismax=solr.DisMaxRequestHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding requestHandler: catalog=solr.DisMaxRequestHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding requestHandler: music=solr.DisMaxRequestHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding requestHandler:  
>>> semester_at_sea=solr.DisMaxRequestHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding lazy requestHandler:   
>>> spellchecker=solr.SpellCheckerRequestHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding requestHandler: /update=solr.XmlUpdateRequestHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding lazy requestHandler: /update/csv=solr.CSVRequestHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding requestHandler: /admin/  
>>> luke=org.apache.solr.handler.admin.LukeRequestHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding requestHandler: /admin/  
>>> system=org.apache.solr.handler.admin.SystemInfoHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding requestHandler: /admin/  
>>> plugins=org.apache.solr.handler.admin.PluginInfoHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding requestHandler: /admin/  
>>> threads=org.apache.solr.handler.admin.ThreadDumpHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding requestHandler: /admin/  
>>> properties=org.apache.solr.handler.admin.PropertiesRequestHandler
>>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers   
>>> initHandlersFromConfig
>>> INFO: adding requestHandler: /debug/dump=solr.DumpRequestHandler
>>> Exception in thread "main" java.lang.RuntimeException:   
>>> org.apache.lucene.index.CorruptIndexException: doc counts differ   
>>> for segment
>>> _3gh: fieldsReader shows 15999 but segmentInfo shows 16000
>>>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java: 
>>> 433)
>>>        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:216)
>>>        at MarcImporter.<init>(Unknown Source)
>>>        at MarcImporter.main(Unknown Source)
>>> Caused by: org.apache.lucene.index.CorruptIndexException: doc   
>>> counts differ for segment _3gh: fieldsReader shows 15999 but  
>>> segmentIn
>>> fo shows 16000
>>>        at org.apache.lucene.index.SegmentReader.initialize  
>>> (SegmentReader.java:313)
>>>        at org.apache.lucene.index.SegmentReader.get  
>>> (SegmentReader.java:262)
>>>        at org.apache.lucene.index.SegmentReader.get  
>>> (SegmentReader.java:197)
>>>        at org.apache.lucene.index.MultiSegmentReader.<init>  
>>> (MultiSegmentReader.java:55)
>>>        at org.apache.lucene.index.DirectoryIndexReader$1.doBody  
>>> (DirectoryIndexReader.java:75)
>>>        at org.apache.lucene.index.SegmentInfos 
>>> $FindSegmentsFile.run (SegmentInfos.java:636)
>>>        at org.apache.lucene.index.DirectoryIndexReader.open  
>>> (DirectoryIndexReader.java:63)
>>>        at org.apache.lucene.index.IndexReader.open 
>>> (IndexReader.java: 209)
>>>        at org.apache.lucene.index.IndexReader.open 
>>> (IndexReader.java: 173)
>>>        at org.apache.solr.search.SolrIndexSearcher.<init>  
>>> (SolrIndexSearcher.java:87)
>>>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java: 
>>> 424)
>>>        ... 3 more
>>>
>>> Does anybody have suggestions as to how to track this problem down?
>>>
>>> Thanks in Advance,
>>>
>>> Robert Haschart
>>>
>>>
>>>
>
>


Re: CorruptIndexException

Posted by Robert Haschart <rh...@virginia.edu>.
Mike,

You are right it does sound exactly like that situation.   The java 
version is:

java version "1.6.0_05"
Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)

Which seems to be the same as the one giving the other poster problems.

I tried running with the -Xbatch flag on for the indexing runs, but 
after 23 chunks of Marc records, the exact same problem occurred.

I'm going to try downgrading java versions to see whether that helps.

-Robert Haschart


Michael McCandless wrote:

>
> Which exact version of the JRE are you using?  Can you try running  
> java with -Xbatch (forces up front compilation)?
>
> Your situation sounds very similar to this one:
>
>   http://lucene.markmail.org/message/awkkunr7j24nh4qj
>
> Mike
>
> On Apr 17, 2008, at 10:57 AM, Robert Haschart wrote:
>
>> Greetings all,
>>
>> We are using Solr to index Marc records to create a better, more  
>> user friendly library catalog here at the University of Virginia.    
>> To do this I have written a program starting from the VuFind Java  
>> importer written by Wayne Graham (from the College of William &  
>> Mary).   After making extensive changes to accomodate our different  
>> indexing scheme the program was working great.
>> Subsequently I upgraded to lucene version 2.3.1 to realize the  
>> faster indexing performance, and although it is faster, is also  
>> seems somewhat unreliable.   A half a dozen times I have completely  
>> wiped out the existing index and started afresh, building an new  
>> index.  Each time at some point in the indexing run I receive the  
>> Error message:
>>
>>
>> Exception in thread "Thread-10" org.apache.lucene.index.MergePolicy 
>> $MergeException: org.apache.lucene.index.CorruptIndexException:
>>    doc counts differ for segment _3gh: fieldsReader shows 15999 but  
>> segmentInfo shows 16000
>>        at org.apache.lucene.index.ConcurrentMergeScheduler 
>> $MergeThread.run(ConcurrentMergeScheduler.java:271)
>> Caused by: org.apache.lucene.index.CorruptIndexException: doc  counts 
>> differ for segment _3gh: fieldsReader shows 15999 but  segmentInfo 
>> shows 16000
>>        at org.apache.lucene.index.SegmentReader.initialize 
>> (SegmentReader.java:313)
>>        at org.apache.lucene.index.SegmentReader.get 
>> (SegmentReader.java:262)
>>        at org.apache.lucene.index.SegmentReader.get 
>> (SegmentReader.java:221)
>>        at org.apache.lucene.index.IndexWriter.mergeMiddle 
>> (IndexWriter.java:3099)
>>        at org.apache.lucene.index.IndexWriter.merge 
>> (IndexWriter.java:2834)
>>        at org.apache.lucene.index.ConcurrentMergeScheduler 
>> $MergeThread.run(ConcurrentMergeScheduler.java:240)
>>
>>
>> This occurs a dozen or so times within the set of Marc records it  is 
>> processing, and when the next segment starts processing I  receive 
>> this message:
>>
>>
>> Loading properties from import.properties
>> Apr 16, 2008 7:03:51 AM org.apache.solr.core.Config setInstanceDir
>> INFO: Solr home set to '/usr/local/projects/solr/'
>> Apr 16, 2008 7:03:51 AM org.apache.solr.core.SolrConfig initConfig
>> INFO: Loaded SolrConfig: solrconfig.xml
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore <init>
>> INFO: Opening new SolrCore at /usr/local/projects/solr/, dataDir=/ 
>> usr/local/projects/solr/data
>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
>> INFO: Reading Solr Schema
>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
>> INFO: Schema name=solr_int
>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
>> INFO: default search field is text
>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
>> INFO: query parser default operator is AND
>> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
>> INFO: unique key field: id
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore parseListener
>> INFO: Searching for listeners: //listener[@event="firstSearcher"]
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore parseListener
>> INFO: Searching for listeners: //listener[@event="newSearcher"]
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore initWriters
>> INFO: adding queryResponseWriter  
>> xslt=org.apache.solr.request.XSLTResponseWriter
>> Apr 16, 2008 7:03:52 AM org.apache.solr.request.XSLTResponseWriter  init
>> INFO: xsltCacheLifetimeSeconds=5
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding requestHandler: standard=solr.StandardRequestHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding requestHandler: partitioned=solr.DisMaxRequestHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding requestHandler: dismax=solr.DisMaxRequestHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding requestHandler: catalog=solr.DisMaxRequestHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding requestHandler: music=solr.DisMaxRequestHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding requestHandler: semester_at_sea=solr.DisMaxRequestHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding lazy requestHandler:  
>> spellchecker=solr.SpellCheckerRequestHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding requestHandler: /update=solr.XmlUpdateRequestHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding lazy requestHandler: /update/csv=solr.CSVRequestHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding requestHandler: /admin/ 
>> luke=org.apache.solr.handler.admin.LukeRequestHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding requestHandler: /admin/ 
>> system=org.apache.solr.handler.admin.SystemInfoHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding requestHandler: /admin/ 
>> plugins=org.apache.solr.handler.admin.PluginInfoHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding requestHandler: /admin/ 
>> threads=org.apache.solr.handler.admin.ThreadDumpHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding requestHandler: /admin/ 
>> properties=org.apache.solr.handler.admin.PropertiesRequestHandler
>> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
>> initHandlersFromConfig
>> INFO: adding requestHandler: /debug/dump=solr.DumpRequestHandler
>> Exception in thread "main" java.lang.RuntimeException:  
>> org.apache.lucene.index.CorruptIndexException: doc counts differ  for 
>> segment
>> _3gh: fieldsReader shows 15999 but segmentInfo shows 16000
>>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:433)
>>        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:216)
>>        at MarcImporter.<init>(Unknown Source)
>>        at MarcImporter.main(Unknown Source)
>> Caused by: org.apache.lucene.index.CorruptIndexException: doc  counts 
>> differ for segment _3gh: fieldsReader shows 15999 but segmentIn
>> fo shows 16000
>>        at org.apache.lucene.index.SegmentReader.initialize 
>> (SegmentReader.java:313)
>>        at org.apache.lucene.index.SegmentReader.get 
>> (SegmentReader.java:262)
>>        at org.apache.lucene.index.SegmentReader.get 
>> (SegmentReader.java:197)
>>        at org.apache.lucene.index.MultiSegmentReader.<init> 
>> (MultiSegmentReader.java:55)
>>        at org.apache.lucene.index.DirectoryIndexReader$1.doBody 
>> (DirectoryIndexReader.java:75)
>>        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run 
>> (SegmentInfos.java:636)
>>        at org.apache.lucene.index.DirectoryIndexReader.open 
>> (DirectoryIndexReader.java:63)
>>        at org.apache.lucene.index.IndexReader.open(IndexReader.java: 
>> 209)
>>        at org.apache.lucene.index.IndexReader.open(IndexReader.java: 
>> 173)
>>        at org.apache.solr.search.SolrIndexSearcher.<init> 
>> (SolrIndexSearcher.java:87)
>>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:424)
>>        ... 3 more
>>
>> Does anybody have suggestions as to how to track this problem down?
>>
>> Thanks in Advance,
>>
>> Robert Haschart
>>
>>
>>



Re: CorruptIndexException

Posted by Michael McCandless <lu...@mikemccandless.com>.
Which exact version of the JRE are you using?  Can you try running  
java with -Xbatch (forces up front compilation)?

Your situation sounds very similar to this one:

   http://lucene.markmail.org/message/awkkunr7j24nh4qj

Mike

On Apr 17, 2008, at 10:57 AM, Robert Haschart wrote:
> Greetings all,
>
> We are using Solr to index Marc records to create a better, more  
> user friendly library catalog here at the University of Virginia.    
> To do this I have written a program starting from the VuFind Java  
> importer written by Wayne Graham (from the College of William &  
> Mary).   After making extensive changes to accomodate our different  
> indexing scheme the program was working great.
> Subsequently I upgraded to lucene version 2.3.1 to realize the  
> faster indexing performance, and although it is faster, is also  
> seems somewhat unreliable.   A half a dozen times I have completely  
> wiped out the existing index and started afresh, building an new  
> index.  Each time at some point in the indexing run I receive the  
> Error message:
>
>
> Exception in thread "Thread-10" org.apache.lucene.index.MergePolicy 
> $MergeException: org.apache.lucene.index.CorruptIndexException:
>    doc counts differ for segment _3gh: fieldsReader shows 15999 but  
> segmentInfo shows 16000
>        at org.apache.lucene.index.ConcurrentMergeScheduler 
> $MergeThread.run(ConcurrentMergeScheduler.java:271)
> Caused by: org.apache.lucene.index.CorruptIndexException: doc  
> counts differ for segment _3gh: fieldsReader shows 15999 but  
> segmentInfo shows 16000
>        at org.apache.lucene.index.SegmentReader.initialize 
> (SegmentReader.java:313)
>        at org.apache.lucene.index.SegmentReader.get 
> (SegmentReader.java:262)
>        at org.apache.lucene.index.SegmentReader.get 
> (SegmentReader.java:221)
>        at org.apache.lucene.index.IndexWriter.mergeMiddle 
> (IndexWriter.java:3099)
>        at org.apache.lucene.index.IndexWriter.merge 
> (IndexWriter.java:2834)
>        at org.apache.lucene.index.ConcurrentMergeScheduler 
> $MergeThread.run(ConcurrentMergeScheduler.java:240)
>
>
> This occurs a dozen or so times within the set of Marc records it  
> is processing, and when the next segment starts processing I  
> receive this message:
>
>
> Loading properties from import.properties
> Apr 16, 2008 7:03:51 AM org.apache.solr.core.Config setInstanceDir
> INFO: Solr home set to '/usr/local/projects/solr/'
> Apr 16, 2008 7:03:51 AM org.apache.solr.core.SolrConfig initConfig
> INFO: Loaded SolrConfig: solrconfig.xml
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore <init>
> INFO: Opening new SolrCore at /usr/local/projects/solr/, dataDir=/ 
> usr/local/projects/solr/data
> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
> INFO: Reading Solr Schema
> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
> INFO: Schema name=solr_int
> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
> INFO: default search field is text
> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
> INFO: query parser default operator is AND
> Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
> INFO: unique key field: id
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore parseListener
> INFO: Searching for listeners: //listener[@event="firstSearcher"]
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore parseListener
> INFO: Searching for listeners: //listener[@event="newSearcher"]
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore initWriters
> INFO: adding queryResponseWriter  
> xslt=org.apache.solr.request.XSLTResponseWriter
> Apr 16, 2008 7:03:52 AM org.apache.solr.request.XSLTResponseWriter  
> init
> INFO: xsltCacheLifetimeSeconds=5
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding requestHandler: standard=solr.StandardRequestHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding requestHandler: partitioned=solr.DisMaxRequestHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding requestHandler: dismax=solr.DisMaxRequestHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding requestHandler: catalog=solr.DisMaxRequestHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding requestHandler: music=solr.DisMaxRequestHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding requestHandler: semester_at_sea=solr.DisMaxRequestHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding lazy requestHandler:  
> spellchecker=solr.SpellCheckerRequestHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding requestHandler: /update=solr.XmlUpdateRequestHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding lazy requestHandler: /update/csv=solr.CSVRequestHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding requestHandler: /admin/ 
> luke=org.apache.solr.handler.admin.LukeRequestHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding requestHandler: /admin/ 
> system=org.apache.solr.handler.admin.SystemInfoHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding requestHandler: /admin/ 
> plugins=org.apache.solr.handler.admin.PluginInfoHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding requestHandler: /admin/ 
> threads=org.apache.solr.handler.admin.ThreadDumpHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding requestHandler: /admin/ 
> properties=org.apache.solr.handler.admin.PropertiesRequestHandler
> Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers  
> initHandlersFromConfig
> INFO: adding requestHandler: /debug/dump=solr.DumpRequestHandler
> Exception in thread "main" java.lang.RuntimeException:  
> org.apache.lucene.index.CorruptIndexException: doc counts differ  
> for segment
> _3gh: fieldsReader shows 15999 but segmentInfo shows 16000
>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:433)
>        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:216)
>        at MarcImporter.<init>(Unknown Source)
>        at MarcImporter.main(Unknown Source)
> Caused by: org.apache.lucene.index.CorruptIndexException: doc  
> counts differ for segment _3gh: fieldsReader shows 15999 but segmentIn
> fo shows 16000
>        at org.apache.lucene.index.SegmentReader.initialize 
> (SegmentReader.java:313)
>        at org.apache.lucene.index.SegmentReader.get 
> (SegmentReader.java:262)
>        at org.apache.lucene.index.SegmentReader.get 
> (SegmentReader.java:197)
>        at org.apache.lucene.index.MultiSegmentReader.<init> 
> (MultiSegmentReader.java:55)
>        at org.apache.lucene.index.DirectoryIndexReader$1.doBody 
> (DirectoryIndexReader.java:75)
>        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run 
> (SegmentInfos.java:636)
>        at org.apache.lucene.index.DirectoryIndexReader.open 
> (DirectoryIndexReader.java:63)
>        at org.apache.lucene.index.IndexReader.open(IndexReader.java: 
> 209)
>        at org.apache.lucene.index.IndexReader.open(IndexReader.java: 
> 173)
>        at org.apache.solr.search.SolrIndexSearcher.<init> 
> (SolrIndexSearcher.java:87)
>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:424)
>        ... 3 more
>
> Does anybody have suggestions as to how to track this problem down?
>
> Thanks in Advance,
>
> Robert Haschart
>
>
>