Posted to solr-user@lucene.apache.org by simon <mt...@gmail.com> on 2017/05/04 14:49:25 UTC

Re: Indexing I/O errors and CorruptIndex messages

I've pretty much ruled out system/hardware issues: the AWS instance has
been rebooted, and indexing to a core on a new and empty disk/file system
fails in the same way with a CorruptIndexException.
I can generally get indexing to complete by significantly dialing down the
number of indexer scripts running concurrently, but the duration goes up
proportionately.
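
For concreteness, the "dialing down" is just a smaller worker pool in front of
the batches. A rough sketch, not the actual script (the core URL, batch files
and worker count here are placeholders):

    # throttled_indexer.py: illustrative sketch only, not the actual script.
    import glob
    import json
    from concurrent.futures import ProcessPoolExecutor

    import requests

    SOLR_UPDATE_URL = "http://localhost:8983/solr/build0324/update/json"  # assumed core URL
    MAX_WORKERS = 2  # was 6; fewer concurrent indexers avoids the errors but runs longer

    def index_batch(path):
        """POST one batch of JSON documents to /update/json; raise on HTTP errors."""
        with open(path) as f:
            docs = json.load(f)
        resp = requests.post(SOLR_UPDATE_URL, json=docs, timeout=300)
        resp.raise_for_status()
        return path

    if __name__ == "__main__":
        batches = sorted(glob.glob("batches/*.json"))  # placeholder batch files
        with ProcessPoolExecutor(max_workers=MAX_WORKERS) as pool:
            for done in pool.map(index_batch, batches):
                print("indexed", done)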

-Simon


On Thu, Apr 27, 2017 at 9:26 AM, simon <mt...@gmail.com> wrote:

> Nope ... huge file system (600 GB), only 50% full, and a complete index
> would be 80 GB max.
>
> On Wed, Apr 26, 2017 at 4:04 PM, Erick Erickson <er...@gmail.com>
> wrote:
>
>> Disk space issue? Lucene requires at least as much free disk space as
>> your index size. Note that a disk-full condition can be transient; in
>> other words, even if you see free space now, it may all have been used
>> up earlier and some space reclaimed since.
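
A quick way to sanity-check that rule of thumb on the indexing box; a minimal
sketch using only the standard library, with the index path taken from the
error messages below:

    # Compare the on-disk index size with the free space on its filesystem.
    import os
    import shutil

    INDEX_DIR = "/indexes/solrindexes/build0324/index"

    def dir_size_bytes(path):
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        return total

    index_bytes = dir_size_bytes(INDEX_DIR)
    free_bytes = shutil.disk_usage(INDEX_DIR).free
    print("index %.1f GB, free %.1f GB" % (index_bytes / 1e9, free_bytes / 1e9))
    if free_bytes < index_bytes:
        print("WARNING: less free space than index size; merges or optimize may fail")
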
>>
>> Best,
>> Erick
>>
>> On Wed, Apr 26, 2017 at 12:02 PM, simon <mt...@gmail.com> wrote:
>> > Reposting this as the problem described is happening again and there
>> > were no responses to the original email. Anyone?
>> > ----------------------------
>> > I'm seeing an odd error during indexing for which I can't find any reason.
>> >
>> > The relevant Solr log entry:
>> >
>> > 2017-03-24 19:09:35.363 ERROR (commitScheduler-30-thread-1) [ x:build0324] o.a.s.u.CommitTracker auto commit error...:java.io.EOFException: read past EOF: MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")
>> >      at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
>> > ...
>> >     Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: remaining=0, please run checkindex for more details (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")))
>> >          at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:451)
>> >          at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:140)
>> >
>> > followed within a few seconds by:
>> >
>> > 2017-03-24 19:09:56.402 ERROR (commitScheduler-31-thread-1) [ x:build0324] o.a.s.u.CommitTracker auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
>> >     at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1820)
>> >     at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1931)
>> > ...
>> > Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")
>> >     at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
>> >
>> > This error was repeated a few times as indexing continued and further
>> > autocommits were triggered.
>> >
>> > I stopped the indexing process, made a backup snapshot of the index,
>> > restarted indexing at a checkpoint, and everything then completed without
>> > further incident.
>> >
>> > I ran checkIndex on the saved snapshot and it reported no errors
>> > whatsoever. Operations on the complete index (including an optimize and
>> > several query scripts) have all been error-free.
>> >
>> > Some background:
>> >  Solr information from the beginning of the checkindex output:
>> >  -------
>> >  Opening index @ /indexes/solrindexes/build0324.bad/index
>> >
>> > Segments file=segments_9s numSegments=105 version=6.3.0
>> > id=7m1ldieoje0m6sljp7xocbz9l userData={commitTimeMSec=1490400514324}
>> >   1 of 105: name=_be maxDoc=1227144
>> >     version=6.3.0
>> >     id=7m1ldieoje0m6sljp7xocburb
>> >     codec=Lucene62
>> >     compound=false
>> >     numFiles=14
>> >     size (MB)=4,926.186
>> >     diagnostics = {os=Linux, java.vendor=Oracle Corporation,
>> > java.version=1.8.0_45, java.vm.version=25.45-b02, lucene.version=6.3.0,
>> > mergeMaxNumSegments=-1, os.arch=amd64, java.runtime.version=1.8.0_45-b13,
>> > source=merge, mergeFactor=19, os.version=3.10.0-229.1.2.el7.x86_64,
>> > timestamp=1490380905920}
>> >     no deletions
>> >     test: open reader.........OK [took 0.176 sec]
>> >     test: check integrity.....OK [took 37.399 sec]
>> >     test: check live docs.....OK [took 0.000 sec]
>> >     test: field infos.........OK [49 fields] [took 0.000 sec]
>> >     test: field norms.........OK [17 fields] [took 0.030 sec]
>> >     test: terms, freq, prox...OK [14568108 terms; 612537186 terms/docs pairs; 801208966 tokens] [took 30.005 sec]
>> >     test: stored fields.......OK [150164874 total field count; avg 122.4 fields per doc] [took 35.321 sec]
>> >     test: term vectors........OK [4804967 total term vector count; avg 3.9 term/freq vector fields per doc] [took 55.857 sec]
>> >     test: docvalues...........OK [4 docvalues fields; 0 BINARY; 1 NUMERIC; 2 SORTED; 0 SORTED_NUMERIC; 1 SORTED_SET] [took 0.954 sec]
>> >     test: points..............OK [0 fields, 0 points] [took 0.000 sec]
>> >   -----
>> >
>> >   The indexing process is a Python script (using the scorched Python
>> > client) which spawns multiple instances of itself, in this case 6, so there
>> > are definitely concurrent calls (to /update/json).
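
Each instance boils down to something like this; a sketch assuming scorched's
documented SolrInterface usage, with the core URL and documents as placeholders:

    # One indexer instance, simplified; assumes scorched's SolrInterface API.
    import scorched

    SOLR_URL = "http://localhost:8983/solr/build0324"  # assumed core URL

    si = scorched.SolrInterface(SOLR_URL)

    # Placeholder documents for illustration only.
    docs = [{"id": str(i), "title": "document %d" % i} for i in range(1000)]

    # Each add() is a POST to /update/json; with 6 instances these overlap, and
    # commits here are left to Solr's autoCommit (the commitScheduler in the log above).
    for start in range(0, len(docs), 100):
        si.add(docs[start:start + 100])
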
>> >
>> > Solrconfig and the schema have not been changed for several months,
>> during
>> > which time many ingests have been done, and the documents which were
>> being
>> > indexed at the time of the error have been indexed before without
>> problems,
>> > so I don't think it's a data issue.
>> >
>> > I saw the same error occur earlier in the day, and decided at that time
>> to
>> > delete the core and restart the Solr instance.
>> >
>> > The server is an Amazon instance running CentOS 7. I checked the system
>> > logs and didn't see any evidence of hardware errors.
>> >
>> > I'm puzzled as to why this would start happening out of the blue, and I
>> > can't find any particularly relevant posts to this forum or Stack Exchange.
>> > Anyone have an idea what's going on?
>>
>
>

Re: Indexing I/O errors and CorruptIndex messages

Posted by Rick Leir <rl...@leirtech.com>.
Simon 
After hearing about the weird time issue in EC2, I am going to ask if you have a real server handy for testing. No, I have no hard facts; this is just a suggestion.

And I have no beef with AWS; they have served me really well for other servers.
Cheers -- Rick

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com