Posted to dev@lucene.apache.org by "Varun Thacker (JIRA)" <ji...@apache.org> on 2017/03/10 19:59:04 UTC

[jira] [Commented] (SOLR-10259) admin/cores?action=STATUS returns 500 when a single core has init failures

    [ https://issues.apache.org/jira/browse/SOLR-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905617#comment-15905617 ] 

Varun Thacker commented on SOLR-10259:
--------------------------------------

Hi Oliver,

Patch looks good. However, I think this patch was created against an older version of Solr? To apply it cleanly on master I needed to move the code into {{StatusOp.java}}.

Also, it would be nice to have a test for this. I think all we need to do is create a core, delete a segment file (something like https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.3.0/solr/core/src/test/org/apache/solr/handler/TestRestoreCore.java#L187 where location is something like {{solrCore.getDataDir()}}?), and then call STATUS.
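
As a rough illustration of the "delete a segment file" step only, here is a minimal sketch using plain {{java.nio.file}}. The class and method names ({{SegmentFileRemover}}, {{deleteSegmentsFiles}}) are hypothetical and not part of Solr or its test framework, and the {{segments_*}} glob is an assumption based on the {{segments_1}} file named in the issue description. Creating the core and calling STATUS afterwards are not shown.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr: removes commit files from an index directory.
final class SegmentFileRemover {

    /**
     * Deletes every regular file matching "segments_*" directly under indexDir
     * and returns the names of the files that were deleted.
     */
    static List<String> deleteSegmentsFiles(Path indexDir) throws IOException {
        // Collect the matches first so nothing is deleted while the directory is being listed.
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir, "segments_*")) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)) {
                    matches.add(file);
                }
            }
        }

        List<String> deleted = new ArrayList<>();
        for (Path file : matches) {
            Files.delete(file);
            deleted.add(file.getFileName().toString());
        }
        return deleted;
    }
}
```

In a real test, {{indexDir}} would presumably be derived from something like {{solrCore.getDataDir()}}; the exact path and the point at which the core has to be reloaded are left open here.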

> admin/cores?action=STATUS returns 500 when a single core has init failures
> --------------------------------------------------------------------------
>
>                 Key: SOLR-10259
>                 URL: https://issues.apache.org/jira/browse/SOLR-10259
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 5.3
>            Reporter: Oliver Bates
>            Priority: Trivial
>         Attachments: SOLR-10259.patch-1.txt, SOLR-10259.patch-2.txt
>
>
> When I have a healthy core on a node and I call solr/admin/cores?action=STATUS, I get the following healthy response:
> {quote}
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">1607</int>
>   </lst>
>   <lst name="initFailures"/>
>   <lst name="status">
>     <lst name="whoisbanana_shard1_replica1">
>       <str name="name">whoisbanana_shard1_replica1</str>
>       <str name="instanceDir">
>         /tmp/search_integration_test/solr1/whoisbanana_shard1_replica1/
>       </str>
>       <str name="dataDir">
>         /tmp/search_integration_test/solr1/whoisbanana_shard1_replica1/data/
>       </str>
>       <str name="config">solrconfig.xml</str>
>       <str name="schema">schema.xml</str>
>       <date name="startTime">2017-03-08T15:59:50.18Z</date>
>       <long name="uptime">380431</long>
>       <str name="lastPublished">active</str>
>       <int name="configVersion">0</int>
>       <lst name="index">
>       <int name="numDocs">0</int>
>       <int name="maxDoc">0</int>
>       <int name="deletedDocs">0</int>
>       <long name="indexHeapUsageBytes">0</long>
>       <long name="version">2</long>
>       <int name="segmentCount">0</int>
>       <bool name="current">true</bool>
>       <bool name="hasDeletions">false</bool>
>       <str name="directory">
> org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/tmp/search_integration_test/solr1/whoisbanana_shard1_replica1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@762404a0; maxCacheMB=48.0 maxMergeSizeMB=4.0)
>       </str>
>       <lst name="userData"/>
>       <long name="sizeInBytes">71</long>
>       <str name="size">71 bytes</str>
>     </lst>
>   </lst>
> </lst>
> </response>
> {quote}
> If I then corrupt the index file and reload, e.g. like this:
> echo "cheese" >> /tmp/search_integration_test/solr1/whoisbanana_shard1_replica1/data/index/segments_1
> And then I call the same endpoint (solr/admin/cores?action=STATUS), I get a 500 back:
> {quote}
> <response>
>   <lst name="responseHeader">
>     <int name="status">500</int>
>     <int name="QTime">1508</int>
>   </lst>
>   <lst name="error">
>     <str name="msg">Error handling 'status' action</str>
>     <str name="trace">
> org.apache.solr.common.SolrException: Error handling 'status' action
>   at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:755)
>   at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:231)
>   at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:196)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:146)
>   at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:676)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:443)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>   at com.apple.cie.search.auth.TrustFilter.doFilter(TrustFilter.java:44)
>   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>   at com.apple.cie.search.id.IdFilter.doFilter(IdFilter.java:38)
>   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>   at org.eclipse.jetty.server.Server.handle(Server.java:499)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.lucene.index.CorruptIndexException: misplaced codec footer (file extended?): remaining=23, expected=16 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/tmp/search_integration_test/solr1/whoisbanana_shard1_replica1/data/index/segments_1")))
>   at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:411)
>   at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:331)
>   at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:442)
>   at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:493)
>   at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:490)
>   at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
>   at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
>   at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:490)
>   at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:344)
>   at org.apache.lucene.index.FilterDirectoryReader.isCurrent(FilterDirectoryReader.java:124)
>   at org.apache.lucene.index.FilterDirectoryReader.isCurrent(FilterDirectoryReader.java:124)
>   at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:585)
>   at org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:1202)
>   at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:743)
>   ... 31 more
>     </str>
>     <int name="code">500</int>
>   </lst>
> </response>
> {quote}
> It seems to me that what we really want is to still return a 200, but to list the init failures under the 'initFailures' key of the response (as seen in the 'healthy response' above). This way, if a node is hosting 10 cores and 1 is corrupted, I can still query the STATUS endpoint to get information about the non-corrupted cores, AND I can more easily determine what the problem with my corrupted core is because I can see the stack trace. This allows automated tooling, for instance, to go in there and delete and re-add a replica until the day arrives that REQUESTRECOVERY and/or leader-initiated-recovery both work when the index is corrupted (see https://issues.apache.org/jira/browse/SOLR-9836).
> I am not sure which solution the world would like best, so I am proposing two patches.
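
The behaviour requested in the description (one bad core should not fail the whole STATUS call) can be sketched roughly as below. This is not the code from either attached patch or from {{CoreAdminHandler}}; {{CoreStatusCollector}}, {{StatusSource}} and {{Result}} are hypothetical names used only to show the per-core try/catch idea, with the failure message recorded under an {{initFailures}} map and the healthy cores under {{status}}.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Solr code: collect per-core status without letting one failure abort the rest.
final class CoreStatusCollector {

    /** Stand-in for whatever computes a single core's status; it may throw (e.g. on a corrupt index). */
    interface StatusSource {
        Map<String, Object> statusOf(String coreName) throws Exception;
    }

    /** Holder mirroring the two top-level keys of the STATUS response shown in the issue. */
    static final class Result {
        final Map<String, Map<String, Object>> status = new LinkedHashMap<>();
        final Map<String, String> initFailures = new LinkedHashMap<>();
    }

    static Result collect(Iterable<String> coreNames, StatusSource source) {
        Result result = new Result();
        for (String name : coreNames) {
            try {
                result.status.put(name, source.statusOf(name));
            } catch (Exception e) {
                // Record the failure for this core only and continue with the remaining cores.
                result.initFailures.put(name, e.toString());
            }
        }
        return result;
    }
}
```

With this shape, a caller would still get entries for the healthy cores, and the corrupted core would show up under {{initFailures}} with its exception message instead of turning the whole response into a 500.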


