Posted to solr-user@lucene.apache.org by Steffen Moldenhauer <s....@intershop.de> on 2019/01/31 15:09:49 UTC

Asynchronous Calls to Backup/Restore Collections ignoring errors

Hi all,

We are using the Collections API backup and restore to transfer collections from a pre-prod to a production system. We are currently using Solr version 6.6.5.
Sometimes that automated process fails, and the collections do not work on the production system.

It seems that the asynchronous BACKUP and RESTORE API calls do not report some errors/exceptions.

I reproduced it with the SolrCloud gettingstarted example:

http://localhost:8983/solr/admin/collections?action=BACKUP&name=backup-gettingstarted&collection=gettingstarted&location=D:\solr_backup

http://localhost:8983/solr/admin/collections?action=DELETE&name=gettingstarted

Now I simulate an error by deleting something from the backup in the file system, and then try to restore the incomplete backup:

http://localhost:8983/solr/admin/collections?action=RESTORE&name=backup-gettingstarted&collection=gettingstarted&location=D:\solr_backup&async=1000

http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=1000
<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst><lst name="status"><str name="state">completed</str><str name="msg">found [1000] in completed tasks</str></lst></response>

The status is completed but the collection is not usable.
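
For anyone scripting this check, here is a minimal Python sketch of reading that REQUESTSTATUS response. The function name async_state is made up for illustration; the XML is the response shown above. It only shows what the response exposes: a state and a message, with no per-core failure detail to inspect.

```python
import xml.etree.ElementTree as ET


def async_state(xml_text):
    """Return (state, msg) from a Collections API REQUESTSTATUS XML response.

    Hypothetical helper: it assumes the response layout shown in this thread,
    i.e. a <lst name="status"> element holding <str> children.
    """
    root = ET.fromstring(xml_text)
    status = root.find("./lst[@name='status']")
    if status is None:
        raise ValueError("no status element in response")
    # Collect the <str name="..."> children into a plain dict.
    fields = {el.get("name"): el.text for el in status.findall("str")}
    return fields.get("state"), fields.get("msg")


SAMPLE = (
    '<response><lst name="responseHeader"><int name="status">0</int>'
    '<int name="QTime">2</int></lst><lst name="status">'
    '<str name="state">completed</str>'
    '<str name="msg">found [1000] in completed tasks</str></lst></response>'
)

state, msg = async_state(SAMPLE)
print(state, "/", msg)
```

With the response above this yields the state "completed", even though the restore itself did not succeed.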

With a synchronous restore call I get:

http://localhost:8983/solr/admin/collections?action=RESTORE&name=backup-gettingstarted&collection=gettingstarted&location=D:\solr_backup
        <response><lst name="responseHeader"><int name="status">500</int><int name="QTime">6456</int></lst><str name="Operation restore caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not restore core</str><lst name="exception"><str name="msg">Could not restore core</str><int name="rspCode">500</int></lst><lst name="error"><lst name="metadata"><str name="error-class">org.apache.solr.common.SolrException</str><str name="root-error-class">org.apache.solr.common.SolrException</str></lst><str name="msg">Could not restore core</str><str name="trace">org.apache.solr.common.SolrException: Could not restore core
               at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:300)
               at org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:237)
               at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:215)
               at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
               at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:748)
               at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:729)
               at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:510)
               at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
               at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
               at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
               at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
               at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
               at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
               at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
               at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
               at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
               at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
               at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
               at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
               at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
               at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
               at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
               at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
               at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
               at org.eclipse.jetty.server.Server.handle(Server.java:534)
               at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
               at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
               at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
               at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
               at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
               at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
               at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
               at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
               at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
               at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
               at java.lang.Thread.run(Thread.java:748)
</str><int name="code">500</int></lst></response>


But we cannot use the synchronous call because we run into a timeout even if we increase the socket timeout of the client.
And we cannot use the asynchronous call because it does not report errors.

Is this a known bug? Any ideas for a workaround?

Kind regards
Steffen Moldenhauer


RE: Asynchronous Calls to Backup/Restore Collections ignoring errors

Posted by Steffen Moldenhauer <s....@intershop.de>.
Hi Jason, 

Thanks for pointing me to issue SOLR-6595. It looks to me as if the async handling is similar to the handling of distributed collection commands.
I hope I can find the time to test whether your patch fixes it.
Yes, I will try your suggestion and see whether we can build a workaround that checks the collection with a query after the restore.

Regards
Steffen 


Re: Asynchronous Calls to Backup/Restore Collections ignoring errors

Posted by Jason Gerlowski <ge...@gmail.com>.
Hi Steffen,

There are a few "known issues" in this area.  Probably most relevant
is SOLR-6595, which covers a few error-reporting issues for
"collection-admin" operations.  I don't think we've gotten any reports
yet of success/failure determination being broken for asynchronous
operations, but that's not too surprising given my understanding of
how that bit of the code works.  So "yes", this is a known issue.
We've made some progress towards improving the situation, but there's
still work to be done.

As for workarounds, I can't think of any clever suggestions.  You
might be able to issue a query to the collection to see if it returns
any docs, or a particular number of expected docs.  But that may not
be possible, depending on what you meant by the collection being
"unusable" above.
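
A rough Python sketch of that kind of check, offered only as an illustration: poll REQUESTSTATUS until the task leaves the submitted/running states, then count documents with a rows=0 query. The function names, the expected_docs threshold, and the polling defaults are all assumptions and not part of Solr; the sketch also assumes that wt=json responses carry status.state and response.numFound.

```python
import json
import time
import urllib.parse
import urllib.request


def http_get_json(url):
    """Default transport: GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_async(base_url, request_id, fetch=http_get_json,
                   poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll REQUESTSTATUS until the task is no longer submitted/running."""
    query = urllib.parse.urlencode(
        {"action": "REQUESTSTATUS", "requestid": request_id, "wt": "json"})
    url = f"{base_url}/admin/collections?{query}"
    for _ in range(max_polls):
        state = fetch(url)["status"]["state"]
        if state not in ("submitted", "running"):
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"async request {request_id} did not finish")


def doc_count(base_url, collection, fetch=http_get_json):
    """Return numFound for a match-all query that fetches no rows."""
    query = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return fetch(f"{base_url}/{collection}/select?{query}")["response"]["numFound"]


def restore_succeeded(base_url, collection, request_id, expected_docs,
                      fetch=http_get_json, **poll_opts):
    """Treat the restore as good only if the collection answers a query.

    expected_docs is a caller-supplied minimum (an assumption of this sketch),
    e.g. the document count recorded on the source system before the backup.
    """
    if wait_for_async(base_url, request_id, fetch=fetch, **poll_opts) != "completed":
        return False
    try:
        return doc_count(base_url, collection, fetch=fetch) >= expected_docs
    except (OSError, KeyError, ValueError):
        # Query failed or returned an unexpected body: count it as a failure.
        return False
```

Something like restore_succeeded("http://localhost:8983/solr", "gettingstarted", "1000", expected_docs=1) would then be called after submitting the async RESTORE. Whether a query against a half-restored collection fails cleanly or just returns zero documents is not something this sketch can promise.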

Best,

Jason
