Posted to dev@solr.apache.org by David Smiley <da...@gmail.com> on 2023/06/29 14:35:05 UTC

Backup, Restore, SplitShard should always internally be "async"

Some (most?) cluster admin commands can be executed in an async mode:
https://solr.apache.org/guide/solr/latest/configuration-guide/collections-api.html#asynchronous-calls

What I find really strange and unnecessary is that some of the commands
*internally* operate differently based on whether the command was invoked
this way or not.  This adds complexity to understanding, maintenance, and
testing.  Instead, I think commands should do sub-steps in whatever way
makes sense for what the command is doing.  I propose that BackupCmd
always send requests to each shard asynchronously because it's potentially
a heavy operation.  Likewise, intermediate steps of a shard split that
could be time consuming (e.g. the actual index splitting step) should
always be executed in an async way, but cheap steps need not be.  All this
is transparent to the client, by the way.
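
To make the idea concrete, here is a minimal sketch in plain Java. It is not
Solr's BackupCmd; the class BackupSketch and the methods backupShard,
runBackup, invokeSync and invokeAsync are hypothetical names invented for
illustration. The point it tries to show is that the heavy per-shard step is
always fanned out asynchronously, and the way the client invoked the command
only changes whether the caller blocks or receives a handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical illustration only; none of these names exist in Solr.
class BackupSketch {
  // Stand-in for a potentially heavy per-shard request.
  static String backupShard(String shard) {
    return shard + ":backed-up";
  }

  // The single internal code path: always submit the heavy per-shard
  // step asynchronously, then wait for every shard to finish.
  static List<String> runBackup(List<String> shards, ExecutorService pool) {
    List<CompletableFuture<String>> pending = new ArrayList<>();
    for (String shard : shards) {
      pending.add(CompletableFuture.supplyAsync(() -> backupShard(shard), pool));
    }
    List<String> results = new ArrayList<>();
    for (CompletableFuture<String> f : pending) {
      results.add(f.join()); // results stay in shard order
    }
    return results;
  }

  // Client invoked the command synchronously: block until it is done.
  static List<String> invokeSync(List<String> shards, ExecutorService pool) {
    return runBackup(shards, pool);
  }

  // Client invoked the command asynchronously: return a handle at once.
  // The outer task runs on the default async executor, not on "pool",
  // so it cannot starve the per-shard tasks it is waiting for.
  static CompletableFuture<List<String>> invokeAsync(
      List<String> shards, ExecutorService pool) {
    return CompletableFuture.supplyAsync(() -> runBackup(shards, pool));
  }
}
```

Both entry points go through runBackup, so there is only one internal
behaviour to understand and test.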

The only downside I can think of is that an async-issued request will take
a bit longer.  But given these are used for heavy commands (that are likely
already being invoked this way), I think that's fair.
CollectionAdminRequest.RequestStatus.waitFor polls every 1 second.
SOLR-16313 proposes making this configurable.  I would prefer that the
server-side implementation have an option to wait up to a configurable
number of seconds via a ZK watch so that an async command commonly
wouldn't take noticeably longer.  Nonetheless this is just an
improvement proposal; it's not a "blocker" to my proposal above, IMO.
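
The difference between the two waiting styles can be sketched in plain Java.
This is not SolrJ or ZooKeeper code: WaitSketch, pollUntilDone and
awaitSignal are hypothetical names, and a CountDownLatch stands in for the
ZK watch as an assumption. Fixed-interval polling can notice completion up
to one interval late, while a blocking wait returns as soon as completion is
signalled or the timeout expires.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not SolrJ or ZooKeeper API.
class WaitSketch {
  // Polling style: check, sleep a fixed interval, repeat.
  // Returns true if the task was seen as done before the timeout.
  static boolean pollUntilDone(BooleanSupplier isDone, long intervalMs, long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!isDone.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return false;
      }
    }
    return true;
  }

  // Notification style: block until signalled, up to timeoutMs.
  // The latch is an assumed stand-in for a ZK watch firing.
  static boolean awaitSignal(CountDownLatch done, long timeoutMs) {
    try {
      return done.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag
      return false;
    }
  }
}
```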

~ David

Re: Backup, Restore, SplitShard should always internally be "async"

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.
> I'm unsure Solr async responses
> should include documentation/help.

I think it should return the URL at which to check the status, not
"documentation/help".
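
As a rough sketch of what such a URL could look like, the Collections API
(v1) exposes a REQUESTSTATUS action that takes a requestid parameter. The
helper below is hypothetical (StatusUrlSketch and statusUrl are not Solr
names), and the base URL in the usage note is an assumed local default.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; only the REQUESTSTATUS action and its
// requestid parameter come from the Collections API itself.
class StatusUrlSketch {
  static String statusUrl(String solrBaseUrl, String requestId) {
    return solrBaseUrl
        + "/admin/collections?action=REQUESTSTATUS&requestid="
        + URLEncoder.encode(requestId, StandardCharsets.UTF_8);
  }
}
```

For example, statusUrl("http://localhost:8983/solr", "1000") yields
http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=1000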


Re: Backup, Restore, SplitShard should always internally be "async"

Posted by David Smiley <da...@gmail.com>.
To be extra clear, my proposal is about the *internal* operation of certain
commands.  Thus a user/client issuing a backup command in the synchronous
way will not be impacted at all; it'll wait and return the response.  If a
user/client issues an async command, presumably they have looked at the
documentation to understand how to do so.  I'm unsure Solr async responses
should include documentation/help.

~ David



Re: Backup, Restore, SplitShard should always internally be "async"

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.
+1. I think, as an improvement, a helpful message on how to track the status
of the async request should be returned as part of the response to async
Collections API calls.

Even a 1s poll for these commands is okay in the real world.
