Posted to dev@solr.apache.org by David Smiley <ds...@apache.org> on 2023/05/10 03:15:40 UTC

Async collection requests and deduplication

I noticed that async admin requests to Solr must have a unique asyncId or
else the request is rejected.  Makes sense -- maybe the request is in
progress.  But what if it isn't -- what if the previous request for the
same ID either succeeded or failed?  Shouldn't we clear the previous
asyncId status and let the new request go through?

I'm imagining leveraging this uniqueness constraint as an additional
protection measure for requests that should be done atomically, like a
shard split.  Yes, there are already locks, but this additional measure
would allow a fail-fast -- no enqueueing of a doomed message to the
Overseer that will never succeed anyway.  Thus the sender of a shard split
can use an async ID like "SPLIT-collectionName-shardName".  Maybe other
parts of SolrCloud could likewise leverage this constraint to their
advantage.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
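
The deterministic-ID idea above can be sketched as follows. This is only an
illustration: `action=SPLITSHARD` and the `async` parameter belong to Solr's
Collections API, but the base URL, the collection and shard names, and the
helper functions are assumptions, and the sketch only builds the request URL
without sending it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def split_async_id(collection: str, shard: str) -> str:
    """Derive the async ID from the target, so that a second split of the
    same shard reuses the same ID and can be rejected up front."""
    return f"SPLIT-{collection}-{shard}"


def split_shard_url(base_url: str, collection: str, shard: str) -> str:
    """Build (but do not send) a SPLITSHARD request carrying that ID."""
    params = {
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
        "async": split_async_id(collection, shard),
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host, collection, and shard, for illustration only.
url = split_shard_url("http://localhost:8983/solr", "products", "shard1")
query = parse_qs(urlsplit(url).query)
```

Because the ID depends only on the collection and shard, two senders that
try to split the same shard would collide on the ID before anything reaches
the Overseer queue.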

Re: Async collection requests and deduplication

Posted by Tomás Fernández Löbbe <to...@gmail.com>.

> BTW these async status objects stored in ZK are in fact cleaned up when
> they reach 10k in number.  See SizeLimitedDistributedMap.

Yes, I now remember. The asyncIDs are actually in a regular DistributedMap,
but the "completed" and "failure" maps delete the corresponding asyncIDs
when they clear their own elements.
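
The cleanup relationship described above can be sketched with an in-memory
model. This is not Solr's SizeLimitedDistributedMap: the class name
`SizeLimitedMap`, the eviction callback, and the policy of evicting the
oldest tenth of the entries are assumptions made for illustration, and the
limit is 10 rather than 10k to keep the example small.

```python
class SizeLimitedMap:
    """In-memory stand-in for a size-capped status map with an eviction hook."""

    def __init__(self, max_size, on_evict):
        self.max_size = max_size
        self.on_evict = on_evict
        self.entries = {}  # dicts keep insertion order, so oldest comes first

    def put(self, key, value):
        if key not in self.entries and len(self.entries) >= self.max_size:
            # Assumed policy: drop the oldest tenth (at least one entry).
            evict_count = max(1, self.max_size // 10)
            for old_key in list(self.entries)[:evict_count]:
                del self.entries[old_key]
                self.on_evict(old_key)  # let the owner forget the async ID too
        self.entries[key] = value


# The plain set plays the role of the regular map of known async IDs.
async_ids = set()
completed = SizeLimitedMap(max_size=10, on_evict=async_ids.discard)

for i in range(11):
    request_id = f"req-{i}"
    async_ids.add(request_id)
    completed.put(request_id, "completed")
```

After the eleventh insert the oldest entry has been evicted from the status
map, and the callback has removed its ID from the set, so that ID could be
used again.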

On Wed, May 10, 2023 at 12:21 PM David Smiley <ds...@apache.org> wrote:

> Good point Tomas; I hadn't considered that use-case.  I suppose the
> behavior I suggest could be controlled with a boolean parameter flag like
> "asyncDeleteStatus" true/false.  WDYT?  I'm not married to it.
>
> BTW these async status objects stored in ZK are in fact cleaned up when
> they reach 10k in number.  See SizeLimitedDistributedMap.

Re: Async collection requests and deduplication

Posted by David Smiley <ds...@apache.org>.

Good point Tomas; I hadn't considered that use-case.  I suppose the
behavior I suggest could be controlled with a boolean parameter flag like
"asyncDeleteStatus" true/false.  WDYT?  I'm not married to it.

BTW these async status objects stored in ZK are in fact cleaned up when
they reach 10k in number.  See SizeLimitedDistributedMap.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, May 10, 2023 at 2:36 PM Tomás Fernández Löbbe <to...@gmail.com>
wrote:

> I find it very useful to keep the used async IDs regardless of the status
> for some time. For example, If you have a workflow that involves multiple
> steps such as add/remove replicas, you can just retry/restart the workflow
> and be sure Solr will reject the request if the async ID already exists
> (and your code can then handle this accordingly, for example, checking the
> status of success/failed and act accordingly) as long as you use the async
> IDs consistently.
>
> That said, async IDs do need to eventually be removed and AFAIK Solr
> doesn't do this automatically. This is a problem because of ever increasing
> objects in ZooKeeper. I think we should have some sort of task that cleans
> up async ID after some configurable amount of time.

Re: Async collection requests and deduplication

Posted by Tomás Fernández Löbbe <to...@gmail.com>.

I find it very useful to keep used async IDs around for some time,
regardless of their status. For example, if you have a workflow that
involves multiple steps such as adding/removing replicas, you can just
retry/restart the workflow and be sure Solr will reject a request whose
async ID already exists (your code can then handle the rejection, for
example by checking whether the earlier request succeeded or failed and
acting accordingly), as long as you use the async IDs consistently.

That said, async IDs do need to be removed eventually, and AFAIK Solr
doesn't do this automatically. This is a problem because of the
ever-increasing number of objects in ZooKeeper. I think we should have some
sort of task that cleans up async IDs after some configurable amount of
time.
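
The retry pattern described above can be sketched as follows. This is a
minimal in-memory model, not a Solr client: `FakeSolr`, `run_step`, the ID
`workflow42-addreplica`, and the state strings are hypothetical placeholders
chosen for illustration.

```python
class FakeSolr:
    """In-memory stand-in for the async ID bookkeeping; not a real client."""

    def __init__(self):
        self.states = {}

    def submit(self, async_id):
        # Reject any ID that has been used before, whatever its state.
        if async_id in self.states:
            raise ValueError(f"async ID already in use: {async_id}")
        self.states[async_id] = "running"

    def finish(self, async_id, state):
        self.states[async_id] = state

    def status(self, async_id):
        return self.states.get(async_id, "notfound")


def run_step(solr, async_id):
    """Submit a workflow step; if the ID is taken, report the state of the
    earlier attempt instead of submitting the step again."""
    try:
        solr.submit(async_id)
        return "submitted"
    except ValueError:
        return solr.status(async_id)


solr = FakeSolr()
first_attempt = run_step(solr, "workflow42-addreplica")  # accepted
solr.finish("workflow42-addreplica", "completed")
retry = run_step(solr, "workflow42-addreplica")  # rejected, status checked
```

The restarted workflow does not repeat the step; it learns from the
rejection that the step already ran and can decide what to do from the
recorded state. This only works while the old ID is still remembered.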

On Wed, May 10, 2023 at 1:01 AM Andras Salamon <an...@melda.info>
wrote:

> Hi,
>
> How can we be sure that the previous request status info has been already
> processed? What about the following timeline:
>
> - Client1 sends an async request
> - Client1 reads status info, it's still running
> - Client1 reads status info, it's still running
> - Async request finishes
> - Right after that Client2 sends a new async request with the same ID, we
>   clear the async status because it's already finished
> - Client1 reads status info, but this time it will read info about the new
>   async request sent by Client2.
>
> Andras

Re: Async collection requests and deduplication

Posted by Andras Salamon <an...@melda.info>.

Hi,

How can we be sure that the previous request's status info has already been
processed? What about the following timeline:

- Client1 sends an async request
- Client1 reads status info, it's still running
- Async request finishes
- Right after that, Client2 sends a new async request with the same ID; we
  clear the async status because it's already finished
- Client1 reads status info, but this time it will read info about the new
  async request sent by Client2.

Andras
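
The timeline above can be reproduced with a small in-memory model of the
proposed clear-on-reuse behavior. Everything here is hypothetical
(`ClearOnReuseTracker`, the `owner` field, the ID `job-1`, the state
strings); it models the proposal under discussion, not what Solr does today.

```python
class ClearOnReuseTracker:
    """Models the proposal: a finished ID may be reused, which overwrites
    the status recorded for the earlier request."""

    def __init__(self):
        self.requests = {}  # async ID -> {"owner": ..., "state": ...}

    def submit(self, async_id, owner):
        existing = self.requests.get(async_id)
        if existing is not None and existing["state"] == "running":
            raise ValueError("request still in progress")
        # The previous request finished, so its status is cleared and replaced.
        self.requests[async_id] = {"owner": owner, "state": "running"}

    def finish(self, async_id, state):
        self.requests[async_id]["state"] = state

    def status(self, async_id):
        return dict(self.requests[async_id])


tracker = ClearOnReuseTracker()
tracker.submit("job-1", owner="Client1")
seen_by_client1 = [tracker.status("job-1")["state"]]  # still running
tracker.finish("job-1", "completed")
tracker.submit("job-1", owner="Client2")  # reuses the finished ID
final_view = tracker.status("job-1")  # Client1 polls again
```

In this model Client1 never observes the "completed" state of its own
request; its last poll returns the running request that belongs to Client2.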
