Posted to users@solr.apache.org by Ishan Chattopadhyaya <ic...@gmail.com> on 2023/06/20 07:57:32 UTC

Re: Limiting Backup IO

Might be a good question for users@ list, I guess. I'm sure other users
must've thought about this.
Cross posting there, as I'm curious myself too.

On Tue, 20 Jun 2023 at 01:07, David Smiley <ds...@apache.org> wrote:

> Has anyone mitigated the potentially large IO impact of doing a backup of a
> large collection or just in general?  If the collection is large enough,
> there very well could be many shards on one host and it could saturate the
> IO.  I wonder if there should be a rate limit mechanism or some other
> mechanism.
>
> Not the same but I know that at a segment level, the merges are rate
> limited -- ConcurrentMergeScheduler doesn't quite let you set it but
> adjusts itself automatically ("ioThrottle" boolean).
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>

Re: Limiting Backup IO

Posted by David Smiley <ds...@apache.org>.

Here's a POC: https://github.com/apache/solr/pull/1729

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Jun 26, 2023 at 1:53 PM Jason Gerlowski <ge...@gmail.com>
wrote:

> Sounds like something that would be very useful for folks.
>
> I'm sure it'd be very dependent on your data and the type of backup,
> but I'm curious - if you can share Pierre - is there a number of
> cores-per-node being backed up where you start to see problems?
>
> Jason
>
> On Wed, Jun 21, 2023 at 8:34 AM Pierre Salagnac
> <pi...@gmail.com> wrote:
> >
> > Thanks for starting this thread David.
> >
> > I've been internally working on this, since we have issues (query failures)
> > during backups of big collections because of IO saturation.
> >
> > I see two different approaches to solve this:
> > 1. Throttle at the IO level, like David mentioned.
> > 2. Limit the number of cores we backup concurrently.
> > (These two options are *not* mutually exclusive.)
> >
> > I've been focused on the second option, to limit the number of concurrent
> > backups per node. Currently, the overseer sends shard requests to all
> > shards in a simple 'for' loop. If the collection has one thousand shards,
> > we'll start 1 thousand concurrent backups. The idea is to only send shard
> > level requests up to a certain limit per node, and then each time a shard
> > is complete, we send the next one for this node.
> > If you're interested, I integrated my experiment (for non incremental
> > backups) here:
> > https://github.com/psalagnac/solr/commit/c77c94e9a3c20aee3e45ec1198f00ab9cf0f76c5
> >
> > I don't think backup is the only operation that should be considered. At
> > least restore is, not sure whether we have other IO intensive operations
> > that are at the collection level. Ideally, we should have something generic
> > and not consider each type of operation individually.
> >
> > Thanks
> >
> >
> > Le mar. 20 juin 2023 à 09:58, Ishan Chattopadhyaya <
> > ichattopadhyaya@gmail.com> a écrit :
> >
> > > Might be a good question for users@ list, I guess. I'm sure other users
> > > must've thought about this.
> > > Cross posting there, as I'm curious myself too.
> > >
> > > On Tue, 20 Jun 2023 at 01:07, David Smiley <ds...@apache.org> wrote:
> > >
> > > > Has anyone mitigated the potentially large IO impact of doing a backup of a
> > > > large collection or just in general?  If the collection is large enough,
> > > > there very well could be many shards on one host and it could saturate the
> > > > IO.  I wonder if there should be a rate limit mechanism or some other
> > > > mechanism.
> > > >
> > > > Not the same but I know that at a segment level, the merges are rate
> > > > limited -- ConcurrentMergeScheduler doesn't quite let you set it but
> > > > adjusts itself automatically ("ioThrottle" boolean).
> > > >
> > > > ~ David Smiley
> > > > Apache Lucene/Solr Search Developer
> > > > http://www.linkedin.com/in/davidwsmiley
> > > >
> > >
>

Re: Limiting Backup IO

Posted by Jason Gerlowski <ge...@gmail.com>.

Sounds like something that would be very useful for folks.

I'm sure it'd be very dependent on your data and the type of backup,
but I'm curious - if you can share Pierre - is there a number of
cores-per-node being backed up where you start to see problems?

Jason

On Wed, Jun 21, 2023 at 8:34 AM Pierre Salagnac
<pi...@gmail.com> wrote:
>
> Thanks for starting this thread David.
>
> I've been internally working on this, since we have issues (query failures)
> during backups of big collections because of IO saturation.
>
> I see two different approaches to solve this:
> 1. Throttle at the IO level, like David mentioned.
> 2. Limit the number of cores we backup concurrently.
> (These two options are *not* mutually exclusive.)
>
> I've been focused on the second option, to limit the number of concurrent
> backups per node. Currently, the overseer sends shard requests to all
> shards in a simple 'for' loop. If the collection has one thousand shards,
> we'll start 1 thousand concurrent backups. The idea is to only send shard
> level requests up to a certain limit per node, and then each time a shard
> is complete, we send the next one for this node.
> If you're interested, I integrated my experiment (for non incremental
> backups) here:
> https://github.com/psalagnac/solr/commit/c77c94e9a3c20aee3e45ec1198f00ab9cf0f76c5
>
> I don't think backup is the only operation that should be considered. At
> least restore is, not sure whether we have other IO intensive operations
> that are at the collection level. Ideally, we should have something generic
> and not consider each type of operation individually.
>
> Thanks
>
>
> Le mar. 20 juin 2023 à 09:58, Ishan Chattopadhyaya <
> ichattopadhyaya@gmail.com> a écrit :
>
> > Might be a good question for users@ list, I guess. I'm sure other users
> > must've thought about this.
> > Cross posting there, as I'm curious myself too.
> >
> > On Tue, 20 Jun 2023 at 01:07, David Smiley <ds...@apache.org> wrote:
> >
> > > Has anyone mitigated the potentially large IO impact of doing a backup of a
> > > large collection or just in general?  If the collection is large enough,
> > > there very well could be many shards on one host and it could saturate the
> > > IO.  I wonder if there should be a rate limit mechanism or some other
> > > mechanism.
> > >
> > > Not the same but I know that at a segment level, the merges are rate
> > > limited -- ConcurrentMergeScheduler doesn't quite let you set it but
> > > adjusts itself automatically ("ioThrottle" boolean).
> > >
> > > ~ David Smiley
> > > Apache Lucene/Solr Search Developer
> > > http://www.linkedin.com/in/davidwsmiley
> > >
> >

Re: Limiting Backup IO

Posted by Pierre Salagnac <pi...@gmail.com>.

Thanks for starting this thread David.

I've been internally working on this, since we have issues (query failures)
during backups of big collections because of IO saturation.

I see two different approaches to solve this:
1. Throttle at the IO level, like David mentioned.
2. Limit the number of cores we back up concurrently.
(These two options are *not* mutually exclusive.)
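
A minimal sketch of what option 1 could look like, assuming the backup copies index files through plain Java streams. The class name `ThrottledCopy` and the `bytesPerSecond` parameter are hypothetical and are not taken from Solr or from the POC; Lucene's own `RateLimiter` is deliberately not used, so the example stays self-contained.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.OutputStream;

/** Hypothetical helper: copies a stream while capping the average write rate. */
public class ThrottledCopy {

  /**
   * Copies everything from in to out, pausing whenever the copy is ahead of
   * the schedule implied by bytesPerSecond. Returns the number of bytes copied.
   */
  public static long copy(InputStream in, OutputStream out, long bytesPerSecond)
      throws IOException {
    byte[] buffer = new byte[8192];
    long start = System.nanoTime();
    long total = 0;
    int read;
    while ((read = in.read(buffer)) != -1) {
      out.write(buffer, 0, read);
      total += read;
      // Time (ns since start) at which `total` bytes are allowed to have been written.
      long allowedAtNanos = (long) (total / (double) bytesPerSecond * 1e9);
      long aheadNanos = allowedAtNanos - (System.nanoTime() - start);
      if (aheadNanos > 0) {
        try {
          Thread.sleep(aheadNanos / 1_000_000L, (int) (aheadNanos % 1_000_000L));
        } catch (InterruptedException e) {
          // Restore the interrupt flag and abort the copy.
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("throttled copy interrupted");
        }
      }
    }
    return total;
  }
}
```

A per-stream cap like this only bounds one copy; with many shards on one host the limit would have to be shared across all concurrent copies, which is one reason the two options can complement each other.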

I've been focused on the second option, to limit the number of concurrent
backups per node. Currently, the overseer sends shard requests to all
shards in a simple 'for' loop. If the collection has one thousand shards,
we'll start one thousand concurrent backups. The idea is to only send
shard-level requests up to a certain limit per node, and then each time a
shard completes, we send the next one for this node.
If you're interested, I integrated my experiment (for non-incremental
backups) here:
https://github.com/psalagnac/solr/commit/c77c94e9a3c20aee3e45ec1198f00ab9cf0f76c5
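
The scheduling idea described above can be sketched roughly as follows. This is not the code from the linked commit: `PerNodeBackupDispatcher`, `shardsByNode`, `maxPerNode` and the `backupShard` callback are hypothetical names, and the callback is assumed to return a future that completes asynchronously once the shard backup finishes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

/** Hypothetical dispatcher: at most maxPerNode shard backups in flight per node. */
public class PerNodeBackupDispatcher {

  /** Returns a future that completes when every shard on every node is done. */
  public static CompletableFuture<Void> dispatch(
      Map<String, List<String>> shardsByNode,
      int maxPerNode,
      Function<String, CompletableFuture<Void>> backupShard) {
    List<CompletableFuture<Void>> lanes = new ArrayList<>();
    for (List<String> shards : shardsByNode.values()) {
      // One queue of pending shards per node, drained by maxPerNode "lanes".
      Queue<String> pending = new ConcurrentLinkedQueue<>(shards);
      for (int i = 0; i < maxPerNode; i++) {
        lanes.add(drain(pending, backupShard));
      }
    }
    return CompletableFuture.allOf(lanes.toArray(new CompletableFuture[0]));
  }

  /** Sends one shard request, then the next one for the same node when it completes. */
  private static CompletableFuture<Void> drain(
      Queue<String> pending, Function<String, CompletableFuture<Void>> backupShard) {
    String shard = pending.poll();
    if (shard == null) {
      return CompletableFuture.completedFuture(null);
    }
    // A failed shard stops its lane here; real code would need error handling.
    return backupShard.apply(shard).thenCompose(done -> drain(pending, backupShard));
  }
}
```

Each lane holds exactly one in-flight request, so the per-node limit falls out of the number of lanes rather than from an explicit counter or semaphore.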

I don't think backup is the only operation that should be considered.
Restore is at least one more, and I'm not sure whether we have other
IO-intensive operations at the collection level. Ideally, we should have
something generic and not consider each type of operation individually.

Thanks

Le mar. 20 juin 2023 à 09:58, Ishan Chattopadhyaya <
ichattopadhyaya@gmail.com> a écrit :

> Might be a good question for users@ list, I guess. I'm sure other users
> must've thought about this.
> Cross posting there, as I'm curious myself too.
>
> On Tue, 20 Jun 2023 at 01:07, David Smiley <ds...@apache.org> wrote:
>
> > Has anyone mitigated the potentially large IO impact of doing a backup of a
> > large collection or just in general?  If the collection is large enough,
> > there very well could be many shards on one host and it could saturate the
> > IO.  I wonder if there should be a rate limit mechanism or some other
> > mechanism.
> >
> > Not the same but I know that at a segment level, the merges are rate
> > limited -- ConcurrentMergeScheduler doesn't quite let you set it but
> > adjusts itself automatically ("ioThrottle" boolean).
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
>
