Posted to solr-user@lucene.apache.org by Damien Kamerman <da...@gmail.com> on 2017/06/26 05:28:50 UTC

async backup

I've noticed an issue with the Solr 6.5.1 Collections API BACKUP command
returning early when run with async: the request state is finished well
before one of the shards has finished backing up.

The collection I'm backing up has 12 shards across 6 nodes, and I suspect
the issue is that it is not waiting for all the backups on a node to finish.

Alternatively, if I change the request to not be async, it works OK, but
sometimes I get the exception "backup the collection time out:180s".
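For reference, a minimal Python sketch of the two request forms being compared. It only builds URLs and sends nothing. The base URL `http://localhost:8983/solr`, the backup location `/online/backup` (taken from the log later in the thread) and the request ID `backup-001` are assumptions, and `backup_url` is a hypothetical helper; `action=BACKUP`, `name`, `collection`, `location` and `async` are Collections API parameters.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust for your cluster.
SOLR_BASE = "http://localhost:8983/solr"


def backup_url(collection, name, location, async_id=None):
    """Build a Collections API BACKUP URL (hypothetical helper).

    Without async_id the call is synchronous: the HTTP request blocks until
    the backup finishes or the request times out. With async_id the call
    returns immediately and the backup has to be tracked with REQUESTSTATUS.
    """
    params = {
        "action": "BACKUP",
        "name": name,
        "collection": collection,
        "location": location,
    }
    if async_id is not None:
        params["async"] = async_id
    return SOLR_BASE + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(backup_url("collection1", "collection1", "/online/backup"))
    print(backup_url("collection1", "collection1", "/online/backup",
                     async_id="backup-001"))
```

The only difference between the two forms is the `async` parameter; everything else about the backup request stays the same.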

Has anyone seen this, or does anyone know a workaround?

Cheers,
Damien.

Re: async backup

Posted by Damien Kamerman <da...@gmail.com>.
Yes. REQUESTSTATUS is returning state=completed prematurely.

On Tuesday, 27 June 2017, Amrit Sarkar <sa...@gmail.com> wrote:

> Damien,
>
> > then I poll with REQUESTSTATUS
>
> REQUESTSTATUS is an API that gives you the status, at the moment you call
> it, of any API call (including other heavy-duty APIs such as SPLITSHARD or
> CREATECOLLECTION) submitted with an async ID. Does it return
> "state"="completed"?

Re: async backup

Posted by Amrit Sarkar <sa...@gmail.com>.
Damien,

> then I poll with REQUESTSTATUS

REQUESTSTATUS is an API that gives you the status, at the moment you call it,
of any API call (including other heavy-duty APIs such as SPLITSHARD or
CREATECOLLECTION) submitted with an async ID. Does it return
"state"="completed"?
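A minimal Python sketch of the submit-then-poll pattern discussed here. It assumes a hypothetical base URL of `http://localhost:8983/solr` and assumes the JSON response carries the state under `status.state` (values such as `submitted`, `running`, `completed`, `failed`, `notfound`). `wait_for_async`, `interval_s` and `max_polls` are hypothetical names, and `fetch`/`sleep` are injectable so the loop can be exercised without a running Solr. As this thread reports for 6.5.1, a `completed` result from this loop is only as trustworthy as REQUESTSTATUS itself.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL; adjust for your cluster.
SOLR_BASE = "http://localhost:8983/solr"

# States after which polling stops (assumed REQUESTSTATUS state values).
TERMINAL_STATES = {"completed", "failed", "notfound"}


def request_status_url(request_id):
    """Build a Collections API REQUESTSTATUS URL for an async request ID."""
    params = {"action": "REQUESTSTATUS", "requestid": request_id, "wt": "json"}
    return SOLR_BASE + "/admin/collections?" + urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_async(request_id, fetch=fetch_json, sleep=time.sleep,
                   interval_s=5.0, max_polls=720):
    """Poll REQUESTSTATUS until a terminal state is reported; return it.

    Raises TimeoutError if no terminal state is seen within max_polls polls.
    """
    url = request_status_url(request_id)
    for _ in range(max_polls):
        state = fetch(url)["status"]["state"]
        if state in TERMINAL_STATES:
            return state
        sleep(interval_s)
    raise TimeoutError(
        f"request {request_id} not finished after {max_polls} polls")
```

Bounding the loop with `max_polls` keeps a stuck request from polling forever; the caller decides what to do on `failed` or `notfound`.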

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Jun 27, 2017 at 5:25 AM, Damien Kamerman <da...@gmail.com> wrote:


Re: async backup

Posted by Damien Kamerman <da...@gmail.com>.
A regular (synchronous) backup creates the files in this order:
drwxr-xr-x   2 root     root          63 Jun 27 09:46 snapshot.shard7
drwxr-xr-x   2 root     root         159 Jun 27 09:46 snapshot.shard8
drwxr-xr-x   2 root     root         135 Jun 27 09:46 snapshot.shard1
drwxr-xr-x   2 root     root         178 Jun 27 09:46 snapshot.shard3
drwxr-xr-x   2 root     root         210 Jun 27 09:46 snapshot.shard11
drwxr-xr-x   2 root     root         218 Jun 27 09:46 snapshot.shard9
drwxr-xr-x   2 root     root         180 Jun 27 09:46 snapshot.shard2
drwxr-xr-x   2 root     root         164 Jun 27 09:47 snapshot.shard5
drwxr-xr-x   2 root     root         252 Jun 27 09:47 snapshot.shard6
drwxr-xr-x   2 root     root         103 Jun 27 09:47 snapshot.shard12
drwxr-xr-x   2 root     root         135 Jun 27 09:47 snapshot.shard4
drwxr-xr-x   2 root     root         119 Jun 27 09:47 snapshot.shard10
drwxr-xr-x   3 root     root           4 Jun 27 09:47 zk_backup
-rw-r--r--   1 root     root         185 Jun 27 09:47 backup.properties

An async backup, by contrast, creates the files in this order:
drwxr-xr-x   2 root     root          15 Jun 27 09:49 snapshot.shard3
drwxr-xr-x   2 root     root          15 Jun 27 09:49 snapshot.shard9
drwxr-xr-x   2 root     root          62 Jun 27 09:49 snapshot.shard6
drwxr-xr-x   2 root     root          37 Jun 27 09:49 snapshot.shard2
drwxr-xr-x   2 root     root          67 Jun 27 09:49 snapshot.shard7
drwxr-xr-x   2 root     root          75 Jun 27 09:49 snapshot.shard5
drwxr-xr-x   2 root     root          70 Jun 27 09:49 snapshot.shard8
drwxr-xr-x   2 root     root          15 Jun 27 09:49 snapshot.shard4
drwxr-xr-x   2 root     root          15 Jun 27 09:50 snapshot.shard11
drwxr-xr-x   2 root     root         127 Jun 27 09:50 snapshot.shard1
drwxr-xr-x   2 root     root         116 Jun 27 09:50 snapshot.shard12
drwxr-xr-x   3 root     root           4 Jun 27 09:50 zk_backup
-rw-r--r--   1 root     root         185 Jun 27 09:50 backup.properties
drwxr-xr-x   2 root     root          25 Jun 27 09:51 snapshot.shard10


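shard10 is much larger than the other shards. Note that in the async run,
zk_backup and backup.properties are written at 09:50, before
snapshot.shard10 is finished at 09:51.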
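A rough, hypothetical Python check along these lines flags snapshot directories whose modification time is later than `backup.properties`, assuming `ls -l` output in exactly the nine-field format shown above. `finished_after_properties` is a hypothetical helper. Directory mtimes have minute resolution here and only reflect the last change to a directory, so this is a heuristic for spotting the symptom, not a reliable completeness test.

```python
from datetime import datetime


def finished_after_properties(ls_lines):
    """Return snapshot directories modified later than backup.properties.

    Expects `ls -l` lines with 9 whitespace-separated fields:
    perms, links, owner, group, size, month, day, HH:MM, name.
    """
    times = {}
    for line in ls_lines:
        fields = line.split()
        if len(fields) != 9:
            continue  # skip blank or unexpected lines
        times[fields[8]] = datetime.strptime(fields[7], "%H:%M")
    props = times["backup.properties"]
    return sorted(name for name, t in times.items()
                  if name.startswith("snapshot.") and t > props)


# The async listing from this message.
ASYNC_LISTING = """
drwxr-xr-x   2 root     root          15 Jun 27 09:49 snapshot.shard3
drwxr-xr-x   2 root     root          15 Jun 27 09:49 snapshot.shard9
drwxr-xr-x   2 root     root          62 Jun 27 09:49 snapshot.shard6
drwxr-xr-x   2 root     root          37 Jun 27 09:49 snapshot.shard2
drwxr-xr-x   2 root     root          67 Jun 27 09:49 snapshot.shard7
drwxr-xr-x   2 root     root          75 Jun 27 09:49 snapshot.shard5
drwxr-xr-x   2 root     root          70 Jun 27 09:49 snapshot.shard8
drwxr-xr-x   2 root     root          15 Jun 27 09:49 snapshot.shard4
drwxr-xr-x   2 root     root          15 Jun 27 09:50 snapshot.shard11
drwxr-xr-x   2 root     root         127 Jun 27 09:50 snapshot.shard1
drwxr-xr-x   2 root     root         116 Jun 27 09:50 snapshot.shard12
drwxr-xr-x   3 root     root           4 Jun 27 09:50 zk_backup
-rw-r--r--   1 root     root         185 Jun 27 09:50 backup.properties
drwxr-xr-x   2 root     root          25 Jun 27 09:51 snapshot.shard10
"""

if __name__ == "__main__":
    print(finished_after_properties(ASYNC_LISTING.splitlines()))
```

On the async listing this flags only snapshot.shard10; on the regular listing nothing is flagged, because every snapshot directory is at or before the backup.properties time.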

From the logs:
INFO  - 2017-06-27 09:50:33.832; [   ] org.apache.solr.cloud.BackupCmd;
Completed backing up ZK data for backupName=collection1
INFO  - 2017-06-27 09:50:33.800; [   ]
org.apache.solr.handler.admin.CoreAdminOperation; Checking request status
for : backup1103459705035055
INFO  - 2017-06-27 09:50:33.800; [   ]
org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/cores
params={qt=/admin/cores&requestid=backup1103459705035055&action=REQUESTSTATUS&wt=javabin&version=2}
status=0 QTime=0
INFO  - 2017-06-27 09:51:33.405; [   ] org.apache.solr.handler.SnapShooter;
Done creating backup snapshot: shard10 at file:///online/backup/collection1
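To quantify the gap in the log excerpt above, a small Python sketch that parses the two timestamps. The log lines are copied from the excerpt and rejoined onto single lines; `log_time` is a hypothetical helper that assumes the `LEVEL  - yyyy-MM-dd HH:mm:ss.SSS;` prefix shown here. With these two lines it prints a gap of 59.573 seconds between the ZK data backup completing and shard10's snapshot finishing.

```python
from datetime import datetime

# Log lines copied from the excerpt above, rejoined onto single lines.
ZK_DONE = ("INFO  - 2017-06-27 09:50:33.832; [   ] org.apache.solr.cloud.BackupCmd; "
           "Completed backing up ZK data for backupName=collection1")
SHARD10_DONE = ("INFO  - 2017-06-27 09:51:33.405; [   ] "
                "org.apache.solr.handler.SnapShooter; "
                "Done creating backup snapshot: shard10 at "
                "file:///online/backup/collection1")


def log_time(line):
    """Parse the timestamp between the first ' - ' and the first ';'."""
    stamp = line.split(" - ", 1)[1].split(";", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


gap = (log_time(SHARD10_DONE) - log_time(ZK_DONE)).total_seconds()
print(f"shard10 finished {gap:.3f}s after the ZK data backup completed")
```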

Has anyone seen this bug, or does anyone know a workaround?


On 27 June 2017 at 09:47, Damien Kamerman <da...@gmail.com> wrote:

> Yes, the async command returns, and then I poll with REQUESTSTATUS.

Re: async backup

Posted by Damien Kamerman <da...@gmail.com>.
Yes, the async command returns, and then I poll with REQUESTSTATUS.

On 27 June 2017 at 01:24, Varun Thacker <va...@vthacker.in> wrote:

> Hi Damien,
>
> A backup command with async is supposed to return early: it starts the
> backup process and returns.
>
> Are you using the REQUESTSTATUS API
> (http://lucene.apache.org/solr/guide/6_6/collections-api.html#collections-api)
> to check whether the backup is complete?

Re: async backup

Posted by Varun Thacker <va...@vthacker.in>.
Hi Damien,

A backup command with async is supposed to return early: it starts the
backup process and returns.

Are you using the REQUESTSTATUS API
(http://lucene.apache.org/solr/guide/6_6/collections-api.html#collections-api)
to check whether the backup is complete?

On Sun, Jun 25, 2017 at 10:28 PM, Damien Kamerman <da...@gmail.com> wrote:
