Posted to solr-user@lucene.apache.org by Shawn Heisey <ap...@elyograg.org> on 2018/10/12 15:28:08 UTC
Something odd with async request status for BACKUP operation on
Collections API
I'm working on reproducing a problem reported via the IRC channel.
Started a test cloud with 7.5.0. Initially with two nodes, then again
with 3 nodes. Did this on Windows 10.
Command to create a collection:
bin\solr create -c test2 -shards 30 -replicationFactor 2
For these URLs, I dropped them into a browser, so URL encoding was
handled automatically. I'm sure the URL to start the backup wouldn't
work as-is with curl because it includes characters that need encoding.
Backup URL:
http://localhost:8983/solr/admin/collections?action=BACKUP&name=test2.3&collection=test2&location=C:\Users\elyograg\Downloads\solrbackups&async=sometag
Request status URL:
http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=sometag
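(For anyone scripting this instead of using a browser: the backslashes and colon in the Windows path are what break the raw URL under curl. A minimal sketch of building a properly encoded backup URL with Python's standard library -- the host, port, and path simply mirror the example above:)

```python
from urllib.parse import urlencode

# Parameters for the BACKUP call; the Windows path contains a colon
# and backslashes that must be percent-encoded before use with curl.
params = {
    "action": "BACKUP",
    "name": "test2.3",
    "collection": "test2",
    "location": r"C:\Users\elyograg\Downloads\solrbackups",
    "async": "sometag",
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```

urlencode() turns the colon into %3A and each backslash into %5C, so the resulting URL is safe to pass to curl as-is.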
Here's the raw JSON response from the status URL:
{
"responseHeader":{
"status":0,
"QTime":3},
"success":{
"192.168.56.1:7574_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":2}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":2}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:7574_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:7574_solr":{
"responseHeader":{
"status":0,
"QTime":1}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":35}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":1}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":1}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":33}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":34}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":40}},
"192.168.56.1:8984_solr":{
"responseHeader":{
"status":0,
"QTime":2}},
"192.168.56.1:8984_solr":{
"responseHeader":{
"status":0,
"QTime":2}},
"192.168.56.1:7574_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:7574_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:7574_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:7574_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:8984_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:8984_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:7574_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":0}},
"192.168.56.1:8983_solr":{
"responseHeader":{
"status":0,
"QTime":1}}},
"sometag135341573915254":{
"responseHeader":{
"status":0,
"QTime":0},
"STATUS":"completed",
"Response":"TaskId: sometag135341573915254 webapp=null
path=/admin/cores
params={core=test2_shard9_replica_n34&async=sometag135341573915254&qt=/admin/cores&name=shard9&action=BACKUPCORE&location=file:///C:/Users/elyograg/Downloads/solrbackups/test2.3&wt=javabin&version=2}
status=0 QTime=0"},
"sometag135341570605052":{
"responseHeader":{
"status":0,
"QTime":0},
"STATUS":"completed",
"Response":"TaskId: sometag135341570605052 webapp=null
path=/admin/cores
params={core=test2_shard1_replica_n1&async=sometag135341570605052&qt=/admin/cores&name=shard1&action=BACKUPCORE&location=file:///C:/Users/elyograg/Downloads/solrbackups/test2.3&wt=javabin&version=2}
status=0 QTime=0"},
"sometag135341570647962":{
"responseHeader":{
"status":0,
"QTime":0},
"STATUS":"completed",
"Response":"TaskId: sometag135341570647962 webapp=null
path=/admin/cores
params={core=test2_shard7_replica_n26&async=sometag135341570647962&qt=/admin/cores&name=shard7&action=BACKUPCORE&location=file:///C:/Users/elyograg/Downloads/solrbackups/test2.3&wt=javabin&version=2}
status=0 QTime=0"},
"status":{
"state":"completed",
"msg":"found [sometag] in completed tasks"}}
As you can see, only 3 (out of 30) shards are mentioned in the response.
When I did the same test on a 2-node cloud example, there were only 2
shards in the response.
Should all 30 shards have been in the response? Is there a bug here?
If I make the request without the async parameter, the response doesn't
contain ANY shard information at all. Because this is an empty
collection, the backup is fast. I expected detailed information to be in
the response. Is that worth an issue in Jira?
Side note: In the status response, the individual shard info that IS
present doesn't indicate what node handled the CoreAdmin call. That
would be useful information to include.
Thanks,
Shawn
Re: Something odd with async request status for BACKUP operation on
Collections API
Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/14/2018 10:39 PM, Shalin Shekhar Mangar wrote:
> The responses are collected by node so subsequent responses from the same
> node overwrite previous responses. Definitely a bug. Please open an issue.
Done.
https://issues.apache.org/jira/browse/SOLR-12867
Thanks,
Shawn
Re: Something odd with async request status for BACKUP operation on
Collections API
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
The responses are collected by node so subsequent responses from the same
node overwrite previous responses. Definitely a bug. Please open an issue.
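(The failure mode is easy to picture: if per-shard responses are accumulated in a map keyed only by node name, a 30-shard backup spread over 3 nodes can retain at most 3 entries. A hypothetical sketch of the difference -- this is an illustration of the reported behavior, not Solr's actual code:)

```python
# Hypothetical illustration of the reported bug (not Solr source):
# six shard responses arriving from only two nodes.
shard_responses = [
    ("node1:8983_solr", "shard1"),
    ("node2:7574_solr", "shard2"),
    ("node1:8983_solr", "shard3"),
    ("node2:7574_solr", "shard4"),
    ("node1:8983_solr", "shard5"),
    ("node2:7574_solr", "shard6"),
]

# Buggy collection: keyed by node, so each later shard response
# from the same node overwrites the earlier one.
by_node = {}
for node, shard in shard_responses:
    by_node[node] = shard

# A fix would keep every response, e.g. a list per node.
per_node = {}
for node, shard in shard_responses:
    per_node.setdefault(node, []).append(shard)
```

With the buggy map, only the last response per node survives, which matches seeing 2 shards on a 2-node cloud and 3 on a 3-node cloud.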
--
Regards,
Shalin Shekhar Mangar.
Re: Something odd with async request status for BACKUP operation on
Collections API
Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/14/2018 6:25 PM, damienk@gmail.com wrote:
> I had an issue with async backup on solr 6.5.1 reporting that the backup
> was complete when clearly it was not. I was using 12 shards across 6 nodes.
> I only noticed this issue when one shard was much larger than the others.
> There were no answers here
> http://lucene.472066.n3.nabble.com/async-backup-td4342776.html
One detail I thought I had written but isn't there: The backup did
fully complete -- all 30 shards were in the backup location. Not a lot
in each shard backup -- the collection was empty. It would be easy
enough to add a few thousand documents to the collection before doing
the backup.
If the backup process reports that it's done before it's ACTUALLY done,
that's a bad thing. It's hard to say whether that problem is related to
the problem I described. Since I haven't dived into the code, I cannot
say for sure, but it honestly would not surprise me to find they are
connected. Every time I try to understand Collections API code, I find
it extremely difficult to follow.
I'm sorry that you never got resolution on your problem. Do you know
whether that is still a problem in 7.x? Setting up a reproduction where
one shard is significantly larger than the others will take a little bit
of work.
> I was focusing on the STATUS returned from the REQUESTSTATUS command, but
> looking again now I can see a response from only 6 shards, and each shard
> is from a different node. So this fits with what you're seeing. I assume
> your shards 1, 7, 9 are all on different nodes.
I did not actually check, and the cloud example I was using isn't around
any more, but each of the shards in the status response was PROBABLY on
a separate node. The cloud example was 3 nodes. It's an easy enough
scenario to replicate, and I provided enough details for anyone to do it.
The person on IRC that reported this problem had a cluster of 15 nodes,
and the status response had ten shards (out of 30) mentioned. It was
shards 1-9 and shard 20. The suspicion is that there's something
hard-coded that limits it to 10 responses ... because without that, I
would expect the number of shards in the response to match the number of
nodes.
Thanks,
Shawn
Re: Something odd with async request status for BACKUP operation on
Collections API
Posted by da...@gmail.com.
Hi Shawn,
I had an issue with async backup on solr 6.5.1 reporting that the backup
was complete when clearly it was not. I was using 12 shards across 6 nodes.
I only noticed this issue when one shard was much larger than the others.
There were no answers here
http://lucene.472066.n3.nabble.com/async-backup-td4342776.html
I was focusing on the STATUS returned from the REQUESTSTATUS command, but
looking again now I can see a response from only 6 shards, and each shard
is from a different node. So this fits with what you're seeing. I assume
your shards 1, 7, 9 are all on different nodes.
HTH,
Damien.