Posted to solr-user@lucene.apache.org by Greenhorn Techie <gr...@gmail.com> on 2018/06/01 13:23:37 UTC

SolrCloud Collection Backup - Solr 5.5.4

Hi,

We are running SolrCloud 5.5.4. As I understand it, the Solr Collection
Backup and Restore APIs are only supported from version 6 onwards, so I am
wondering what the best mechanism is to back up our collections on an
older Solr version.

When I ran the backup command on a particular node (curl
http://localhost:8983/solr/gettingstarted/replication?command=backup), it
seems to create a snapshot only for the collection data stored on that
particular node. Does that mean that if I run this command on every node
hosting my SolrCloud collection, I will get the required backup?
Will this also back up the metadata from ZK? I presume not. If not, what
are the best approaches to back that up as well? Does Solr provide
anything for this?

Thanks

Re: SolrCloud Collection Backup - Solr 5.5.4

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/4/2018 5:36 AM, Greenhorn Techie wrote:
> 1. In the SolrCloud, as a single host can have information about multiple
> shards (either leader or replica), how does the backup API handle the
> underlying data copy? I presume it will simply copy the data across ALL the
> shards (both leader and replicas) for the specified collection.

The Collections API backup would indeed work this way.

I see this line of code in the patch for SOLR-5750:

log.debug("Sent backup requests to all shard leaders for snapshotName={}", backupName);

So it sounds like the leader replica will write the backup for each shard.

> 2. If I am invoking the backup command periodically to backup the data and
> then invoke restore command later (possibly due to cluster shutdown and
> create a fresh SolrCloud cluster), I presume I don't need to tinker with
> the hash values as long as the default settings have been used in both
> backup and restore situations?

The Collections API restore capability creates a new collection from the 
backup.  The backup includes information gathered from ZK.  The restored 
collection should have all the same hash ranges found in the original 
collection.
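As a rough illustration of the Collections API calls discussed here (6.1 and later), the sketch below only builds the BACKUP and RESTORE request URLs. The host, backup name, collection names, and location path are hypothetical placeholders; the location is assumed to be a directory that every node can reach. Because restore creates a new collection, the restore call uses a collection name that is assumed not to exist yet.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, backup_name, collection, location):
    """Build a Collections API URL for a BACKUP or RESTORE action."""
    query = urlencode(
        {
            "action": action,          # "BACKUP" or "RESTORE"
            "name": backup_name,       # name of the backup to write or read
            "collection": collection,  # source collection, or the new one on restore
            "location": location,      # assumed to be reachable from all nodes
        },
        safe="/",  # keep the path readable instead of percent-encoding slashes
    )
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


# Hypothetical values, for illustration only.
solr = "http://localhost:8983/solr"
backup_url = collections_api_url(
    solr, "BACKUP", "nightly", "gettingstarted", "/mnt/shared/solr-backups"
)
restore_url = collections_api_url(
    solr, "RESTORE", "nightly", "gettingstarted_restored", "/mnt/shared/solr-backups"
)
print(backup_url)
print(restore_url)
```

The URLs can then be passed to curl or any HTTP client; nothing here contacts Solr.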

Thanks,
Shawn


Re: SolrCloud Collection Backup - Solr 5.5.4

Posted by Greenhorn Techie <gr...@gmail.com>.
Thanks Shawn for your detailed reply. It has improved my
understanding, which I have summarised below.

In a SolrCloud setup on a version older than 6.1, there is no ‘elegant’ way
of handling collection backup and restore. Instead, we have to use the
manual backup and restore APIs of the replication handler. However, as these
APIs were primarily designed for standalone Solr installations, each call
only backs up the data stored on a single Solr host for a particular core.
Hence, to get the complete data backed up for a SolrCloud collection, the
backup API should be called on all the nodes belonging to the SolrCloud
cluster, and the ZooKeeper clusterstate then backed up manually, with
possible tweaking needed to ensure hash range consistency.
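One way to picture the per-core approach summarised above is to pick one core per shard from the collection's clusterstate and issue a replication handler backup against each. The sketch below assumes the clusterstate has already been read from ZooKeeper and parsed into a dictionary, and that it follows the usual layout (a `shards` map whose replicas carry `core`, `base_url`, and a `leader` flag stored as the string "true"). Choosing the leader of each shard is an assumption made for illustration; the host names and core names are hypothetical.

```python
def leader_backup_targets(collection_state):
    """Map each shard name to the (base_url, core) of its leader replica."""
    targets = {}
    for shard_name, shard in collection_state["shards"].items():
        for replica in shard["replicas"].values():
            # Assumed layout: the leader flag is the string "true".
            if replica.get("leader") == "true":
                targets[shard_name] = (replica["base_url"], replica["core"])
    return targets


# Hypothetical two-shard collection with two replicas per shard.
example_state = {
    "shards": {
        "shard1": {
            "range": "80000000-ffffffff",
            "replicas": {
                "core_node1": {"core": "gettingstarted_shard1_replica1",
                               "base_url": "http://host1:8983/solr",
                               "leader": "true"},
                "core_node2": {"core": "gettingstarted_shard1_replica2",
                               "base_url": "http://host2:8983/solr"},
            },
        },
        "shard2": {
            "range": "0-7fffffff",
            "replicas": {
                "core_node3": {"core": "gettingstarted_shard2_replica1",
                               "base_url": "http://host1:8983/solr"},
                "core_node4": {"core": "gettingstarted_shard2_replica2",
                               "base_url": "http://host2:8983/solr",
                               "leader": "true"},
            },
        },
    }
}

targets = leader_backup_targets(example_state)
for shard_name, (base_url, core) in sorted(targets.items()):
    # One backup request per shard, sent to the core that holds the data.
    print(f"{shard_name}: {base_url}/{core}/replication?command=backup")
```

This covers only the index data; the ZooKeeper side still has to be saved separately.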

A few follow-up questions:
1. In the SolrCloud, as a single host can have information about multiple
shards (either leader or replica), how does the backup API handle the
underlying data copy? I presume it will simply copy the data across ALL the
shards (both leader and replicas) for the specified collection.
2. If I am invoking the backup command periodically to backup the data and
then invoke restore command later (possibly due to cluster shutdown and
create a fresh SolrCloud cluster), I presume I don't need to tinker with
the hash values as long as the default settings have been used in both
backup and restore situations?

Thanks



Re: SolrCloud Collection Backup - Solr 5.5.4

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/2/2018 1:50 AM, Shawn Heisey wrote:
> If you provide a location parameter, it will write a new backup
> directory in that location.
>
> https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#standalone-mode-backups
>
> I verified that this parameter is in the 5.5 docs too, I would suggest
> you download that version in PDF format if you want a full reference.

A followup:

I suspect that if you try to use the restore functionality on the 
replication handler and have multiple shard replicas, that SolrCloud 
would not replicate things properly.  I could be wrong about that, but I 
think that restoring from replication handler backups to SolrCloud could 
get a little messy.

Thanks,
Shawn


Re: SolrCloud Collection Backup - Solr 5.5.4

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/1/2018 7:23 AM, Greenhorn Techie wrote:
> We are running SolrCloud with version 5.5.4. As I understand, Solr
> Collection Backup and Restore API are only supported from version 6
> onwards. So wondering what is the best mechanism to get our collections
> backed-up on older Solr version.

That functionality was added in 6.1.

https://issues.apache.org/jira/browse/SOLR-5750

> When I ran backup command on a particular node (curl
> http://localhost:8983/solr/gettingstarted/replication?command=backup) it
> seems it only creates a snapshot for the collection data stored on that
> particular node. Does that mean, if I run this command for every node
> hosting my SolrCloud collection, I will be getting the required backup?
> Will this backup the metadata as well from ZK? I presume not.

If you provide a location parameter, it will write a new backup
directory in that location.

https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#standalone-mode-backups

I verified that this parameter is in the 5.5 docs too. I would suggest
you download that version in PDF format if you want a full reference.

It would probably be a good idea to create a separate directory for each
core that you work on.
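A minimal sketch of what such a request could look like, with one target directory per core. The `command=backup`, `location`, and `name` parameters are the replication handler parameters described in the reference guide; the host, core name, backup root, and snapshot name below are hypothetical, and whether the target directory must already exist is not covered here.

```python
from urllib.parse import urlencode


def replication_backup_url(base_url, core, backup_root, snapshot_name):
    """Build a replication handler backup URL that targets a per-core directory."""
    # One directory per core, so snapshots from different cores do not mix.
    location = f"{backup_root.rstrip('/')}/{core}"
    query = urlencode(
        {"command": "backup", "location": location, "name": snapshot_name},
        safe="/",  # keep the path readable instead of percent-encoding slashes
    )
    return f"{base_url.rstrip('/')}/{core}/replication?{query}"


# Hypothetical values, for illustration only.
url = replication_backup_url(
    "http://localhost:8983/solr",
    "gettingstarted_shard1_replica1",
    "/backups/solr",
    "20180601",
)
print(url)
```

The resulting URL can be passed to curl in the same way as the command quoted above.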

If the backup is done on all the right cores, you will get all the index
data, but you will have no info from ZK.  If the collection has more
than one shard and uses the compositeId router, then you will need the
info from the collection's clusterstate about shard hash ranges, and
those would have to be verified and possibly adjusted on the new
collection before you started putting the data back in.  If the new
collection uses different hash ranges than the one you backed up, then
the restored collection would not function correctly.
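The verification step could be sketched as below, assuming the clusterstate of each collection has been saved as JSON and follows the usual layout: a top-level key per collection, a `shards` map, and a `range` string per shard. The collection names and the range values are hypothetical examples; the sketch only reports differences and does not change anything in ZooKeeper.

```python
import json


def shard_ranges(clusterstate_json, collection):
    """Return {shard name: hash range string} for one collection."""
    state = json.loads(clusterstate_json)
    shards = state[collection]["shards"]
    return {name: info.get("range") for name, info in shards.items()}


def range_mismatches(original, restored):
    """Return {shard: (original range, restored range)} where the two differ."""
    names = sorted(set(original) | set(restored))
    return {
        name: (original.get(name), restored.get(name))
        for name in names
        if original.get(name) != restored.get(name)
    }


# Hypothetical saved clusterstate of the old collection and of the new one.
old_json = json.dumps({"gettingstarted": {"shards": {
    "shard1": {"range": "80000000-ffffffff"},
    "shard2": {"range": "0-7fffffff"},
}}})
new_json = json.dumps({"gettingstarted_new": {"shards": {
    "shard1": {"range": "80000000-ffffffff"},
    "shard2": {"range": "0-3fffffff"},
}}})

old_ranges = shard_ranges(old_json, "gettingstarted")
new_ranges = shard_ranges(new_json, "gettingstarted_new")
print(range_mismatches(old_ranges, new_ranges))
```

An empty result means the shard names and ranges line up; any entry points at a shard whose range would need adjusting before data is put back.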

> If so, what
> are the best possible approaches to get the same. Is there something made
> available by Solr for the same?

If you can do it, upgrading to the latest 6.x or 7.x version would be a
good idea, to have full SolrCloud backup and restore functionality.

--------------

You asked me some questions via IRC when I wasn't around, then were
logged off by the time I got back to IRC.  I don't know when you might
come back online there.  Here's some info for those questions:

The reason that 'ant server' isn't working is that you're at the top
level of the source.  It should work if you change to the solr directory
first.

Similar to what you've encountered, I can't get eclipse to work properly
when using a downloaded 6.6.2 source package (solr-6.6.2-src.tgz).  But
if I use these commands instead, then import into eclipse, it works:

git clone https://git-wip-us.apache.org/repos/asf/lucene-solr.git
cd lucene-solr
git checkout refs/tags/releases/lucene-solr/6.6.2
ant clean clean-jars clean-eclipse eclipse

The clean targets are not strictly necessary with a fresh clone, but
including them means the same commands also work when the tree isn't fresh.

I've never had very good luck with the downloadable source packages. 
Some of the build system functionality *only* works when the source is
obtained with git, so I prefer that.

Thanks,
Shawn