Posted to solr-user@lucene.apache.org by Brian Narsi <bn...@gmail.com> on 2015/11/17 16:08:06 UTC

Data Import Handler / Backup indexes

I am using Data Import Handler to retrieve data from a database with

full-import, clean = true, commit = true and optimize = true

This has always worked correctly without any errors.

But just to be on the safe side, I am thinking that we should take a backup
before initiating Data Import Handler, and restore that backup if something
goes wrong.

Can the backup be taken automatically (before initiating Data Import Handler)?

Thanks

Re: Data Import Handler / Backup indexes

Posted by Erick Erickson <er...@gmail.com>.
These are just Lucene indexes. A SolrCloud backup and restore feature
is being worked on.

But if the index is static (i.e. nothing is being indexed into it),
simply copying the data/index directory (well, actually the whole data
directory and its subdirs) gives you a backup. Copying that directory
back (I'd have Solr shut down while copying back) restores the index.
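
As a rough sketch of the copy step, assuming nothing is indexing into the core while the copy runs. The function name and the paths are hypothetical; this is not something Solr provides:

```python
import shutil
import time
from pathlib import Path


def backup_data_dir(data_dir, backup_root):
    """Copy a core's whole data directory (index plus subdirs) into a new
    timestamped folder under backup_root and return the new folder's path.

    Assumption: the index is static while the copy is running.
    """
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no such data directory: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # copytree refuses to write into an existing destination,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(src, dest)
    return dest
```

In SolrCloud this would have to run once per core on each node, since every replica keeps its own data directory.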

Best,
Erick

On Sat, Nov 21, 2015 at 10:12 PM, Brian Narsi <bn...@gmail.com> wrote:
> What are the caveats regarding the copy of a collection?
>
> At this time DIH takes only about 10 minutes. So in case of accidental
> delete we can just re-run the DIH. The reason I am thinking about backup is
> just in case records are deleted accidentally and the DIH cannot be run
> because the database is unavailable.
>
> Our collection is simple: 2 nodes - 1 collection - 2 shards with 2 replicas
> each
>
> So a simple copy (cp command) for both the nodes/shards might work for us?
> How do I restore the data back?

Re: Data Import Handler / Backup indexes

Posted by Jeff Wartes <jw...@whitepages.com>.
The backup/restore approach in SOLR-5750 and in solrcloud_manager is
really just that - copying the index files.
On backup, it saves your index directories, and on restore, it puts them
in the data dir, moves a pointer for the current index dir, and opens a
new searcher. Both are mostly just wrappers on the proper Solr
replication-handler commands, since Solr already has some lower level APIs
for these operations.

There is a shared filesystem requirement for backup/restore though, which
is to account for the fact that when you make the backup you don’t know
which nodes will need to restore a given shard.

The commands would look something like:

    java -jar solrcloud_manager-assembly-1.4.0.jar backupindex \
        -z zk0.example.com:2181/myapp -c collection1 --dir <shareddir>
    java -jar solrcloud_manager-assembly-1.4.0.jar restoreindex \
        -z zk0.example.com:2181/myapp -c collection1 --dir <shareddir>

Or you could restore into a new collection:
    java -jar solrcloud_manager-assembly-1.4.0.jar backupindex \
        -z zk0.example.com:2181/myapp -c collection1 --dir <shareddir>
    java -jar solrcloud_manager-assembly-1.4.0.jar clonecollection \
        -z zk0.example.com:2181/myapp -c newcollection --fromCollection collection1
    java -jar solrcloud_manager-assembly-1.4.0.jar restoreindex \
        -z zk0.example.com:2181/myapp -c newcollection --dir <shareddir> \
        --restoreFrom collection1

If you don’t have a shared filesystem, you can still do the copy
collection route:
    java -jar solrcloud_manager-assembly-1.4.0.jar clonecollection \
        -z zk0.example.com:2181/myapp -c newcollection --fromCollection collection1

    java -jar solrcloud_manager-assembly-1.4.0.jar copycollection \
        -z zk0.example.com:2181/myapp -c newcollection --fromCollection collection1

This creates a new collection with the same settings (clonecollection)
and triggers a one-shot “replication” into it (copycollection). Again,
this is just a framework around the proper (largely undocumented) Solr API
commands, to work around the lack of a convenient collections-level API
command.

One nice thing about using copy collection is that it can be used to keep
a backup collection up to date, only copying if necessary. Honestly
though, I don’t have as much experience with this use case as KNitin does
in solrcloud-haft, which is why I suggest using an empty collection in the
README right now. If you try that use case with solrcloud_manager, I’d be
interested in your experience. It should work, but you’ll need to disable
the verification with --skipCheck and check manually.
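
One way to do the manual check, as a minimal sketch: compare the document counts of the source and the copy. This assumes you have already fetched a JSON response from each collection with a query such as q=*:*&rows=0&wt=json. The function names are made up for illustration, and this is not necessarily what solrcloud_manager's own verification does:

```python
import json


def num_found(select_response_text):
    """Return numFound from the text of a Solr JSON select response."""
    return json.loads(select_response_text)["response"]["numFound"]


def counts_match(source_response_text, copy_response_text):
    """True when both collections report the same number of documents.

    Matching counts are a necessary sign of a good copy, not proof of one.
    """
    return num_found(source_response_text) == num_found(copy_response_text)
```

A mismatch means the copy is stale or incomplete; a match still leaves room for spot-checking a few documents by id.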


Having said all that though, yes, with your simple use case and small
collection, you can do everything you want with just cp. The easiest way
would be to make a backup copy of your index dir. If you need to restore,
shut down solr, nuke your index dir, and copy the backup in there. You’d
probably need to do this on all nodes at once though, to prevent a
non-leader from coming up and re-syncing with a piece of the index you
hadn’t restored yet.
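
The restore steps above could be sketched like this for a single node. The paths and the solr_is_stopped flag are hypothetical; the script cannot verify that Solr is really down, it only refuses to run until the caller says so:

```python
import shutil
from pathlib import Path


def restore_index_dir(backup_index_dir, live_index_dir, solr_is_stopped):
    """Replace the live index directory with a copy of the backup.

    solr_is_stopped must be set by the caller after shutting Solr down
    on this node; this sketch does not check the process itself.
    """
    if not solr_is_stopped:
        raise RuntimeError("shut Solr down on this node before restoring")
    backup = Path(backup_index_dir)
    live = Path(live_index_dir)
    if not backup.is_dir():
        raise FileNotFoundError(f"no such backup directory: {backup}")
    if live.exists():
        shutil.rmtree(live)        # remove the current index dir entirely
    shutil.copytree(backup, live)  # copy the backup into its place
    return live
```

Run it on every node before starting any of them again, for the re-sync reason given above.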




On 11/21/15, 10:12 PM, "Brian Narsi" <bn...@gmail.com> wrote:

>What are the caveats regarding the copy of a collection?
>
>At this time DIH takes only about 10 minutes. So in case of accidental
>delete we can just re-run the DIH. The reason I am thinking about backup
>is just in case records are deleted accidentally and the DIH cannot be run
>because the database is unavailable.
>
>Our collection is simple: 2 nodes - 1 collection - 2 shards with 2
>replicas each
>
>So a simple copy (cp command) for both the nodes/shards might work for us?
>How do I restore the data back?


Re: Data Import Handler / Backup indexes

Posted by Brian Narsi <bn...@gmail.com>.
What are the caveats regarding the copy of a collection?

At this time DIH takes only about 10 minutes. So in case of accidental
delete we can just re-run the DIH. The reason I am thinking about backup is
just in case records are deleted accidentally and the DIH cannot be run
because the database is unavailable.

Our collection is simple: 2 nodes - 1 collection - 2 shards with 2 replicas
each

So a simple copy (cp command) for both the nodes/shards might work for us?
How do I restore the data back?



On Tue, Nov 17, 2015 at 4:56 PM, Jeff Wartes <jw...@whitepages.com> wrote:

>
> https://github.com/whitepages/solrcloud_manager supports 5.x, and I added
> some backup/restore functionality similar to SOLR-5750 in the last
> release.
> Like SOLR-5750, this backup strategy requires a shared filesystem, but
> note that unlike SOLR-5750, I haven’t yet added any backup functionality
> for the contents of ZK. I’m currently working on some parts of that.
>
>
> Making a copy of a collection is supported too, with some caveats.

Re: Data Import Handler / Backup indexes

Posted by Jeff Wartes <jw...@whitepages.com>.
https://github.com/whitepages/solrcloud_manager supports 5.x, and I added
some backup/restore functionality similar to SOLR-5750 in the last
release. 
Like SOLR-5750, this backup strategy requires a shared filesystem, but
note that unlike SOLR-5750, I haven’t yet added any backup functionality
for the contents of ZK. I’m currently working on some parts of that.


Making a copy of a collection is supported too, with some caveats.


On 11/17/15, 10:20 AM, "Brian Narsi" <bn...@gmail.com> wrote:

>Sorry I forgot to mention that we are using SolrCloud 5.1.0.


Re: Data Import Handler / Backup indexes

Posted by Brian Narsi <bn...@gmail.com>.
Sorry I forgot to mention that we are using SolrCloud 5.1.0.



On Tue, Nov 17, 2015 at 12:09 PM, KNitin <ni...@gmail.com> wrote:

> afaik Data import handler does not offer backups. You can try using the
> replication handler to backup data as you wish to any custom end point.
>
> You can also try out : https://github.com/bloomreach/solrcloud-haft.  This
> helps backup solr indices across clusters.

Re: Data Import Handler / Backup indexes

Posted by KNitin <ni...@gmail.com>.
As far as I know, Data Import Handler does not offer backups. You can try
using the replication handler to back up data to any custom endpoint.

You can also try out https://github.com/bloomreach/solrcloud-haft, which
helps back up Solr indices across clusters.
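
A minimal sketch of scripting "backup first, then import" against one core, assuming the replication handler is at /replication and DIH is registered at /dataimport (both depend on your solrconfig.xml). The core URL, backup location and backup name are placeholders. Since the replication handler's backup command runs asynchronously, the caller supplies a wait_for_backup step; that hook is hypothetical and left for you to implement:

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def backup_url(core_url, location, name):
    """URL for the replication handler's backup command on one core."""
    params = {"command": "backup", "location": location, "name": name}
    return f"{core_url}/replication?{urlencode(params)}"


def full_import_url(core_url):
    """URL for a DIH full-import with the options from the original post."""
    params = {"command": "full-import", "clean": "true",
              "commit": "true", "optimize": "true"}
    return f"{core_url}/dataimport?{urlencode(params)}"


def backup_then_import(core_url, location, name, wait_for_backup, fetch=None):
    """Request a backup, wait for it to finish, then start the import."""
    if fetch is None:
        fetch = lambda url: urlopen(url).read()  # plain HTTP GET
    fetch(backup_url(core_url, location, name))
    wait_for_backup()  # caller-supplied, e.g. poll until the snapshot is done
    fetch(full_import_url(core_url))
```

In SolrCloud the backup request would be needed for one core per shard, because the replication handler works at the core level rather than the collection level.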

On Tue, Nov 17, 2015 at 7:08 AM, Brian Narsi <bn...@gmail.com> wrote:

> I am using Data Import Handler to retrieve data from a database with
>
> full-import, clean = true, commit = true and optimize = true
>
> This has always worked correctly without any errors.
>
> But just to be on the safe side, I am thinking that we should do a backup
> before initiating Data Import Handler. And just in case something happens
> restore the backup.
>
> Can backup be done automatically (before initiating Data Import Handler)?
>
> Thanks
>