Posted to solr-user@lucene.apache.org by Artem Shnayder <ar...@gmail.com> on 2012/03/28 20:46:22 UTC

DataImportHandler: backups prior to full-import

Does anyone know of any work done to automatically run a backup prior to a
DataImportHandler full-import?

I've asked this question on #solr and was pointed to
https://wiki.apache.org/solr/SolrReplication?highlight=%28backup%29#HTTP_API
which is helpful, but it is not an automatic backup in the context of
full-imports. I'm wondering if anyone else has done this work yet.

-- Artem Shnayder

Re: DataImportHandler: backups prior to full-import

Posted by Artem Shnayder <ar...@gmail.com>.
Thanks for your help, James. I'll try that out.


RE: DataImportHandler: backups prior to full-import

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Unfortunately there isn't a good way to solve this.  Your best bet is to trigger a backup before the nightly re-index using /replication?command=backup

The problem is that the backup runs asynchronously, so it's hard to script a way to determine whether the backup has finished.  What we do is poll the replication handler with /replication?command=details and scrape the response until <str name="snapshotCompletedAt">timestamp_here</str> changes to a new timestamp.
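
A minimal Python sketch of that polling loop, for illustration only. The base URL, the timeout and interval values, and the helper names (http_get, snapshot_completed_at, backup_and_wait) are assumptions and not part of Solr; only the two /replication commands and the snapshotCompletedAt element come from the description above.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

# Assumed base URL; adjust host, port, and core path for your installation.
SOLR_URL = "http://localhost:8983/solr"


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def snapshot_completed_at(details_xml):
    """Return the text of <str name="snapshotCompletedAt">, or None if absent."""
    root = ET.fromstring(details_xml)
    for elem in root.iter("str"):
        if elem.get("name") == "snapshotCompletedAt":
            return elem.text
    return None


def backup_and_wait(get=http_get, base=SOLR_URL, timeout=3600, interval=10,
                    sleep=time.sleep):
    """Trigger a backup, then poll until snapshotCompletedAt changes."""
    details_url = base + "/replication?command=details"
    # Remember the timestamp of the previous backup (None if there was none).
    before = snapshot_completed_at(get(details_url))
    get(base + "/replication?command=backup")
    waited = 0
    while waited <= timeout:
        after = snapshot_completed_at(get(details_url))
        if after is not None and after != before:
            return after  # a new snapshot has been recorded
        sleep(interval)
        waited += interval
    raise TimeoutError("backup did not complete within %d seconds" % timeout)
```

A nightly script could call backup_and_wait() and start the full-import only after it returns. The get and sleep parameters are there so the loop can be exercised without a running Solr instance.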

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



Re: DataImportHandler: backups prior to full-import

Posted by Artem Shnayder <ar...@gmail.com>.
My typical workflow is a once-a-day full-import with hourly delta-imports.
Ideally, the backup would occur only during the full-import commits. Is
there a way to differentiate in the replication handler?


RE: DataImportHandler: backups prior to full-import

Posted by "Dyer, James" <Ja...@ingrambook.com>.
I don't know of any effort out there to have DIH trigger a backup automatically.  However, you can set the replication handler to automatically back up after each commit.  This might solve your problem if you aren't committing frequently.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



Re: DataImportHandler: backups prior to full-import

Posted by Bill Bell <bi...@gmail.com>.
You could use the Solr Command Utility (SCU), which runs on Windows and can be scheduled to run.

https://github.com/justengland/Solr-Command-Utility

This is a Windows tool that will index using a core, and swap it in if the indexing succeeds. It works with Solr.

Let me know if you have any questions.


Re: DataImportHandler: backups prior to full-import

Posted by Shawn Heisey <so...@elyograg.org>.
On 3/28/2012 12:46 PM, Artem Shnayder wrote:
> Does anyone know of any work done to automatically run a backup prior to a
> DataImportHandler full-import?
>
> I've asked this question on #solr and was pointed to
> https://wiki.apache.org/solr/SolrReplication?highlight=%28backup%29#HTTP_API
> which
> is helpful but is not an automatic backup in the context of full-import's.
> I'm wondering if anyone else has done this work yet.

I have located a previous message from you where you mention that you 
are on Ubuntu.  If that's true, you can use hard links to make nearly 
instantaneous backups with a single command:

ln /path/to/index/* /path/to/backup/.

One caveat to that - the backup must be on the same filesystem as the 
index.  If keeping backups on another filesystem (or even another 
computer) is important, then treat the hard link backup as a temporary 
directory.  Copy the files from that directory to your remote location, 
then delete them.
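
For anyone who would rather script this than run the ln command by hand, 
here is a rough Python sketch of the same idea.  The function names and 
directory arguments are made up for illustration; os.link is the standard 
library call that creates a hard link, and it fails if the target name 
already exists, so the backup directory should start out empty.

```python
import os
import shutil


def hard_link_backup(index_dir, backup_dir):
    """Hard-link every regular file in index_dir into backup_dir.

    Both directories must be on the same filesystem.
    Returns the sorted list of file names that were linked.
    """
    os.makedirs(backup_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(index_dir)):
        src = os.path.join(index_dir, name)
        if os.path.isfile(src):
            # Raises FileExistsError if backup_dir already holds this name.
            os.link(src, os.path.join(backup_dir, name))
            linked.append(name)
    return linked


def ship_backup(backup_dir, remote_dir):
    """Copy the linked files to remote_dir, then drop the temporary links.

    remote_dir must not exist yet; shutil.copytree creates it.
    """
    shutil.copytree(backup_dir, remote_dir)
    shutil.rmtree(backup_dir)
```

Removing the temporary directory only removes the extra links; the files 
in the live index directory are untouched.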

This works because of the way that Lucene (and by extension Solr) 
manages files on disk - existing segment files are never modified.  If 
they get merged, new files are created before the old ones are deleted.  
There is only one file in an index directory that does change without 
getting a new name - segments.gen.  I have verified (on Solr 3.5) that 
even this file is properly handled so that a hard link backup keeps the 
correct version.

For people running on Windows, this particular method won't work.  Newer 
Windows server versions do have one feature that might actually make it 
possible to do something similar - shadow copies.  I do not know how to 
leverage the feature, though.

Thanks,
Shawn