Posted to solr-user@lucene.apache.org by Artem Shnayder <ar...@gmail.com> on 2012/03/28 20:46:22 UTC
DataImportHandler: backups prior to full-import
Does anyone know of any work done to automatically run a backup prior to a
DataImportHandler full-import?
I've asked this question on #solr and was pointed to
https://wiki.apache.org/solr/SolrReplication?highlight=%28backup%29#HTTP_API
which is helpful but is not an automatic backup in the context of full-imports.
I'm wondering if anyone else has done this work yet.
-- Artem Shnayder
Re: DataImportHandler: backups prior to full-import
Posted by Artem Shnayder <ar...@gmail.com>.
Thanks for your help, James. I'll try that out.
On Wed, Mar 28, 2012 at 12:30 PM, Dyer, James <Ja...@ingrambook.com> wrote:
> Unfortunately there isn't a good way to solve this. Your best bet is to
> trigger a backup before the nightly re-index using
> /replication?command=backup
>
> The problem is that the backup runs asynchronously, so it's hard to script
> a way to determine whether the backup has finished. What we do is poll the
> ReplicationHandler with /replication?command=details and scrape the
> response until <str name="snapshotCompletedAt">timestamp_here</str>
> changes to a new timestamp.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
RE: DataImportHandler: backups prior to full-import
Posted by "Dyer, James" <Ja...@ingrambook.com>.
Unfortunately there isn't a good way to solve this. Your best bet is to trigger a backup before the nightly re-index using /replication?command=backup
The problem is that the backup runs asynchronously, so it's hard to script a way to determine whether the backup has finished. What we do is poll the ReplicationHandler with /replication?command=details and scrape the response until <str name="snapshotCompletedAt">timestamp_here</str> changes to a new timestamp.
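A minimal Python sketch of the trigger-then-poll approach described above. It assumes the ReplicationHandler is reachable at a hypothetical `http://localhost:8983/solr/replication` (adjust host, port and core path). It records the current `snapshotCompletedAt` value, issues `command=backup`, then polls `command=details` until the value changes. The helper names are illustrative, and searching the whole response for the element is an assumption, since its exact nesting may differ between Solr versions.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Hypothetical base URL; adjust host, port and core path for your install.
SOLR_REPLICATION_URL = "http://localhost:8983/solr/replication"


def snapshot_completed_at(details_xml: str) -> Optional[str]:
    """Return the text of <str name="snapshotCompletedAt">, or None if absent."""
    root = ET.fromstring(details_xml)
    # Search the whole tree, since the element's nesting may vary by version.
    for elem in root.iter("str"):
        if elem.get("name") == "snapshotCompletedAt":
            return elem.text
    return None


def fetch(command: str, base_url: str = SOLR_REPLICATION_URL) -> str:
    """GET <base_url>?command=<command> and return the body as text."""
    with urllib.request.urlopen(f"{base_url}?command={command}") as resp:
        return resp.read().decode("utf-8")


def backup_and_wait(get: Callable[[str], str] = fetch,
                    poll_seconds: float = 5.0,
                    timeout_seconds: float = 3600.0) -> str:
    """Trigger a backup, then poll 'details' until the timestamp changes."""
    before = snapshot_completed_at(get("details"))
    get("backup")  # returns immediately; the backup runs in the background
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        after = snapshot_completed_at(get("details"))
        if after is not None and after != before:
            return after  # a new snapshot has completed
        time.sleep(poll_seconds)
    raise TimeoutError("backup did not report completion before the timeout")
```

A nightly job could call `backup_and_wait()` and only then start the import with `/dataimport?command=full-import` (the DIH path is whatever your solrconfig.xml registers). Because the script, not every commit, triggers the backup, the hourly delta-imports would not produce backups.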
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
-----Original Message-----
From: Artem Shnayder [mailto:artem.src@gmail.com]
Sent: Wednesday, March 28, 2012 1:59 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler: backups prior to full-import
My typical workflow is a once-a-day full-import with hourly delta-imports.
Ideally, the backup would occur only during the full-import commits. Is
there a way to differentiate in the replication handler?
Re: DataImportHandler: backups prior to full-import
Posted by Artem Shnayder <ar...@gmail.com>.
My typical workflow is a once-a-day full-import with hourly delta-imports.
Ideally, the backup would occur only during the full-import commits. Is
there a way to differentiate in the replication handler?
On Wed, Mar 28, 2012 at 11:54 AM, Dyer, James <Ja...@ingrambook.com> wrote:
> I don't know of any effort out there to have DIH trigger a backup
> automatically. However, you can configure the replication handler to
> back up automatically after each commit. This might solve your problem if
> you aren't committing frequently.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
RE: DataImportHandler: backups prior to full-import
Posted by "Dyer, James" <Ja...@ingrambook.com>.
I don't know of any effort out there to have DIH trigger a backup automatically. However, you can configure the replication handler to back up automatically after each commit. This might solve your problem if you aren't committing frequently.
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
-----Original Message-----
From: Artem Shnayder [mailto:artem.src@gmail.com]
Sent: Wednesday, March 28, 2012 1:46 PM
To: solr-user@lucene.apache.org
Subject: DataImportHandler: backups prior to full-import
Does anyone know of any work done to automatically run a backup prior to a
DataImportHandler full-import?
I've asked this question on #solr and was pointed to
https://wiki.apache.org/solr/SolrReplication?highlight=%28backup%29#HTTP_API
which is helpful but is not an automatic backup in the context of full-imports.
I'm wondering if anyone else has done this work yet.
-- Artem Shnayder
Re: DataImportHandler: backups prior to full-import
Posted by Bill Bell <bi...@gmail.com>.
You could use the Solr Command Utility (SCU), which runs on Windows and can be scheduled.
https://github.com/justengland/Solr-Command-Utility
It is a Windows tool that indexes into a core and swaps that core in if the import succeeds. It works with Solr.
Let me know if you have any questions.
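The index-then-swap approach described above can be sketched with Solr's CoreAdmin `SWAP` action: build the new index in a spare core and swap it with the live one only if the import succeeded, so the previous index stays available under the spare core's name. This is not SCU's code (its internals aren't described here). The base URL and the core names `live` and `staging` are hypothetical, and how success is detected is left to the caller.

```python
import urllib.parse
import urllib.request

# Hypothetical Solr base URL and core names; adjust to your installation.
SOLR_BASE = "http://localhost:8983/solr"
LIVE_CORE = "live"
STAGING_CORE = "staging"


def swap_url(base: str, core: str, other: str) -> str:
    """Build a CoreAdmin SWAP request that exchanges the names of two cores."""
    query = urllib.parse.urlencode({"action": "SWAP", "core": core, "other": other})
    return f"{base}/admin/cores?{query}"


def swap_if_succeeded(import_succeeded: bool,
                      base: str = SOLR_BASE,
                      live: str = LIVE_CORE,
                      staging: str = STAGING_CORE) -> bool:
    """Swap the freshly built staging core into the live name only on success.

    Detecting success (for example, by inspecting the DIH status response
    of the staging core) is the caller's job in this sketch.
    """
    if not import_succeeded:
        return False  # keep serving the old index untouched
    with urllib.request.urlopen(swap_url(base, live, staging)) as resp:
        resp.read()
    return True
```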
On Mar 28, 2012, at 10:11 PM, Shawn Heisey <so...@elyograg.org> wrote:
> I have located a previous message from you where you mention that you are on Ubuntu. If that's true, you can use hard links to make nearly instantaneous backups with a single command:
>
> ln /path/to/index/* /path/to/backup/.
>
> One caveat to that - the backup must be on the same filesystem as the index. If keeping backups on another filesystem (or even another computer) is important, then treat the hard link backup as a temporary directory. Copy the files from that directory to your remote location, then delete them.
>
> This works because of the way that Lucene (and by extension Solr) manages files on disk - existing segment files are never modified. If they get merged, new files are created before the old ones are deleted. There is only one file in an index directory that does change without getting a new name - segments.gen. I have verified (on Solr 3.5) that even this file is properly handled so that a hard link backup keeps the correct version.
>
> For people running on Windows, this particular method won't work. Newer Windows Server versions do have one feature that might actually make it possible to do something similar - shadow copies. I do not know how to leverage the feature, though.
>
> Thanks,
> Shawn
>
Re: DataImportHandler: backups prior to full-import
Posted by Shawn Heisey <so...@elyograg.org>.
On 3/28/2012 12:46 PM, Artem Shnayder wrote:
> Does anyone know of any work done to automatically run a backup prior to a
> DataImportHandler full-import?
>
> I've asked this question on #solr and was pointed to
> https://wiki.apache.org/solr/SolrReplication?highlight=%28backup%29#HTTP_API
> which is helpful but is not an automatic backup in the context of full-imports.
> I'm wondering if anyone else has done this work yet.
I have located a previous message from you where you mention that you
are on Ubuntu. If that's true, you can use hard links to make nearly
instantaneous backups with a single command:
ln /path/to/index/* /path/to/backup/.
One caveat to that - the backup must be on the same filesystem as the
index. If keeping backups on another filesystem (or even another
computer) is important, then treat the hard link backup as a temporary
directory. Copy the files from that directory to your remote location,
then delete them.
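The hard-link backup and the copy-then-delete step described above can be sketched in Python as follows. This is a minimal sketch assuming a POSIX system. The paths and function names are hypothetical, and it assumes the index directory holds only regular files (anything else is skipped). It is also assumed to run while no commit or merge is rewriting the directory, since a file deleted between listing and linking would make `os.link` fail.

```python
import os
import shutil
from pathlib import Path


def hardlink_snapshot(index_dir: Path, snapshot_dir: Path) -> list:
    """Hard-link every regular file in index_dir into snapshot_dir.

    Similar in spirit to `ln /path/to/index/* /path/to/backup/.`; both
    directories must be on the same filesystem, or os.link raises OSError.
    """
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for src in sorted(index_dir.iterdir()):
        if src.is_file():
            os.link(src, snapshot_dir / src.name)  # new name, same inode
            linked.append(src.name)
    return linked


def copy_off_and_clean(snapshot_dir: Path, remote_dir: Path) -> None:
    """Copy the snapshot to another location, then remove the temporary links."""
    shutil.copytree(snapshot_dir, remote_dir, dirs_exist_ok=True)
    shutil.rmtree(snapshot_dir)  # only the extra names go; the live index is untouched
```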
This works because of the way that Lucene (and by extension Solr)
manages files on disk - existing segment files are never modified. If
they get merged, new files are created before the old ones are deleted.
There is only one file in an index directory that does change without
getting a new name - segments.gen. I have verified (on Solr 3.5) that
even this file is properly handled so that a hard link backup keeps the
correct version.
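The immutability argument above can be checked with a small Python experiment on ordinary hard-link semantics. It is not Lucene code and the file names are made up. Deleting a name and creating a new file under it leaves a hard-linked backup with the old bytes, while overwriting the same file in place changes the backup too. That is why the approach depends on index files never being modified in place.

```python
import os
import tempfile
from pathlib import Path


def hardlink_survives(replace_in_place: bool) -> bytes:
    """Return what a hard-linked backup contains after the live file is rewritten.

    replace_in_place=False: delete the live name, then create a new file
    (a new inode), which is how a write-once index replaces files.
    replace_in_place=True: overwrite the existing file's bytes, which a
    hard-link backup does NOT protect against.
    """
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp) / "live_file"
        backup = Path(tmp) / "backup_file"
        live.write_bytes(b"old")
        os.link(live, backup)  # both names now refer to the same inode

        if replace_in_place:
            with open(live, "r+b") as f:  # same inode, so the backup sees this
                f.write(b"new")
        else:
            live.unlink()             # drops one name; the inode lives on via backup
            live.write_bytes(b"new")  # creates a brand-new inode under the old name

        return backup.read_bytes()
```

Calling `hardlink_survives(replace_in_place=False)` returns `b"old"`, and `hardlink_survives(replace_in_place=True)` returns `b"new"`. Note that `write_bytes` alone, without the preceding `unlink`, would truncate the existing inode and behave like the in-place case.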
For people running on Windows, this particular method won't work. Newer
Windows Server versions do have one feature that might actually make it
possible to do something similar - shadow copies. I do not know how to
leverage the feature, though.
Thanks,
Shawn