Posted to users@solr.apache.org by Thomas Woodard <tw...@eline.com> on 2022/08/05 13:00:58 UTC

solr backup location 8.11.1

I need to back up to a network file system to support recovery. I do not
want the index on a network file system, so just mounting /var/solr/data
isn't an option. I have attempted to set the location in the replication
handler, but it is not working. I've tried all of these configurations.

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="leader">
      <str name="replicateAfter">optimize</str>
      <str name="backupAfter">optimize</str>
    </lst>
    <int name="maxNumberOfBackups">2</int>
    <str name="commitReserveDuration">00:00:20</str>
    <lst name="default">
      <str name="location">/var/i8s/backup/solr/${i8s.environment}/${solr.core.name}</str>
    </lst>
  </requestHandler>

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="leader">
      <str name="replicateAfter">optimize</str>
      <str name="backupAfter">optimize</str>
      <str name="location">/var/i8s/backup/solr/${i8s.environment}/${solr.core.name}</str>
    </lst>
    <int name="maxNumberOfBackups">2</int>
    <str name="commitReserveDuration">00:00:20</str>
  </requestHandler>

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="leader">
      <str name="replicateAfter">optimize</str>
      <str name="backupAfter">optimize</str>
    </lst>
    <int name="maxNumberOfBackups">2</int>
    <str name="commitReserveDuration">00:00:20</str>
    <str name="location">/var/i8s/backup/solr/${i8s.environment}/${solr.core.name}</str>
  </requestHandler>

The backups after optimize are happening, but they are going to the default
locations, not the configured location. For example:
2022-08-04 17:19:52.053 INFO  (Thread-14) [   ] o.a.s.h.SnapShooter Creating backup snapshot <not named> at file:///var/solr/data/contentPage/data/

I've confirmed that it isn't a path security issue, by verifying that all
paths are allowed:
2022-08-05 12:29:03.873 INFO  (main) [   ] o.a.s.c.CoreContainer Allowing use of paths: [_ALL_]

How do I make backups go where I want?

Re: solr backup location 8.11.1

Posted by Dave <ha...@gmail.com>.
If you have any bare-metal hardware, a cron job running rsync against EC2 may work well. Hell, you could do that with a cheap Linux laptop with a large hard drive that stays plugged in and doesn’t sleep. Enterprise? No. Works? Certainly.

> On Aug 5, 2022, at 12:31 PM, Thomas Woodard <tw...@eline.com> wrote:
> 
> Actually, soft links won't work either, because the snapshots aren't in a
> subdirectory of data, and each one has a different name.
> 
> Cron on ec2 is a bit of a pain, but yes, that does seem like the
> best solution available.
> 

Re: solr backup location 8.11.1

Posted by Thomas Woodard <tw...@eline.com>.
Actually, soft links won't work either, because the snapshots aren't in a
subdirectory of data, and each one has a different name.

Cron on ec2 is a bit of a pain, but yes, that does seem like the
best solution available.

On Fri, Aug 5, 2022 at 11:15 AM Dave <ha...@gmail.com> wrote:

> Can’t you just make a cron job that runs an sh file that does a cp -rf on
> the data folder with a time stamp?  The indexes are drop-in when needed.
>

Re: solr backup location 8.11.1

Posted by Dave <ha...@gmail.com>.
Can’t you just make a cron job that runs an sh file that does a cp -rf on the data folder with a time stamp?  The indexes are drop-in when needed.
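
A minimal sketch of the timestamped-copy idea, written in Python rather than sh. The function name and the directory arguments are hypothetical, and it assumes the copy runs while nothing is writing to the index; a copy taken during indexing may not be consistent.

```python
import datetime
import pathlib
import shutil


def copy_with_timestamp(data_dir, backup_root, now=None):
    """Copy data_dir to backup_root/<dirname>-<timestamp> and return the new path."""
    now = now or datetime.datetime.now()
    source = pathlib.Path(data_dir)
    # Example target name: data-20220805-130058
    target = pathlib.Path(backup_root) / f"{source.name}-{now:%Y%m%d-%H%M%S}"
    # copytree creates the target directory and fails if it already exists.
    shutil.copytree(source, target)
    return target
```

A cron entry could call this with the core's data folder and the network mount as arguments. Pruning old copies is left out of the sketch.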

> On Aug 5, 2022, at 12:07 PM, Thomas Woodard <tw...@eline.com> wrote:
> 
> That is exactly what I was afraid of. Not being able to configure where
> automated backups go seems like a pretty major oversight, though. Is anyone
> aware of a solution other than creating a bunch of soft links?
> 

Re: solr backup location 8.11.1

Posted by Gus Heck <gu...@gmail.com>.
If it doesn't apply the defaults, that's the bug right there, I think.

On Fri, Aug 5, 2022 at 2:10 PM Shawn Heisey <ap...@elyograg.org> wrote:

> On 8/5/22 11:56, Thomas Woodard wrote:
> > Yup, I absolutely did typo when I tried to do it as a default. I'll
> update
> > my issue to correct that.
>
> It will be interesting to see whether fixing the typo makes it work.
> Sometimes the code is hard to decipher, and it is always possible that
> it does apply the defaults in the way you're expecting.
>
> Thanks,
> Shawn
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: solr backup location 8.11.1

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/5/22 11:56, Thomas Woodard wrote:
> Yup, I absolutely did typo when I tried to do it as a default. I'll update
> my issue to correct that.

It will be interesting to see whether fixing the typo makes it work.  
Sometimes the code is hard to decipher, and it is always possible that 
it does apply the defaults in the way you're expecting.

Thanks,
Shawn


Re: solr backup location 8.11.1

Posted by Thomas Woodard <tw...@eline.com>.
Yup, I absolutely did typo when I tried to do it as a default. I'll update
my issue to correct that.

On Fri, Aug 5, 2022 at 12:31 PM Gus Heck <gu...@gmail.com> wrote:

> Just looked at some other handler configurations, I think you may suffer
> from a typo... should
>
> <lst name="default">
>   <str name="location">/var/i8s/backup/solr/${i8s.environment}/${solr.core.name}</str>
> </lst>
>
> have been
>
> <lst name="defaults">
>   <str name="location">/var/i8s/backup/solr/${i8s.environment}/${solr.core.name}</str>
> </lst>
> (note the s)
>

Re: solr backup location 8.11.1

Posted by Gus Heck <gu...@gmail.com>.
Just looked at some other handler configurations, I think you may suffer
from a typo... should

<lst name="default">
  <str name="location">/var/i8s/backup/solr/${i8s.environment}/${solr.core.name}</str>
</lst>

have been

<lst name="defaults">
  <str name="location">/var/i8s/backup/solr/${i8s.environment}/${solr.core.name}</str>
</lst>
(note the s)
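
A typo like this can be caught mechanically. The sketch below lists the direct <lst> children of a handler element whose name is not in an expected set. The set itself is an assumption for illustration, not a list taken from the Solr source, and passing the check does not show that the handler honors the setting.

```python
import xml.etree.ElementTree as ET

# Assumed set of <lst> names; adjust it for the handler being checked.
KNOWN_LIST_NAMES = {"defaults", "appends", "invariants", "leader", "follower"}


def unknown_list_names(handler_xml):
    """Return names of direct <lst> children that are not in KNOWN_LIST_NAMES."""
    root = ET.fromstring(handler_xml)
    return [
        lst.get("name")
        for lst in root.findall("lst")
        if lst.get("name") not in KNOWN_LIST_NAMES
    ]
```

Run against the first configuration in this thread, it would report "default".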

On Fri, Aug 5, 2022 at 1:05 PM Thomas Woodard <tw...@eline.com> wrote:

> Thanks for the rapid replies. I've opened
> https://issues.apache.org/jira/browse/SOLR-16326 and will proceed with
> scripting a scheduled backup instead.
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: solr backup location 8.11.1

Posted by Thomas Woodard <tw...@eline.com>.
Thanks for the rapid replies. I've opened
https://issues.apache.org/jira/browse/SOLR-16326 and will proceed with
scripting a scheduled backup instead.

On Fri, Aug 5, 2022 at 11:36 AM Shawn Heisey <ap...@elyograg.org> wrote:

> On 8/5/22 10:06, Thomas Woodard wrote:
> > That is exactly what I was afraid of. Not being able to configure where
> > automated backups go seems like a pretty major oversight, though. Is
> anyone
> > aware of a solution other than creating a bunch of soft links?
>
> The symlink idea I had (but haven't mentioned) would work pretty well if
> you were calling http://server:port/solr/CORE/replication with a script
> or manually, but not for the triggered backups. Maybe in the meantime
> you can switch to a scheduled script and provide location and name
> params on the URL instead of configuring backupAfter.  Then you could do
> anything you want to do and won't have to compile it yourself or wait
> for a new version.
>
> FYI, if your index is not very small, you should probably not be
> optimizing it frequently.  If the optimizes are not frequent, or an
> optimize completes very quickly, then ignore that.
>
> Please open an enhancement issue in the Apache Jira on the SOLR
> project.  You are right that the location should be configurable as well
> as something that can be provided on the URL.  I think we need to take a
> close look at all the  parameters for the replication handler and decide
> which ones should be configurable in solrconfig.xml.
>
> When I have some free time I will look into improving the handler.  An
> issue in Jira makes that work easier to track, and would also get your
> name in the changelog.
>
> Thanks,
> Shawn
>
>

Re: solr backup location 8.11.1

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/5/22 10:06, Thomas Woodard wrote:
> That is exactly what I was afraid of. Not being able to configure where
> automated backups go seems like a pretty major oversight, though. Is anyone
> aware of a solution other than creating a bunch of soft links?

The symlink idea I had (but haven't mentioned) would work pretty well if 
you were calling http://server:port/solr/CORE/replication with a script 
or manually, but not for the triggered backups. Maybe in the meantime 
you can switch to a scheduled script and provide location and name 
params on the URL instead of configuring backupAfter.  Then you could do 
anything you want to do and won't have to compile it yourself or wait 
for a new version.
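
A rough sketch of such a scheduled script, in Python. It passes location and name as URL parameters to the replication handler's backup command. The helper names, base URL, and timestamp-based backup name are assumptions for illustration, not something confirmed in this thread.

```python
import datetime
import urllib.parse
import urllib.request


def build_backup_url(base_url, core, location, name):
    """Return a replication-handler URL that requests a named backup."""
    query = urllib.parse.urlencode({
        "command": "backup",
        "location": location,
        "name": name,
    })
    return f"{base_url}/solr/{core}/replication?{query}"


def trigger_backup(base_url, core, location, now=None):
    """Request a backup named after the current UTC timestamp."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    name = now.strftime("%Y%m%d-%H%M%S")
    url = build_backup_url(base_url, core, location, name)
    # The response only acknowledges the request; it is returned unparsed.
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")
```

A cron job might call trigger_backup("http://localhost:8983", "contentPage", "/var/i8s/backup/solr/prod/contentPage"), where "prod" is a made-up environment name. The location still has to be a path Solr is allowed to write to.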

FYI, if your index is not very small, you should probably not be 
optimizing it frequently.  If the optimizes are not frequent, or an 
optimize completes very quickly, then ignore that.

Please open an enhancement issue in the Apache Jira on the SOLR 
project.  You are right that the location should be configurable as well 
as something that can be provided on the URL.  I think we need to take a 
close look at all the  parameters for the replication handler and decide 
which ones should be configurable in solrconfig.xml.

When I have some free time I will look into improving the handler.  An 
issue in Jira makes that work easier to track, and would also get your 
name in the changelog.

Thanks,
Shawn


Re: solr backup location 8.11.1

Posted by Thomas Woodard <tw...@eline.com>.
That is exactly what I was afraid of. Not being able to configure where
automated backups go seems like a pretty major oversight, though. Is anyone
aware of a solution other than creating a bunch of soft links?

On Fri, Aug 5, 2022 at 8:52 AM Shawn Heisey <ap...@elyograg.org> wrote:

> On 8/5/22 07:42, Shawn Heisey wrote:
> > I've confirmed that it isn't a path security issue, by verifying that all
> > paths are allowed:
> > 2022-08-05 12:29:03.873 INFO  (main) [   ] o.a.s.c.CoreContainer Allowing
> > use of paths: [_ALL_]
>
> I missed this part of your email until after I had already sent my other
> reply.  Apologies for the oversight.
>
> I think the problem is likely that location must be a URL parameter, not
> configured in solrconfig.xml.  The code looks like it supports this
> conclusion.
>
> Thanks,
> Shawn
>
>

Re: solr backup location 8.11.1

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/5/22 07:42, Shawn Heisey wrote:
> I've confirmed that it isn't a path security issue, by verifying that all
> paths are allowed:
> 2022-08-05 12:29:03.873 INFO  (main) [   ] o.a.s.c.CoreContainer Allowing
> use of paths: [_ALL_]

I missed this part of your email until after I had already sent my other 
reply.  Apologies for the oversight.

I think the problem is likely that location must be a URL parameter, not 
configured in solrconfig.xml.  The code looks like it supports this 
conclusion.

Thanks,
Shawn


Re: solr backup location 8.11.1

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/5/22 07:00, Thomas Woodard wrote:
>    <requestHandler name="/replication" class="solr.ReplicationHandler">
>      <lst name="leader">
>        <str name="replicateAfter">optimize</str>
>        <str name="backupAfter">optimize</str>
>      </lst>
>      <int name="maxNumberOfBackups">2</int>
>      <str name="commitReserveDuration">00:00:20</str>
>      <str name="location">/var/i8s/backup/solr/${i8s.environment}/${solr.core.name}</str>
>    </requestHandler>
>
> The backups after optimize are happening, but they are going to the default
> locations, not the configured location. For example:
> 2022-08-04 17:19:52.053 INFO  (Thread-14) [   ] o.a.s.h.SnapShooter
> Creating backup snapshot <not named> at
> file:///var/solr/data/contentPage/data/
>
> I've confirmed that it isn't a path security issue, by verifying that all
> paths are allowed:
> 2022-08-05 12:29:03.873 INFO  (main) [   ] o.a.s.c.CoreContainer Allowing
> use of paths: [_ALL_]

https://solr.apache.org/guide/8_11/index-replication.html#http-api-commands-for-the-replicationhandler

This appears to be some relevant info on that page:

  * location: Backup location. Value depends on the repository in use.
    For file system repository, location defaults to core’s dataDir, and
    if specified, it needs to be within SOLR_HOME, SOLR_DATA_HOME or
    the paths specified by solr.xml allowPaths.


I am not sure that you can put "location" in solrconfig.xml ... the 
reference guide lists it as a URL parameter, not a configuration 
parameter.  I have not verified this.

Once you work out whether it needs to be a URL parameter: For security 
purposes, Solr limits where it can write data that is triggered by API 
calls.  If you want it to be outside of SOLR_HOME or SOLR_DATA_HOME then 
you have to allow the path in solr.xml.

https://solr.apache.org/guide/8_11/format-of-solr-xml.html#the-solr-element
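
To illustrate the rule quoted above, here is a small sketch that checks whether a backup location sits under one of a set of allowed roots. It is a purely lexical comparison with a hypothetical function name, and it is not how Solr itself performs the check; it does not resolve symlinks or ".." segments.

```python
import pathlib


def is_location_allowed(location, allowed_roots):
    """Return True if location equals or sits under one of allowed_roots."""
    target = pathlib.PurePosixPath(location)
    for root in allowed_roots:
        root_path = pathlib.PurePosixPath(root)
        # parents holds every ancestor directory of target.
        if target == root_path or root_path in target.parents:
            return True
    return False
```

With allowed roots of /var/solr/data and /var/i8s/backup, a location such as /var/i8s/backup/solr/prod/contentPage passes, while /tmp/backup does not. The "prod" segment is a made-up environment name.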

Thanks,
Shawn