Posted to solr-user@lucene.apache.org by Phil Scadden <P....@gns.cri.nz> on 2017/09/04 20:55:57 UTC

RE: write.lock file appears and solr wont open

We finally got a resolution to this. It was trivial, but it stemmed from trying to do things by remote control. The Solr process did not have permission to write to the core that was imported, so when it tried to create the lock file it failed. The Solr code evidently assumes that a file-creation failure means the file already exists, rather than, say, insufficient permissions. Checking for file existence would give a more informative message, but I am guessing that a test/production setup where developers are not allowed access to the servers is reasonably unusual (I hope so anyway, because it sucks).
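
For anyone hitting the same symptom, a quick check from the account that runs Solr can confirm or rule out the permissions cause before digging further. The following is a minimal sketch, not anything Solr ships: the class name, the `describe` helper, and the default path (which assumes the index lives in `data/index` under the core directory) are all illustrative assumptions.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LockDirPreflight {

    /**
     * Hypothetical helper: reports whether the current process could
     * plausibly create a lock file inside the given index directory.
     */
    static String describe(Path indexDir) {
        if (Files.notExists(indexDir)) {
            return "missing";
        }
        if (!Files.isDirectory(indexDir)) {
            return "not a directory";
        }
        if (!Files.isWritable(indexDir)) {
            return "not writable";
        }
        return "ok";
    }

    public static void main(String[] args) {
        // Assumed default location; pass the real index directory as the first argument.
        Path indexDir = Paths.get(args.length > 0
                ? args[0]
                : "/var/www/solr/data/prindex/data/index");
        System.out.println(indexDir + ": " + describe(indexDir));
    }
}
```

The result is only meaningful when the program runs as the same user as the Solr service, since `Files.isWritable` tests the permissions of the calling process.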

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Saturday, 26 August 2017 9:15 a.m.
To: solr-user <so...@lucene.apache.org>
Subject: Re: write.lock file appears and solr wont open

Odd. The way core discovery works, it starts at SOLR_HOME and recursively descends the directories. Whenever the recursion finds a "core.properties" file it says "Aha, this must be a core". From there it assumes the data directory is immediately below where it found the core.properties file in the absence of any dataDir overrides.
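
The discovery rule described above can be approximated in a few lines. This is only an illustrative sketch, not Solr's actual implementation: the class and method names are made up, the `data` directory name is an assumption about the default, and unlike the real discovery it does not stop descending once a core has been found.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CoreDiscoverySketch {

    /** Returns every directory under solrHome that contains a core.properties file. */
    static List<Path> findCoreDirs(Path solrHome) throws IOException {
        try (Stream<Path> paths = Files.walk(solrHome)) {
            return paths
                    .filter(p -> p.getFileName() != null
                            && p.getFileName().toString().equals("core.properties"))
                    .map(Path::getParent)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Assumed default: the data directory sits directly below the core directory. */
    static Path defaultDataDir(Path coreDir) {
        return coreDir.resolve("data");
    }

    public static void main(String[] args) throws IOException {
        // SOLR_HOME from the thread is used as the default; override with the first argument.
        Path solrHome = Paths.get(args.length > 0 ? args[0] : "/var/www/solr/data");
        for (Path coreDir : findCoreDirs(solrHome)) {
            System.out.println(coreDir + " -> " + defaultDataDir(coreDir));
        }
    }
}
```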

So how the write.lock file is getting preserved across Solr restarts is a mystery to me. Doing a "kill -9" is one way to make that happen if it is done at just the wrong time, but that's unlikely in what you're describing.

Are you totally sure that there were no old Solr processes running?
And there have been some issues in the past where the log display of the admin UI holds on to errors and displays them after the problem has been fixed. I'm assuming you can't query the new core, is that correct? Because if you can query the core then _something_ has the index open. I'm grasping at straws here, mind you.

Best,
Erick

On Thu, Aug 24, 2017 at 9:02 PM, Phil Scadden <P....@gns.cri.nz> wrote:
> SOLR_HOME is /var/www/solr/data
> The zip was actually the entire data directory, which also included configsets. And yes, core.properties is in /var/www/solr/data/prindex (it contains just the single line name=prindex). No other cores are present.
> The data directory should have been unzipped before the Solr instance was started (I can't actually touch the machine, so I am communicating via a deployment document, but the operator usually follows every step to the letter).
> The sequence was:
> mkdir /var/www/solr
> sudo bash ./install_solr_service.sh solr-6.5.1.tgz -i /opt/local -d /var/www/solr
> edit /etc/default/solr.in.sh to set various items (esp. SOLR_HOME, and to set SOLR_PID_DIR to /var/www/solr)
> unzip the data directory
> service solr start
>
> No other instance of solr installed.
>
> Notice: This email and any attachments are confidential and may not be used, published or redistributed without the prior written consent of the Institute of Geological and Nuclear Sciences Limited (GNS Science). If received in error please destroy and immediately notify GNS Science. Do not copy or disclose the contents.

Re: write.lock file appears and solr wont open

Posted by Erick Erickson <er...@gmail.com>.
Or only catch the specific exception and only swallow that? But yeah,
this is something that should change as I see this "in the field" and
a more specific error message would short-circuit a lot of unnecessary
pain.

see: LUCENE-7959

Erick
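
The "only catch the specific exception" idea could look roughly like the sketch below. It is a standalone illustration, not the Lucene code and not necessarily what LUCENE-7959 ended up doing: the class and method names are hypothetical. It relies on `Files.createFile` throwing `FileAlreadyExistsException` when the file is already there, which the JDK documents as an optional specific exception, so a file system that reports a generic `IOException` instead would not get the quiet path.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LockFileCreateSketch {

    /**
     * Hypothetical helper: creates the lock file if it is absent.
     * Returns true if the file was created, false if it already existed.
     * Any other failure (for example a permissions problem) propagates
     * to the caller instead of being swallowed.
     */
    static boolean ensureLockFile(Path lockFile) throws IOException {
        try {
            Files.createFile(lockFile);
            return true;
        } catch (FileAlreadyExistsException alreadyThere) {
            // The only case that is safe to ignore: the file is already present.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lock-sketch");
        Path lockFile = dir.resolve("write.lock");
        System.out.println("first call created the file: " + ensureLockFile(lockFile));
        System.out.println("second call created the file: " + ensureLockFile(lockFile));
    }
}
```

Compared with checking for existence first and then creating, this does the check and the creation in a single call, so there is no window in which another process can create the file between the two steps.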

On Wed, Sep 6, 2017 at 5:49 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 9/4/2017 5:53 PM, Erick Erickson wrote:
>> Gah, thanks for letting us know. I can't tell you how often
>> permissions issues have tripped me up. You're right, it does seem like
>> there could be a better error message though.
>
> I see this code in NativeFSLockFactory, code that completely ignores any
> problems creating the lockfile, right before the point in the
> obtainFSLock method where Phil's exception came from:
>
>     try {
>       Files.createFile(lockFile);
>     } catch (IOException ignore) {
>       // we must create the file to have a truly canonical path.
>       // if it's already created, we don't care. if it cant be created,
>       // it will fail below.
>     }
>
> I think that if we replaced that code with the following code, the
> *reason* for ignoring the creation problem (file already exists) will be
> preserved.  Any creation problem (like permissions) would throw a
> (hopefully understandable) standard Java exception that propagates up
> into what Solr logs:
>
>     // If the lockfile already exists, we're going to do nothing.
>     // If there are problems with that lockfile, they will be caught later.
>     // If we *do* create the file here, exceptions will propagate upward.
>     if (Files.notExists(lockFile))
>     {
>       Files.createFile(lockFile);
>     }
>
> The method signature already includes IOException, so this doesn't
> represent an API change.
>
> Thanks,
> Shawn
>

Re: write.lock file appears and solr wont open

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/4/2017 5:53 PM, Erick Erickson wrote:
> Gah, thanks for letting us know. I can't tell you how often
> permissions issues have tripped me up. You're right, it does seem like
> there could be a better error message though.

I see this code in NativeFSLockFactory, code that completely ignores any
problems creating the lockfile, right before the point in the
obtainFSLock method where Phil's exception came from:

    try {
      Files.createFile(lockFile);
    } catch (IOException ignore) {
      // we must create the file to have a truly canonical path.
      // if it's already created, we don't care. if it cant be created,
      // it will fail below.
    }

I think that if we replaced that code with the following code, the
*reason* for ignoring the creation problem (file already exists) will be
preserved.  Any creation problem (like permissions) would throw a
(hopefully understandable) standard Java exception that propagates up
into what Solr logs:

    // If the lockfile already exists, we're going to do nothing.
    // If there are problems with that lockfile, they will be caught later.
    // If we *do* create the file here, exceptions will propagate upward.
    if (Files.notExists(lockFile))
    {
      Files.createFile(lockFile);
    }

The method signature already includes IOException, so this doesn't
represent an API change.

Thanks,
Shawn


Re: write.lock file appears and solr wont open

Posted by Erick Erickson <er...@gmail.com>.
Gah, thanks for letting us know. I can't tell you how often
permissions issues have tripped me up. You're right, it does seem like
there could be a better error message though.

Erick
