Posted to solr-user@lucene.apache.org by Trey <so...@gmail.com> on 2010/01/21 05:54:43 UTC

Replication Handler Severe Error: Unable to move index file

Does anyone know what would cause the following error?

    10:45:10 AM org.apache.solr.handler.SnapPuller copyAFile
    SEVERE: Unable to move index file from: /home/solr/cores/core8/index.20100119103919/_6qv.fnm to: /home/solr/cores/core8/index/_6qv.fnm

This occurred a few days ago, and we noticed that several full copies of the
index were subsequently pulled from the master to the slave, evicting our
live index from RAM (the Linux OS cache) and killing our query performance
through disk I/O contention.

Has anyone experienced this behavior recently?  I found an old thread about
this error from early 2009, but it looks like it was patched almost a year
ago:
http://old.nabble.com/%22Unable-to-move-index-file%22-error-during-replication-td21157722.html


Additional relevant information:
- We are using the official Solr 1.4 release plus a field collapsing patch
  from mid-December (which I believe should only affect the query side, not
  indexing/replication).
- Our replication pollInterval (how often slaves check the master) is very
  small: 15 seconds.
- We have a multi-box distributed search setup, with multiple cores on each
  box.
- We issue a manual (rolling) optimize across the cores on the master once a
  day; the last one ran about 1-2 hours before the error above.
- maxWarmingSearchers is set to 1.
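Why would a daily optimize lead to full-index copies? A simplified, assumed model: the slave fetches every master file it does not already hold with the same name and size. After an ordinary commit only the new segment differs; after an optimize, every file is new. The class name, file names and sizes below are hypothetical, and this is not SnapPuller's actual logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, assumed model of how a slave picks files to fetch; not SnapPuller's code.
final class FetchPlanSketch {

    // A master file is fetched unless the slave already has a file with the same name and size.
    static List<String> filesToFetch(Map<String, Long> master, Map<String, Long> slave) {
        List<String> fetch = new ArrayList<>();
        for (Map.Entry<String, Long> e : master.entrySet()) {
            Long localSize = slave.get(e.getKey());
            if (localSize == null || !localSize.equals(e.getValue())) {
                fetch.add(e.getKey());
            }
        }
        return fetch;
    }

    // Total bytes the slave would download for the given plan.
    static long bytesToFetch(Map<String, Long> master, List<String> plan) {
        long total = 0;
        for (String name : plan) {
            total += master.get(name);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical slave index: one large old segment and one small recent one.
        Map<String, Long> slave = new TreeMap<>();
        slave.put("_6qa.fdt", 900_000_000L);
        slave.put("_6qu.fdt", 5_000_000L);

        // Ordinary commit on the master: old segments kept, one new segment added.
        Map<String, Long> afterCommit = new TreeMap<>(slave);
        afterCommit.put("_6qv.fdt", 1_000_000L);

        // Optimize on the master: everything merged into one brand-new segment.
        Map<String, Long> afterOptimize = new TreeMap<>();
        afterOptimize.put("_6qw.fdt", 905_000_000L);

        List<String> commitPlan = filesToFetch(afterCommit, slave);
        List<String> optimizePlan = filesToFetch(afterOptimize, slave);
        System.out.println("after commit:   " + commitPlan + ", "
                + bytesToFetch(afterCommit, commitPlan) + " bytes");
        System.out.println("after optimize: " + optimizePlan + ", "
                + bytesToFetch(afterOptimize, optimizePlan) + " bytes");
    }
}
```

Under this model, the first 15-second poll after each core's optimize schedules a download of that whole core.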

Re: Replication Handler Severe Error: Unable to move index file

Posted by Lance Norskog <go...@gmail.com>.
I did not have good luck with super-high-speed polling. You probably
need to adjust the various replication parameters on both the master
and the slave side.

Some sites (LinkedIn, for example, with Zoie) do not use replication.
They have every query server do its own indexing, so that new content
is available immediately. Network bandwidth is a silent killer of
distributed systems, and the raw update input is generally smaller
than the binary index files that replication copies.
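To put rough numbers on the bandwidth point: the sketch below parses a pollInterval written as HH:mm:ss (the format assumed here for the Solr 1.4 slave setting) and estimates how many poll intervals one full index copy spans. The 10 GiB index size and 1 Gbit/s link are hypothetical placeholders, and protocol overhead is ignored.

```java
// Back-of-the-envelope estimate; all sizes and speeds are hypothetical placeholders.
final class CopyTimeEstimate {

    // Parses an "HH:mm:ss" interval string (assumed pollInterval format) into seconds.
    static long parseIntervalSeconds(String hhmmss) {
        String[] parts = hhmmss.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected HH:mm:ss, got " + hhmmss);
        }
        return Long.parseLong(parts[0]) * 3600
             + Long.parseLong(parts[1]) * 60
             + Long.parseLong(parts[2]);
    }

    // Seconds needed to move indexBytes over a link of bitsPerSecond, ignoring overhead.
    static double transferSeconds(long indexBytes, long bitsPerSecond) {
        return indexBytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        long poll = parseIntervalSeconds("00:00:15");   // 15 s, as in the original post
        long indexBytes = 10L * 1024 * 1024 * 1024;     // hypothetical 10 GiB core
        long link = 1_000_000_000L;                     // hypothetical 1 Gbit/s link

        double copy = transferSeconds(indexBytes, link);
        System.out.printf("full copy ~ %.0f s, i.e. ~ %.1f poll intervals%n", copy, copy / poll);
    }
}
```

With several cores per box, each replicating after the same rolling optimize, these transfers add up on the same network link and the same disks.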

On Thu, Jan 21, 2010 at 2:54 PM, Trey <so...@gmail.com> wrote:
> In terms of the behavior we saw: It appears that a replication occurred and
> the "Unable to move file" error occurred.  As a result, it looks like the
> ENTIRE index was subsequently replicated again into a temporary directory
> (several times, over and over).
>
> The end result was that we had multiple full copies of the index in
> temporary index folders on the slave, and the original still couldn't be
> updated (the move to ./index wouldn't work).  Does Solr ever hold files open
> in a manner that would prevent a file in the index directory from being
> overwritten?



-- 
Lance Norskog
goksron@gmail.com

Re: Replication Handler Severe Error: Unable to move index file

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
On Fri, Jan 22, 2010 at 4:24 AM, Trey <so...@gmail.com> wrote:
> The end result was that we had multiple full copies of the index in
> temporary index folders on the slave, and the original still couldn't be
> updated (the move to ./index wouldn't work).  Does Solr ever hold files open
> in a manner that would prevent a file in the index directory from being
> overwritten?

There is a TODO which says we should try a manual copy if the move
(renameTo) fails. We never implemented it because we never observed
renameTo failing.
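A minimal sketch of that TODO, assuming a plain byte copy is an acceptable fallback: try File#renameTo first, and only if it fails, copy the file with java.nio.file.Files#copy and delete the source. The class and method names are hypothetical; this is not Solr code, and java.nio.file needs Java 7 or later.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the copyAFile() TODO: rename first, copy as a fallback.
final class MoveWithCopyFallback {

    static boolean moveOrCopy(File src, File dst) {
        // Fast path: the same rename SnapPuller already relies on.
        if (src.renameTo(dst)) {
            return true;
        }
        // Slow path: copy the contents over the target, then remove the source.
        try {
            Files.copy(src.toPath(), dst.toPath(), StandardCopyOption.REPLACE_EXISTING);
            Files.delete(src.toPath());
            return true;
        } catch (IOException e) {
            // Unlike renameTo(), the exception carries a reason that can be logged.
            System.err.println("Unable to move or copy " + src + " to " + dst + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        File tmpDir = Files.createTempDirectory("index.tmp").toFile();
        File indexDir = Files.createTempDirectory("index").toFile();
        File src = new File(tmpDir, "_6qv.fnm");
        Files.write(src.toPath(), new byte[] {1, 2, 3});

        boolean ok = moveOrCopy(src, new File(indexDir, "_6qv.fnm"));
        System.out.println("moved: " + ok);
    }
}
```

One trade-off: unlike a rename within a single file system, a copy is not atomic, so an interrupted copy can leave a partial file in the index directory.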



-- 
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com

Re: Replication Handler Severe Error: Unable to move index file

Posted by Trey <so...@gmail.com>.
Unfortunately, when I went back to look at the logs this morning, the log
file had been blown away, which puts a major damper on my debugging
capabilities; sorry about that. As a double whammy, we optimize nightly,
so the old index files have completely changed at this point.

I do not remember seeing an exception or stack trace in the logs associated
with the "SEVERE: Unable to move index file" entry, but we were grepping the
logs, so if it was written on a separate line it could have been there.
Based on the code in SnapPuller.java, though, I wouldn't really expect to
see anything:

  /**
   * Copy a file by the File#renameTo() method. If it fails, it is
   * considered a failure
   * <p/>
   * Todo may be we should try a simple copy if it fails
   */
  private boolean copyAFile(File tmpIdxDir, File indexDir, String fname,
                            List<String> copiedfiles) {
    File indexFileInTmpDir = new File(tmpIdxDir, fname);
    File indexFileInIndex = new File(indexDir, fname);
    boolean success = indexFileInTmpDir.renameTo(indexFileInIndex);
    if (!success) {
      LOG.error("Unable to move index file from: " + indexFileInTmpDir
              + " to: " + indexFileInIndex);
      for (String f : copiedfiles) {
        File indexFile = new File(indexDir, f);
        if (indexFile.exists())
          indexFile.delete();
      }
      delTree(tmpIdxDir);
      return false;
    }
    return true;
  }
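That matches the missing stack trace: File#renameTo() only returns false, so the log line cannot say why the move failed. As an illustration (a hypothetical helper, not Solr code; java.nio.file needs Java 7 or later), java.nio.file.Files#move reports failures as typed IOExceptions such as NoSuchFileException or AccessDeniedException, whose class and message could be logged instead:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: same move, but the failure reason is kept instead of a bare false.
final class MoveWithReason {

    // Returns null on success, otherwise the exception type and message describing the failure.
    static String tryMove(Path src, Path dst) {
        try {
            Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
            return null;
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("core8");
        Path missing = dir.resolve("index.20100119103919").resolve("_6qv.fnm");
        Path target = dir.resolve("index").resolve("_6qv.fnm");
        // Neither directory exists here, so the move fails and the reason is printed.
        System.out.println(tryMove(missing, target));
    }
}
```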

As for whether this is a one-off case: this is the first occurrence of
this error I have seen in the logs. We tried to reproduce the conditions
under which it occurred, but were unable to. I'll send along more useful
info if this happens again.

As for the behavior we saw: it appears that a replication ran and hit the
"Unable to move index file" error. After that, it looks like the ENTIRE
index was replicated into a temporary directory again, several times over.
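To see how many such copies pile up, a hypothetical helper can list directories named like the temporary one in the log (index.20100119103919, assumed here to follow an index.<yyyyMMddHHmmss> pattern) and report the bytes in each. It only reads; it deletes nothing.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: report leftover "index.<yyyyMMddHHmmss>" directories; deletes nothing.
final class LeftoverIndexDirs {

    // Assumed naming: "index." followed by a 14-digit timestamp, as in index.20100119103919.
    private static final Pattern TMP_INDEX = Pattern.compile("index\\.\\d{14}");

    // Returns matching subdirectories of coreDir, sorted by path.
    static List<File> find(File coreDir) {
        List<File> result = new ArrayList<>();
        File[] children = coreDir.listFiles();
        if (children == null) {
            return result; // coreDir is not a directory, or could not be read
        }
        for (File f : children) {
            if (f.isDirectory() && TMP_INDEX.matcher(f.getName()).matches()) {
                result.add(f);
            }
        }
        Collections.sort(result);
        return result;
    }

    // Total size in bytes of the regular files directly inside dir.
    static long sizeOf(File dir) {
        long total = 0;
        File[] files = dir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile()) {
                    total += f.length();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Core directory from the log above, unless one is passed on the command line.
        File coreDir = new File(args.length > 0 ? args[0] : "/home/solr/cores/core8");
        for (File d : find(coreDir)) {
            System.out.println(d.getName() + "\t" + sizeOf(d) + " bytes");
        }
    }
}
```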

The end result was that we had multiple full copies of the index in
temporary index folders on the slave, and the original still couldn't be
updated (the move to ./index wouldn't work). Does Solr ever hold files open
in a manner that would prevent a file in the index directory from being
overwritten?
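One way to check the open-file question on the slave's own OS is a small experiment: hold a stream open on a file, then rename another file over it. On Linux, rename(2) replaces the directory entry even while the old file is open, and the open stream keeps reading the old contents; other platforms may behave differently. The class and file names are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical experiment: does an open reader block File#renameTo() over its file?
final class RenameOverOpenFile {

    // Returns true if renameTo() succeeded while the target was held open.
    static boolean renameOverOpenTarget(Path dir) throws IOException {
        Path target = dir.resolve("_6qv.fnm");
        Path incoming = dir.resolve("_6qv.fnm.new");
        Files.write(target, new byte[] {1});
        Files.write(incoming, new byte[] {2});
        try (InputStream held = new FileInputStream(target.toFile())) {
            boolean renamed = incoming.toFile().renameTo(target.toFile());
            // On Linux the already-open stream still reads the old contents.
            System.out.println("renamed=" + renamed + ", open stream reads " + held.read());
            return renamed;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(renameOverOpenTarget(Files.createTempDirectory("rename-test")));
    }
}
```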


2010/1/21 Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>

> Is it a one-off case? Do you observe this frequently?

Re: Replication Handler Severe Error: Unable to move index file

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
Is it a one-off case? Do you observe this frequently?
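To answer the frequency question from a log file, a hypothetical helper can print each line containing the SnapPuller message plus the next few lines, roughly like grep -A, so that a stack trace written on the following lines is not missed by a single-line grep.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical log scanner: matching lines plus trailing context, like grep -A.
final class LogScan {

    // Returns every line containing needle, plus up to `after` lines following each match.
    static List<String> matchesWithContext(BufferedReader log, String needle, int after)
            throws IOException {
        List<String> out = new ArrayList<>();
        int remaining = 0;
        String line;
        while ((line = log.readLine()) != null) {
            if (line.contains(needle)) {
                remaining = after;   // a new match restarts the context window
                out.add(line);
            } else if (remaining > 0) {
                remaining--;
                out.add(line);
            }
        }
        return out;
    }

    // Number of lines in hits that contain needle, i.e. the number of occurrences.
    static int countMatches(List<String> hits, String needle) {
        int n = 0;
        for (String h : hits) {
            if (h.contains(needle)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        String needle = "Unable to move index file";
        // args[0]: path to the Solr log file to scan.
        try (BufferedReader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            List<String> hits = matchesWithContext(r, needle, 20);
            hits.forEach(System.out::println);
            System.out.println(countMatches(hits, needle) + " occurrence(s)");
        }
    }
}
```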

On Thu, Jan 21, 2010 at 11:26 AM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
> It's hard to tell without poking around, but one of the first things I'd do would be to look for /home/solr/cores/core8/index.20100119103919/_6qv.fnm: does this file/dir really exist? Or rather, did it exist when the error happened?
>
> I'm not looking at the source code now, but is that really the only error you got?  No exception stack trace?



-- 
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com

Re: Replication Handler Severe Error: Unable to move index file

Posted by Otis Gospodnetic <ot...@yahoo.com>.
It's hard to tell without poking around, but one of the first things I'd do would be to look for /home/solr/cores/core8/index.20100119103919/_6qv.fnm: does this file/dir really exist? Or rather, did it exist when the error happened?

I'm not looking at the source code now, but is that really the only error you got?  No exception stack trace?
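The checks suggested above can be scripted. This hypothetical helper (Java 7+ java.nio.file; not part of Solr) reports whether the source exists, whether the target directory exists and is writable, and whether both sit on the same file store, since File#renameTo cannot move a file across file systems. It has to run soon after the error, because copyAFile() deletes the temporary directory on failure.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical checklist for a failed move of src to dst; returns the problems found.
final class MoveDiagnostics {

    static List<String> diagnose(Path src, Path dst) {
        List<String> problems = new ArrayList<>();
        Path dstDir = dst.toAbsolutePath().getParent();
        if (!Files.exists(src)) {
            problems.add("source does not exist: " + src);
        }
        if (dstDir == null || !Files.isDirectory(dstDir)) {
            problems.add("target directory does not exist: " + dstDir);
        } else if (!Files.isWritable(dstDir)) {
            problems.add("target directory is not writable: " + dstDir);
        }
        // Only compare file stores when both ends are present.
        if (problems.isEmpty()) {
            try {
                if (!Files.getFileStore(src).equals(Files.getFileStore(dstDir))) {
                    problems.add("source and target are on different file stores");
                }
            } catch (IOException e) {
                problems.add("could not read file store: " + e);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Path src = Paths.get("/home/solr/cores/core8/index.20100119103919/_6qv.fnm");
        Path dst = Paths.get("/home/solr/cores/core8/index/_6qv.fnm");
        List<String> problems = diagnose(src, dst);
        System.out.println(problems.isEmpty() ? "no obvious problem found" : problems);
    }
}
```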

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
> From: Trey <so...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Wed, January 20, 2010 11:54:43 PM
> Subject: Replication Handler Severe Error: Unable to move index file
> 
> Does anyone know what would cause the following error?:
> 
> 10:45:10 AM org.apache.solr.handler.SnapPuller copyAFile
> 
>      SEVERE: *Unable to move index file* from:
> /home/solr/cores/core8/index.20100119103919/_6qv.fnm to:
> /home/solr/cores/core8/index/_6qv.fnm