Posted to dev@lucy.apache.org by Peter Karman <pe...@peknet.com> on 2015/03/30 04:42:53 UTC

[lucy-dev] empty segment

I recently encountered a case where several indexes broke because they had zero-length segment and
schema files. These were indexes that were being updated incrementally via regular cron jobs, and
already had more than one segment. The condition I found them in suggested that somehow a segment
was started but not finished.

The fact that there was more than one index suggested to me that I had some systemic problem with
the cron'd indexing logic or the Dezi code that was wrapping the Lucy code, but since neither of
those had changed in months, I was puzzled to see the problem occur at all. None of my logs revealed
any error. I simply couldn't create a new Searcher, instead getting a fatal exception about a NULL
segment.

So my questions are:

(1) under what conditions would Lucy leave an index with zero-length segment and schema files?

(2) is there any way to recover from such a condition? (presumably by re-creating the .schema file)

TIA,
pek

-- 
Peter Karman  .  www.peknet.com  .  @peterkarman

Re: [lucy-dev] empty segment

Posted by Peter Karman <pe...@peknet.com>.
On 3/30/15 11:36 PM, Marvin Humphrey wrote:

>> The snapshot was also zero.
> 
> In that case, I would suspect a system level event -- power failure, OS
> glitch or hardware failure -- which caused dirty write blocks not to get
> flushed successfully.

you got it. system reboot.

thanks for taking the time to think about it.


-- 
Peter Karman  .  www.peknet.com  .  @peterkarman

Re: [lucy-dev] empty segment

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Mar 30, 2015 at 8:30 PM, Peter Karman <pe...@peknet.com> wrote:
> I too suspected full disk at first, but the disk was only 50% full so that
> was not it.

Well, that can depend on the size of the index relative to the size of the
disk.  The worst case is that you need ~3x the index size during a full
consolidation down to a single segment: the existing index, temp files, and
the final rewritten version.
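
To make the ~3x figure concrete, here is a rough Python sketch of a pre-flight check a cron job could run before a full consolidation. The function names are hypothetical and none of this is part of Lucy. It assumes the existing index already occupies 1x, so roughly 2x more must be free for temp files and the rewritten copy.

```python
import os
import shutil

def index_size(index_dir):
    """Total size in bytes of all files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def has_headroom(index_dir, factor=3, free_bytes=None):
    """True if free space covers the extra (factor - 1) copies of the index.

    factor=3 is the worst-case estimate from this thread, not a measured value.
    free_bytes can be passed in for testing; by default it is read from the
    filesystem that holds index_dir.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(index_dir).free
    return free_bytes >= (factor - 1) * index_size(index_dir)
```

With an index of about 11.8 GB, like the one in this thread, that works out to roughly 24 GB free in the worst case, so "50% full" may or may not be enough depending on the size of the disk.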

However, running up against a full disk should not corrupt the index.  We
check the success of both write() and close() operations on the Unix file
descriptor and throw an exception on failure.  The atomic commit event is a
call to link() which hard links the new snapshot file from a temp file.  Prior
to that event, all OutStream objects must have been flushed and closed,
potentially triggering an exception.  So it ought to be impossible to get to
the commit event if a full disk has caused write operations to fail
for any segment file, schema file, or even the snapshot file itself.
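
The ordering described above can be sketched in a few lines of Python. This is only an illustration of the pattern (write to a temp file, let any write or close error raise, then link() into place). It is not Lucy's actual C code, and the function name and the ".temp" suffix are made up for the example.

```python
import os

def commit_snapshot(index_dir, snapshot_name, data):
    """Write data to a temp file, then hard-link it to its final name."""
    temp_path = os.path.join(index_dir, snapshot_name + ".temp")
    final_path = os.path.join(index_dir, snapshot_name)

    # write() and the implicit close() raise OSError on failure (for
    # example ENOSPC on a full disk), so we never reach the link() call
    # below with a write that is known to have failed.
    with open(temp_path, "wb") as f:
        f.write(data)

    # The commit event.  link() either creates the new name or fails
    # (FileExistsError if the name is already taken); it never leaves a
    # half-created directory entry.
    os.link(temp_path, final_path)
    os.unlink(temp_path)
    return final_path
```

Note what this does not cover: a successful close() only means the kernel accepted the data, not that it has reached the disk.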

> The snapshot was also zero.

In that case, I would suspect a system level event -- power failure, OS
glitch or hardware failure -- which caused dirty write blocks not to get
flushed successfully.

We could theoretically improve our defenses against index corruption under
such conditions on most systems using fsync().  However, it's not an absolute
guarantee, performance would suffer, and it would be a pain to implement.

    http://linux.die.net/man/2/fsync
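
For illustration, here is roughly what an fsync()-based write looks like, sketched in Python rather than in Lucy's C. The helper name is hypothetical, and the directory fsync assumes a POSIX system such as Linux. As noted above, even this is not an absolute guarantee, since drives and controllers can acknowledge writes they are still caching.

```python
import os

def write_durably(path, data):
    """Write data to path and ask the OS to push it to stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to flush dirty blocks to the device

    # Also fsync the parent directory so the new directory entry itself
    # is persisted.  Opening a directory this way works on POSIX systems;
    # it is not portable to Windows.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Every segment file, the schema file, and the snapshot file would each need this treatment before the commit, which is where the performance cost comes from.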

Marvin Humphrey

Re: [lucy-dev] empty segment

Posted by Peter Karman <pe...@peknet.com>.
On 3/30/15 6:32 PM, Marvin Humphrey wrote:
> On Sun, Mar 29, 2015 at 7:42 PM, Peter Karman <pe...@peknet.com> wrote:
>> (1) under what conditions would Lucy leave an index with zero-length
>> segment and schema files?
> 
> I'm having trouble imagining circumstances under which Lucy would produce a
> zero-length schema file but a valid, non-zero-length *snapshot* file.  The
> only thing I can think of is a power failure or OS crash which results in some
> but not all dirty write buffers being flushed to disk.
> 
> Can you confirm that there is a valid snapshot file which references the
> zero-length schema file?
> 
> If the snapshot file is *also* zero length, I think the main suspect would be
> disk filling up.  But I don't know, because that usually would trigger an
> exception before the new snapshot file gets moved into place.
> 
> This is puzzling.  Maybe there is a bug we don't yet know about.

I too suspected full disk at first, but the disk was only 50% full so that was not it.

The snapshot was also zero. Here's a full dir list:

$ ls -l responses-zero-seg/*
-rw-r--r-- 1 pijuser webhome     0 Mar 23 19:25 responses-zero-seg/schema_3ex.json
-rw-r--r-- 1 pijuser webhome     0 Mar 23 19:25 responses-zero-seg/snapshot_3ex.json
-rw-rw---- 1 pijuser webhome     0 Mar 23 19:04 responses-zero-seg/swish_last_start
-rw-rw---- 1 pijuser webhome 15017 Mar 23 19:25 responses-zero-seg/swish.xml

responses-zero-seg/locks:
total 4
-rw-rw---- 1 pijuser webhome 75 Mar 24 10:40 write.lock

responses-zero-seg/seg_312:
total 11424972
-rw-rw---- 1 pijuser webhome 11699082064 Mar 22 05:02 cf.dat
-rw-rw---- 1 pijuser webhome       59490 Mar 22 05:02 cfmeta.json
-rw-rw---- 1 pijuser webhome       17136 Mar 22 05:02 segmeta.json

responses-zero-seg/seg_38i:
total 121120
-rw-rw---- 1 pijuser webhome 123947240 Mar 23 01:03 cf.dat
-rw-rw---- 1 pijuser webhome     57061 Mar 23 01:03 cfmeta.json
-rw-rw---- 1 pijuser webhome     17021 Mar 23 01:03 segmeta.json

responses-zero-seg/seg_3d2:
total 7144
-rw-rw---- 1 pijuser webhome 7236120 Mar 23 15:50 cf.dat
-rw-rw---- 1 pijuser webhome   53941 Mar 23 15:50 cfmeta.json
-rw-rw---- 1 pijuser webhome   16710 Mar 23 15:50 segmeta.json

responses-zero-seg/seg_3di:
total 6276
-rw-r--r-- 1 pijuser webhome 6345872 Mar 23 16:30 cf.dat
-rw-r--r-- 1 pijuser webhome   54427 Mar 23 16:30 cfmeta.json
-rw-r--r-- 1 pijuser webhome   16597 Mar 23 16:30 segmeta.json

responses-zero-seg/seg_3ex:
total 0
-rw-r--r-- 1 pijuser webhome 0 Mar 23 19:25 cf.dat
-rw-r--r-- 1 pijuser webhome 0 Mar 23 19:25 cfmeta.json
-rw-r--r-- 1 pijuser webhome 0 Mar 23 19:25 segmeta.json


> 
>> (2) is there any way to recover from such a condition? (presumably
>>  by re-creating the .schema file)
> 
> If the schema file has been obliterated, data has been lost from the index and
> the only way to recover is to restore it from an outside source -- such as the
> script which writes the index and might have the schema info.
> 
> A zero-length segment could potentially be ignored or purged.
> 
> A missing or damaged snapshot file is generally not a recoverable event.  The
> information on which segments are part of the current snapshot is *only*
> stored there.
> 


ok, that is all helpful to know. guess the best way is what I did: just rebuild the whole index from
scratch.

Thanks, Marvin.


-- 
Peter Karman  .  www.peknet.com  .  @peterkarman

Re: [lucy-dev] empty segment

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sun, Mar 29, 2015 at 7:42 PM, Peter Karman <pe...@peknet.com> wrote:
> (1) under what conditions would Lucy leave an index with zero-length
> segment and schema files?

I'm having trouble imagining circumstances under which Lucy would produce a
zero-length schema file but a valid, non-zero-length *snapshot* file.  The
only thing I can think of is a power failure or OS crash which results in some
but not all dirty write buffers being flushed to disk.

Can you confirm that there is a valid snapshot file which references the
zero-length schema file?

If the snapshot file is *also* zero length, I think the main suspect would be
disk filling up.  But I don't know, because that usually would trigger an
exception before the new snapshot file gets moved into place.

This is puzzling.  Maybe there is a bug we don't yet know about.

> (2) is there any way to recover from such a condition? (presumably
>  by re-creating the .schema file)

If the schema file has been obliterated, data has been lost from the index and
the only way to recover is to restore it from an outside source -- such as the
script which writes the index and might have the schema info.

A zero-length segment could potentially be ignored or purged.

A missing or damaged snapshot file is generally not a recoverable event.  The
information on which segments are part of the current snapshot is *only*
stored there.
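
A quick way to spot this condition is to scan the index directory for zero-length files. The Python sketch below is a hypothetical diagnostic, not a Lucy tool. It assumes the on-disk naming seen in this thread (snapshot_*.json, schema_*.json, and seg_* directories) and deliberately ignores other files in the directory, which may legitimately be empty.

```python
import fnmatch
import os

def find_empty_index_files(index_dir):
    """Return paths of zero-length snapshot, schema, and segment files."""
    empty = []
    for name in sorted(os.listdir(index_dir)):
        path = os.path.join(index_dir, name)
        if os.path.isdir(path) and name.startswith("seg_"):
            # Check every file inside a segment directory.
            candidates = [os.path.join(path, f) for f in sorted(os.listdir(path))]
        elif (fnmatch.fnmatch(name, "snapshot_*.json")
              or fnmatch.fnmatch(name, "schema_*.json")):
            candidates = [path]
        else:
            # Lock directories, application files, etc. are skipped.
            candidates = []
        empty.extend(p for p in candidates
                     if os.path.isfile(p) and os.path.getsize(p) == 0)
    return empty
```

This only reports what it finds. A non-empty result for a snapshot or schema file means the index needs to be rebuilt or restored from an outside source, as described above.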

Marvin Humphrey