Posted to java-user@lucene.apache.org by "Tarr, Gregory" <Gr...@detica.com> on 2011/06/28 10:46:29 UTC

Corrupt segments file full of zeros

We have a problem with our fileserver where our indexes are hosted
remotely, using Lucene 2.9.3.

This can mean that a segments file is written which is full of zero
(NUL) bytes. Using the od -ah command, we get:

0000000 nul nul nul nul nul nul nul....etc

If opened in Luke, the index opens successfully but has zero documents.

Why does this open correctly in luke, and is there a procedure in the
lucene code that can verify a segments file, e.g. check whether it
refers to any segments?

Thanks

Greg


Please consider the environment before printing this email.

This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately.

Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory.  The contents of this email may relate to dealings with other companies under the control of Detica Limited, details of which can be found at http://www.detica.com/statutory-information.

Detica Limited is registered in England under No: 1337451.
Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England.


Re: Corrupt segments file full of zeros

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Tue, Jun 28, 2011 at 10:45 PM, Trejkaz <tr...@trypticon.org> wrote:
> On Wed, Jun 29, 2011 at 2:24 AM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> Here's the issue:
>>
>>    https://issues.apache.org/jira/browse/LUCENE-3255
>>
>> It's because we read the first 0 int to be an ancient segments file
>> format, and the next 0 int to mean there are no segments.  Yuck!
>>
>> This format pre-dates Lucene 1.9, so the fix for 3.x is to stop
>> supporting this ancient format... but I don't see any easy way to fix
>> this pre-3.x where we must (by our back compat rules) support such an
>> ancient index.
>
> It's not possible to do something based on the existence of further
> zeroes after the first 8 bytes?  I would expect the original format to
> have no additional data after that, but I don't exactly know whether a
> corrupt file could be exactly 8 bytes long...

Yes, you're right, it is!  That would work, as long as the all 0s file
isn't exactly 8 bytes long (this time yours was 20).  But then we are
still vulnerable if the corruption just happens to produce an 8 byte
all 0s file...
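
A minimal sketch of the length-based check discussed above, assuming only
what this thread states: an ancient-format file with no segments is expected
to be 8 bytes long, so a longer file that contains nothing but zero bytes
looks like corruption. The class and method names are hypothetical and this
is not Lucene code.

```java
final class ZeroFileHeuristic {
    // Returns true when the file is longer than the 8 bytes an empty
    // ancient-format file would need AND every byte is zero.
    // An all-zero file of exactly 8 bytes stays ambiguous and is not flagged.
    static boolean looksZeroedOut(byte[] fileBytes) {
        if (fileBytes.length <= 8) {
            return false;
        }
        for (byte b : fileBytes) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }
}
```

With the 20-byte all-zero file from this thread the check returns true; with
an 8-byte all-zero file it returns false, which is the remaining gap noted
above.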

Simon also had a good idea, which is to check the version of the prior
segments file, and refuse to accept this ancient version of the newer
segments if the prior one is "modern".
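
Simon's idea could be sketched roughly as follows. This is only an
illustration under the assumption, taken from this thread, that a modern
segments file starts with a negative format int while the ancient layout
starts with a non-negative one; the class and method names are hypothetical.

```java
final class PriorFormatCheck {
    // priorFirstInt / newerFirstInt: the first 4-byte int read from the
    // older and the newer segments_N file respectively.
    static boolean acceptNewer(int priorFirstInt, int newerFirstInt) {
        boolean priorIsModern = priorFirstInt < 0;      // e.g. -9
        boolean newerLooksAncient = newerFirstInt >= 0; // e.g. 0 from an all-zero file
        // An index does not move from a modern format back to the ancient
        // one, so that combination is treated as corruption.
        return !(priorIsModern && newerLooksAncient);
    }
}
```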

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Corrupt segments file full of zeros

Posted by Trejkaz <tr...@trypticon.org>.
On Wed, Jun 29, 2011 at 2:24 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> Here's the issue:
>
>    https://issues.apache.org/jira/browse/LUCENE-3255
>
> It's because we read the first 0 int to be an ancient segments file
> format, and the next 0 int to mean there are no segments.  Yuck!
>
> This format pre-dates Lucene 1.9, so the fix for 3.x is to stop
> supporting this ancient format... but I don't see any easy way to fix
> this pre-3.x where we must (by our back compat rules) support such an
> ancient index.

It's not possible to do something based on the existence of further
zeroes after the first 8 bytes?  I would expect the original format to
have no additional data after that, but I don't exactly know whether a
corrupt file could be exactly 8 bytes long...

TX



Re: Corrupt segments file full of zeros

Posted by Michael McCandless <lu...@mikemccandless.com>.
Here's the issue:

    https://issues.apache.org/jira/browse/LUCENE-3255

It's because we read the first 0 int to be an ancient segments file
format, and the next 0 int to mean there are no segments.  Yuck!

This format pre-dates Lucene 1.9, so the fix for 3.x is to stop
supporting this ancient format... but I don't see any easy way to fix
this pre-3.x where we must (by our back compat rules) support such an
ancient index.
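
To illustrate the misreading, here is a minimal Java sketch. It is not
Lucene's actual parser: the names SegmentsHeaderSketch and describe are
hypothetical, and it only assumes what is stated in this thread, namely that
a modern file starts with a negative format int (such as -9) while a
non-negative first int is taken as the ancient layout, with the next int
read as the number of segments.

```java
import java.nio.ByteBuffer;

final class SegmentsHeaderSketch {
    // Interprets the first one or two big-endian ints of a segments file.
    static String describe(byte[] fileBytes) {
        ByteBuffer in = ByteBuffer.wrap(fileBytes); // big-endian by default
        int first = in.getInt();
        if (first < 0) {
            // Negative value: an explicit format marker.
            return "format marker " + first;
        }
        // Non-negative value: assumed to be the ancient layout, so the
        // following int is read as the segment count.
        int segmentCount = in.getInt();
        return "ancient layout, " + segmentCount + " segments";
    }
}
```

For a 20-byte all-zero file, describe returns "ancient layout, 0 segments",
which matches the symptom of an index that opens with zero documents.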

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 28, 2011 at 10:09 AM, mark harwood <ma...@yahoo.co.uk> wrote:
> I've got Greg's bad segment file and it does look to be all zeros and if I drop
> it into an existing index directory with the name segments_N+1 it reproduces the
> error i.e. IndexReader opens the index as if it contains zero docs.
> Preparing a Jira as we speak.
>
>
> ----- Original Message ----
> From: Michael McCandless <lu...@mikemccandless.com>
> To: java-user@lucene.apache.org
> Sent: Tue, 28 June, 2011 14:59:48
> Subject: Re: Corrupt segments file full of zeros
>
> On Tue, Jun 28, 2011 at 9:29 AM, mark harwood <ma...@yahoo.co.uk> wrote:
>> Hi Mike.
>>>>Hmmm -- what code are you running here, to print the number of docs?
>>
>> SegmentInfos.setInfoStream(System.out);
>> FSDirectory dir = FSDirectory.open(new File("j:/indexes/myindex"));
>> IndexReader r = IndexReader.open(dir, true);
>> System.out.println("index has "+r.maxDoc()+" docs");
>>
>> From my own tests outside of Greg's environment I've found Lucene to be doing
>> all the right things and IndexReader falls back gracefully to the previous
>> commit e.g. here is the output from when I deliberately killed an update after
>> prepareToCommit, leaving segments_2 and segments_3 and then vandalised segments_3
>> with all zero bytes:
>>   SIS [main]: directory listing genA=3
>>   SIS [main]: fallback check: 2; 2
>>   SIS [main]: segments.gen check: genB=2
>>   SIS [main]: primary Exception on 'segments_3': java.io.IOException: read past EOF'; will retry: retry=false; gen = 3
>>   SIS [main]: fallback to prior segment file 'segments_2'
>>   SIS [main]: success on fallback segments_2
>>
>> Lucene does the right thing going back to _2. I can't yet see why in Greg's
>> environment (NFS based) it fails to see _4vc as corrupt in the same way the
>> above test correctly sees _3 as corrupt.
>
> Hmm.  Mark, if you vandalise segments_3 with 0s, and then remove
> segments_2, what happens when you try to open the IndexReader?  (I
> would expect exc).
>
> Greg, can you post the full stdout you see from SIS after enabling its
> infoStream in the case that returns an IR with 0 docs (ie when you
> delete segments_4vb).
>
> Also: if you don't delete any of the segments_N file, and run the same
> code, how many docs do you get?
>
> Mike


Re: Corrupt segments file full of zeros

Posted by mark harwood <ma...@yahoo.co.uk>.
I've got Greg's bad segment file and it does look to be all zeros and if I drop 
it into an existing index directory with the name segments_N+1 it reproduces the
error i.e. IndexReader opens the index as if it contains zero docs.
Preparing a Jira as we speak.


----- Original Message ----
From: Michael McCandless <lu...@mikemccandless.com>
To: java-user@lucene.apache.org
Sent: Tue, 28 June, 2011 14:59:48
Subject: Re: Corrupt segments file full of zeros

On Tue, Jun 28, 2011 at 9:29 AM, mark harwood <ma...@yahoo.co.uk> wrote:
> Hi Mike.
>>>Hmmm -- what code are you running here, to print the number of docs?
>
> SegmentInfos.setInfoStream(System.out);
> FSDirectory dir = FSDirectory.open(new File("j:/indexes/myindex"));
> IndexReader r = IndexReader.open(dir, true);
> System.out.println("index has "+r.maxDoc()+" docs");
>
> From my own tests outside of Greg's environment I've found Lucene to be doing
> all the right things and IndexReader falls back gracefully to the previous
> commit e.g. here is the output from when I deliberately killed an update after
> prepareToCommit, leaving segments_2 and segments_3 and then vandalised segments_3
> with all zero bytes:
>   SIS [main]: directory listing genA=3
>   SIS [main]: fallback check: 2; 2
>   SIS [main]: segments.gen check: genB=2
>   SIS [main]: primary Exception on 'segments_3': java.io.IOException: read past EOF'; will retry: retry=false; gen = 3
>   SIS [main]: fallback to prior segment file 'segments_2'
>   SIS [main]: success on fallback segments_2
>
> Lucene does the right thing going back to _2. I can't yet see why in Greg's
> environment (NFS based) it fails to see _4vc as corrupt in the same way the
> above test correctly sees _3 as corrupt.

Hmm.  Mark, if you vandalise segments_3 with 0s, and then remove
segments_2, what happens when you try to open the IndexReader?  (I
would expect exc).

Greg, can you post the full stdout you see from SIS after enabling its
infoStream in the case that returns an IR with 0 docs (ie when you
delete segments_4vb).

Also: if you don't delete any of the segments_N file, and run the same
code, how many docs do you get?

Mike



Re: Corrupt segments file full of zeros

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Tue, Jun 28, 2011 at 9:29 AM, mark harwood <ma...@yahoo.co.uk> wrote:
> Hi Mike.
>>>Hmmm -- what code are you running here, to print the number of docs?
>
> SegmentInfos.setInfoStream(System.out);
> FSDirectory dir = FSDirectory.open(new File("j:/indexes/myindex"));
> IndexReader r = IndexReader.open(dir, true);
> System.out.println("index has "+r.maxDoc()+" docs");
>
> From my own tests outside of Greg's environment I've found Lucene to be doing
> all the right things and IndexReader falls back gracefully to the previous
> commit e.g. here is the output from when I deliberately killed an update after
> prepareToCommit, leaving segments_2 and segments_3 and then vandalised segments_3
> with all zero bytes:
>   SIS [main]: directory listing genA=3
>   SIS [main]: fallback check: 2; 2
>   SIS [main]: segments.gen check: genB=2
>   SIS [main]: primary Exception on 'segments_3': java.io.IOException: read past EOF'; will retry: retry=false; gen = 3
>   SIS [main]: fallback to prior segment file 'segments_2'
>   SIS [main]: success on fallback segments_2
>
> Lucene does the right thing going back to _2. I can't yet see why in Greg's
> environment (NFS based) it fails to see _4vc as corrupt in the same way the
> above test correctly sees _3 as corrupt.

Hmm.  Mark, if you vandalise segments_3 with 0s, and then remove
segments_2, what happens when you try to open the IndexReader?  (I
would expect exc).

Greg, can you post the full stdout you see from SIS after enabling its
infoStream in the case that returns an IR with 0 docs (ie when you
delete segments_4vb).

Also: if you don't delete any of the segments_N file, and run the same
code, how many docs do you get?

Mike



Re: Corrupt segments file full of zeros

Posted by mark harwood <ma...@yahoo.co.uk>.
Hi Mike. 
>>Hmmm -- what code are you running here, to print the number of docs?

SegmentInfos.setInfoStream(System.out);
FSDirectory dir = FSDirectory.open(new File("j:/indexes/myindex"));
IndexReader r = IndexReader.open(dir, true);
System.out.println("index has "+r.maxDoc()+" docs");  

From my own tests outside of Greg's environment I've found Lucene to be doing 
all the right things and IndexReader falls back gracefully to the previous 
commit e.g. here is the output from when I deliberately killed an update after 
prepareToCommit, leaving segments_2 and segments_3 and then vandalised segments_3
with all zero bytes:
   SIS [main]: directory listing genA=3
   SIS [main]: fallback check: 2; 2
   SIS [main]: segments.gen check: genB=2
   SIS [main]: primary Exception on 'segments_3': java.io.IOException: read past EOF'; will retry: retry=false; gen = 3
   SIS [main]: fallback to prior segment file 'segments_2'
   SIS [main]: success on fallback segments_2

Lucene does the right thing going back to _2. I can't yet see why in Greg's 
environment (NFS based) it fails to see _4vc as corrupt in the same way the 
above test correctly sees _3 as corrupt.

Cheers
Mark


----- Original Message ----
From: Michael McCandless <lu...@mikemccandless.com>
To: java-user@lucene.apache.org
Sent: Tue, 28 June, 2011 14:04:40
Subject: Re: Corrupt segments file full of zeros

On Tue, Jun 28, 2011 at 8:53 AM, Tarr, Gregory <Gr...@detica.com> wrote:
> Michael
>
> We are not using commit points unfortunately.

That's fine -- even if you don't keep multiple commit points in your
index, when a commit() op fails, then you can end up with two
segments_N files.  The older one is "good" (last successful commit)
and the new one is broken.

> This was a scheduled update to our index, and on observation the index
> directory had two segments_N files:
>
> segments_4vb (modified 24 June 2011 02:05:38 size 7.61KB)
> segments_4vc (modified 24 June 2011 02:20:42 size 5.91KB)

OK, so you have 2 segments_N files because something went wrong during
commit of the 2nd one.

> We were not sure which one of these was the real one, so we deleted 4vb and got
> the following from SegmentInfos:

It will always be the "older" one that was the last successful commit,
unless you keep multiple commit points in the index.

> Directory listing genA=6312
> Fallback check: 6311; 6311
> Segments.gen check: genB=6311
> Index has 0 docs

Hmmm -- what code are you running here, to print the number of docs?
new IndexWriter(), with create=true?  I would have expected IR.open to
throw an exc here.

> We then deleted 4vc and got the following:
>
> Directory listing genA=6311
> Fallback check: 6311; 6311
> Segments.gen check: genB=6311
> Index has 40022898 docs
>
> Opening 4vc in an octal editor yields only ASCII zeros (0000000 nul nul nul nul
> nul nul nul....etc). It may be that Windows is responsible for this, as our
> indexes are accessed through a fileserver and we know that a delayed write
> occurred.
>
> My question is: why does an index with 4vc open?

I'm not sure, unless you are opening with IW and create=true.

Mike



Re: Corrupt segments file full of zeros

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Tue, Jun 28, 2011 at 8:53 AM, Tarr, Gregory <Gr...@detica.com> wrote:
> Michael
>
> We are not using commit points unfortunately.

That's fine -- even if you don't keep multiple commit points in your
index, when a commit() op fails, then you can end up with two
segments_N files.  The older one is "good" (last successful commit)
and the new one is broken.

> This was a scheduled update to our index, and on observation the index directory had two segments_N files:
>
> segments_4vb (modified 24 June 2011 02:05:38 size 7.61KB)
> segments_4vc (modified 24 June 2011 02:20:42 size 5.91KB)

OK, so you have 2 segments_N files because something went wrong during
commit of the 2nd one.

> We were not sure which one of these was the real one, so we deleted 4vb and got the following from SegmentInfos:

It will always be the "older" one that was the last successful commit,
unless you keep multiple commit points in the index.
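
The suffix of a segments_N file name is the commit generation written in
base 36, which is how segments_4vb and segments_4vc line up with the 6311
and 6312 in the output quoted in this message. A small sketch of that
conversion follows; the class and method names are hypothetical and this is
not the Lucene implementation.

```java
final class SegmentsGeneration {
    private static final String PREFIX = "segments_";

    // Parses the generation out of a name such as "segments_4vb".
    static long parse(String fileName) {
        if (!fileName.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a segments_N file: " + fileName);
        }
        // Character.MAX_RADIX is 36: digits 0-9 followed by letters a-z.
        return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
    }
}
```

Here parse("segments_4vb") gives 6311 and parse("segments_4vc") gives 6312,
so 4vb is the older file of the two.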

> Directory listing genA=6312
> Fallback check: 6311; 6311
> Segments.gen check: genB=6311
> Index has 0 docs

Hmmm -- what code are you running here, to print the number of docs?
new IndexWriter(), with create=true?  I would have expected IR.open to
throw an exc here.

> We then deleted 4vc and got the following:
>
> Directory listing genA=6311
> Fallback check: 6311; 6311
> Segments.gen check: genB=6311
> Index has 40022898 docs
>
> Opening 4vc in an octal editor yields only ASCII zeros (0000000 nul nul nul nul nul nul nul....etc). It may be that Windows is responsible for this, as our indexes are accessed through a fileserver and we know that a delayed write occurred.
>
> My question is: why does an index with 4vc open?

I'm not sure, unless you are opening with IW and create=true.

Mike



RE: Corrupt segments file full of zeros

Posted by "Tarr, Gregory" <Gr...@detica.com>.
Michael

We are not using commit points unfortunately.

This was a scheduled update to our index, and on observation the index directory had two segments_N files:

segments_4vb (modified 24 June 2011 02:05:38 size 7.61KB)
segments_4vc (modified 24 June 2011 02:20:42 size 5.91KB) 

We were not sure which one of these was the real one, so we deleted 4vb and got the following from SegmentInfos:

Directory listing genA=6312
Fallback check: 6311; 6311
Segments.gen check: genB=6311
Index has 0 docs

We then deleted 4vc and got the following:

Directory listing genA=6311
Fallback check: 6311; 6311
Segments.gen check: genB=6311
Index has 40022898 docs 

Opening 4vc in an octal editor yields only ASCII zeros (0000000 nul nul nul nul nul nul nul....etc). It may be that Windows is responsible for this, as our indexes are accessed through a fileserver and we know that a delayed write occurred.

My question is: why does an index with 4vc open?

Thanks

Greg

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com] 
Sent: 28 June 2011 13:36
To: java-user@lucene.apache.org
Subject: Re: Corrupt segments file full of zeros

OK, this is why Lucene (and Luke) consider the index fine, ie, if Lucene has problems opening segments_N (all 0s is definitely not a valid segments_N file), it falls back to the last commit
(segments_(N-1)) and opens that instead.

Ie, IR.open and new IW(...) open the last successful commit.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 28, 2011 at 8:28 AM, Tarr, Gregory <Gr...@detica.com> wrote:
> There was a segments_(N-1), which was a valid segments file and opened correctly in luke.
>
> The trouble came because we had to manually rename these files in order to prevent the index from being wiped.
>
> Thanks
>
> Greg
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: 28 June 2011 13:26
> To: java-user@lucene.apache.org
> Subject: Re: Corrupt segments file full of zeros
>
> Is there only one segments_N file in the index (the one with all 0s)?
> Or is there a segments_(N-1) too?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Jun 28, 2011 at 8:17 AM, Tarr, Gregory <Gr...@detica.com> wrote:
>> We don't have a -9 in the file. It isn't a valid lucene segments 
>> file, as it only contains zeros.
>>
>> We're wondering why this opens in Luke, and why the CheckIndex 
>> reports that the index is OK.
>>
>> -----Original Message-----
>> From: mark harwood [mailto:markharw00d@yahoo.co.uk]
>> Sent: 28 June 2011 13:09
>> To: java-user@lucene.apache.org
>> Subject: Re: Corrupt segments file full of zeros
>>
>> According to the spec there should at least be an Int32 of  -9 to 
>> declare the Format - 
>> http://lucene.apache.org/java/2_9_3/fileformats.html#Segments File
>>
>>
>>
>> ----- Original Message ----
>> From: Uwe Schindler <uw...@thetaphi.de>
>> To: java-user@lucene.apache.org
>> Sent: Tue, 28 June, 2011 12:32:34
>> Subject: RE: Corrupt segments file full of zeros
>>
>> So where is the problem at all? Why should a segments file not 
>> contain lots of zeroes? If the index is not corrupt all is fine.
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>>> -----Original Message-----
>>> From: Tarr, Gregory [mailto:Gregory.tarr@detica.com]
>>> Sent: Tuesday, June 28, 2011 11:56 AM
>>> To: java-user@lucene.apache.org
>>> Subject: RE: Corrupt segments file full of zeros
>>>
>>> Yes I have done that, and you just get "No problems were detected with this
>>> index"
>>>
>>> Surely there is a major problem with this index?
>>>
>>> Also the check() procedure takes a long time - is there any way you can just
>>> do a health check on the segments file?
>>>
>>> Thanks
>>>
>>> Greg
>>>
>>> -----Original Message-----
>>> From: Shai Erera [mailto:serera@gmail.com]
>>> Sent: 28 June 2011 10:36
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Corrupt segments file full of zeros
>>>
>>> You can try the CheckIndex tool. You feed it a directory and call
>>> .check() and it reports the results.
>>>
>>> Shai
>>>
>>> On Tue, Jun 28, 2011 at 11:46 AM, Tarr, Gregory
>>> <Gr...@detica.com>wrote:
>>>
>>> > We have a problem with our fileserver where our indexes are hosted 
>>> > remotely, using Lucene 2.9.3.
>>> >
>>> > This can mean that a segments file is written which is full of 
>>> > ASCII zeros. Using the od -ah command, we get:
>>> >
>>> > 0000000 nul nul nul nul nul nul nul....etc
>>> >
>>> > If opened in Luke, the index opens successfully but has zero documents.
>>> >
>>> > Why does this open correctly in luke, and is there a procedure in the
>>> > lucene code that can verify a segments file, e.g. check whether it
>>> > refers to any segments?
>>> >
>>> > Thanks
>>> >
>>> > Greg


Re: Corrupt segments file full of zeros

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK, this is why Lucene (and Luke) consider the index fine, ie, if
Lucene has problems opening segments_N (all 0s is definitely not a
valid segments_N file), it falls back to the last commit
(segments_(N-1)) and opens that instead.

Ie, IR.open and new IW(...) open the last successful commit.
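
A much-simplified sketch of that fallback, for illustration only. Lucene's
real logic involves more steps (directory listing, segments.gen, retries);
here the reader callback, the class name and the use of
UncheckedIOException to signal an unreadable file are all assumptions made
for the example.

```java
import java.io.UncheckedIOException;
import java.util.function.LongFunction;

final class CommitFallbackSketch {
    // Tries the newest generation first; if reading it fails, falls back
    // to the generation before it.
    static <T> T openLatest(long newestGen, LongFunction<T> reader) {
        try {
            return reader.apply(newestGen);
        } catch (UncheckedIOException primary) {
            // segments_N could not be read: use segments_(N-1) instead.
            return reader.apply(newestGen - 1);
        }
    }
}
```

Note that the fallback only happens when reading segments_N fails. A file
that is wrong but still parses without an error is accepted as the latest
commit.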

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 28, 2011 at 8:28 AM, Tarr, Gregory <Gr...@detica.com> wrote:
> There was a segments_(N-1), which was a valid segments file and opened correctly in luke.
>
> The trouble came because we had to manually rename these files in order to prevent the index from being wiped.
>
> Thanks
>
> Greg
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: 28 June 2011 13:26
> To: java-user@lucene.apache.org
> Subject: Re: Corrupt segments file full of zeros
>
> Is there only one segments_N file in the index (the one with all 0s)?
> Or is there a segments_(N-1) too?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Jun 28, 2011 at 8:17 AM, Tarr, Gregory <Gr...@detica.com> wrote:
>> We don't have a -9 in the file. It isn't a valid lucene segments file,
>> as it only contains zeros.
>>
>> We're wondering why this opens in Luke, and why the CheckIndex reports
>> that the index is OK.
>>
>> -----Original Message-----
>> From: mark harwood [mailto:markharw00d@yahoo.co.uk]
>> Sent: 28 June 2011 13:09
>> To: java-user@lucene.apache.org
>> Subject: Re: Corrupt segments file full of zeros
>>
>> According to the spec there should at least be an Int32 of  -9 to
>> declare the Format -
>> http://lucene.apache.org/java/2_9_3/fileformats.html#Segments File
>>
>>
>>
>> ----- Original Message ----
>> From: Uwe Schindler <uw...@thetaphi.de>
>> To: java-user@lucene.apache.org
>> Sent: Tue, 28 June, 2011 12:32:34
>> Subject: RE: Corrupt segments file full of zeros
>>
>> So where is the problem at all? Why should a segments file not contain
>> lots of zeroes? If the index is not corrupt all is fine.
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>>> -----Original Message-----
>>> From: Tarr, Gregory [mailto:Gregory.tarr@detica.com]
>>> Sent: Tuesday, June 28, 2011 11:56 AM
>>> To: java-user@lucene.apache.org
>>> Subject: RE: Corrupt segments file full of zeros
>>>
>>> Yes I have done that, and you just get "No problems were detected with this
>>> index"
>>>
>>> Surely there is a major problem with this index?
>>>
>>> Also the check() procedure takes a long time - is there any way you can just
>>> do a health check on the segments file?
>>>
>>> Thanks
>>>
>>> Greg
>>>
>>> -----Original Message-----
>>> From: Shai Erera [mailto:serera@gmail.com]
>>> Sent: 28 June 2011 10:36
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Corrupt segments file full of zeros
>>>
>>> You can try the CheckIndex tool. You feed it a directory and call
>>> .check() and it reports the results.
>>>
>>> Shai
>>>
>>> On Tue, Jun 28, 2011 at 11:46 AM, Tarr, Gregory
>>> <Gr...@detica.com>wrote:
>>>
>>> > We have a problem with our fileserver where our indexes are hosted
>>> > remotely, using Lucene 2.9.3.
>>> >
>>> > This can mean that a segments file is written which is full of
>>> > ASCII zeros. Using the od -ah command, we get:
>>> >
>>> > 0000000 nul nul nul nul nul nul nul....etc
>>> >
>>> > If opened in Luke, the index opens successfully but has zero documents.
>>> >
>>> > Why does this open correctly in luke, and is there a procedure in the
>>> > lucene code that can verify a segments file, e.g. check whether it
>>> > refers to any segments?
>>> >
>>> > Thanks
>>> >
>>> > Greg
>>
>> Detica Limited is registered in England under No: 1337451.
>> Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> Please consider the environment before printing this email.
>
> This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately.
>
> Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory.  The contents of this email may relate to dealings with other companies under the control of Detica Limited, details of which can be found at http://www.detica.com/statutory-information.
>
> Detica Limited is registered in England under No: 1337451.
> Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Corrupt segments file full of zeros

Posted by "Tarr, Gregory" <Gr...@detica.com>.
There was a segments_(N-1), which was a valid segments file and opened correctly in Luke.

The trouble came because we had to manually rename these files in order to prevent the index from being wiped.

Thanks

Greg 
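
For anyone who needs to find the older segments_(N-1) file by hand, here is a minimal sketch that lists the segments_N files in an index directory, newest generation first. It assumes Java 8 or later and relies on the N suffix being written in base 36. The class and method names (SegmentsGenerations, generationOf, newestFirst) are made up for illustration and are not Lucene API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not part of Lucene. */
final class SegmentsGenerations {

    /** Parses N from "segments_N" (base 36); returns -1 for any other name. */
    static long generationOf(String fileName) {
        String prefix = "segments_";
        if (!fileName.startsWith(prefix)) {
            return -1; // also skips "segments.gen" and a bare "segments" file
        }
        try {
            return Long.parseLong(fileName.substring(prefix.length()), Character.MAX_RADIX);
        } catch (NumberFormatException notAGeneration) {
            return -1;
        }
    }

    /** Lists the segments_N files in indexDir, newest generation first. */
    static List<Path> newestFirst(Path indexDir) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "segments_*")) {
            for (Path file : files) {
                if (generationOf(file.getFileName().toString()) >= 0) {
                    found.add(file);
                }
            }
        }
        found.sort(Comparator.comparingLong(
                (Path p) -> generationOf(p.getFileName().toString())).reversed());
        return found;
    }
}
```

The first entry of the returned list is the file Lucene would normally treat as current; the second is the segments_(N-1) candidate. This only orders the files by name and says nothing about whether any of them is valid.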

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com] 
Sent: 28 June 2011 13:26
To: java-user@lucene.apache.org
Subject: Re: Corrupt segments file full of zeros

Is there only one segments_N file in the index (the one with all 0s)?
Or is there a segments_(N-1) too?

Mike McCandless

http://blog.mikemccandless.com



Re: Corrupt segments file full of zeros

Posted by Michael McCandless <lu...@mikemccandless.com>.
Is there only one segments_N file in the index (the one with all 0s)?
Or is there a segments_(N-1) too?

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 28, 2011 at 8:17 AM, Tarr, Gregory <Gr...@detica.com> wrote:
> We don't have a -9 in the file. It isn't a valid lucene segments file,
> as it only contains zeros.
>
> We're wondering why this opens in Luke, and why the CheckIndex reports
> that the index is OK.
>


RE: Corrupt segments file full of zeros

Posted by "Tarr, Gregory" <Gr...@detica.com>.
We don't have a -9 in the file. It isn't a valid Lucene segments file,
as it only contains zeros.

We're wondering why this opens in Luke, and why CheckIndex reports
that the index is OK.

-----Original Message-----
From: mark harwood [mailto:markharw00d@yahoo.co.uk] 
Sent: 28 June 2011 13:09
To: java-user@lucene.apache.org
Subject: Re: Corrupt segments file full of zeros

According to the spec there should at least be an Int32 of  -9 to
declare the Format -
http://lucene.apache.org/java/2_9_3/fileformats.html#Segments File





Re: Corrupt segments file full of zeros

Posted by mark harwood <ma...@yahoo.co.uk>.
According to the spec there should at least be an Int32 of  -9 to declare the 
Format - http://lucene.apache.org/java/2_9_3/fileformats.html#Segments File
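
A minimal sketch of checking for that format marker outside Lucene, assuming Java 8 or later. The class and method names (SegmentsHeaderPeek, readLeadingInt, hasFormatMarker) are hypothetical and not Lucene API. It relies only on Lucene writing Int32 values high byte first, which is the order DataInputStream.readInt() decodes.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of Lucene: peeks at the first Int32 of a segments file. */
final class SegmentsHeaderPeek {

    /**
     * Returns the leading Int32 of the file, read high byte first.
     * Throws EOFException if the file is shorter than four bytes.
     */
    static int readLeadingInt(Path segmentsFile) throws IOException {
        try (InputStream in = Files.newInputStream(segmentsFile);
             DataInputStream data = new DataInputStream(in)) {
            return data.readInt();
        }
    }

    /**
     * True when the file starts with a negative value, as a format marker
     * such as -9 would be. A zero-filled file starts with 0 and fails this.
     */
    static boolean hasFormatMarker(Path segmentsFile) throws IOException {
        try {
            return readLeadingInt(segmentsFile) < 0;
        } catch (EOFException tooShort) {
            return false;
        }
    }
}
```

The check is deliberately loose: it accepts any negative marker instead of insisting on -9, on the assumption that an index may still hold segments files written by an older release. It says nothing about the rest of the file.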



----- Original Message ----
From: Uwe Schindler <uw...@thetaphi.de>
To: java-user@lucene.apache.org
Sent: Tue, 28 June, 2011 12:32:34
Subject: RE: Corrupt segments file full of zeros

So where is the problem at all? Why should a segments file not contain lots
of zeroes? If the index is not corrupt all is fine.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




RE: Corrupt segments file full of zeros

Posted by "Tarr, Gregory" <Gr...@detica.com>.
The segments file containing lots of zeros means that the index has no
segments.

We could run the following to check this:

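import org.apache.lucene.index.SegmentInfos;
// indexDir is the org.apache.lucene.store.Directory holding the index
SegmentInfos sis = new SegmentInfos();
sis.read(indexDir); // reads the most recent segments_N it can find
int numSegments = sis.size();
if (numSegments < 1) {
    // index has no segments
}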

Greg 
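
If going through Lucene is more than is needed, the zero-filled symptom itself can be tested directly on the file. This is a rough sketch assuming Java 8 or later; ZeroFillCheck and isZeroFilled are hypothetical names, not Lucene API. It only reports whether every byte is NUL, as in the od output; a file that passes is not thereby a valid segments file.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of Lucene. */
final class ZeroFillCheck {

    /** True when the file is empty or every byte in it is 0x00. */
    static boolean isZeroFilled(Path file) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            int b;
            while ((b = in.read()) != -1) {
                if (b != 0) {
                    return false; // found real data, stop scanning
                }
            }
            return true;
        }
    }
}
```

Segments files are small, so scanning the whole file should be cheap compared with a full check() run.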

-----Original Message-----
From: Uwe Schindler [mailto:uwe@thetaphi.de] 
Sent: 28 June 2011 12:33
To: java-user@lucene.apache.org
Subject: RE: Corrupt segments file full of zeros

So where is the problem at all? Why should a segments file not contain
lots of zeroes? If the index is not corrupt all is fine.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




RE: Corrupt segments file full of zeros

Posted by Uwe Schindler <uw...@thetaphi.de>.
So where is the problem at all? Why should a segments file not contain lots
of zeroes? If the index is not corrupt all is fine.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Tarr, Gregory [mailto:Gregory.tarr@detica.com]
> Sent: Tuesday, June 28, 2011 11:56 AM
> To: java-user@lucene.apache.org
> Subject: RE: Corrupt segments file full of zeros
> 
> Yes I have done that, and you just get "No problems were detected with
> this index"
> 
> Surely there is a major problem with this index?
> 
> Also the check() procedure takes a long time - is there any way you can
> just do a health check on the segments file?
> 
> Thanks
> 
> Greg
> 
> -----Original Message-----
> From: Shai Erera [mailto:serera@gmail.com]
> Sent: 28 June 2011 10:36
> To: java-user@lucene.apache.org
> Subject: Re: Corrupt segments file full of zeros
> 
> You can try the CheckIndex tool. You feed it a directory and call
> .check() and it reports the results.
> 
> Shai
> 
> On Tue, Jun 28, 2011 at 11:46 AM, Tarr, Gregory
> <Gr...@detica.com> wrote:
> 
> > We have a problem with our fileserver where our indexes are hosted
> > remotely, using Lucene 2.9.3.
> >
> > This can mean that a segments file is written which is full of ASCII
> > zeros. Using the od -ah command, we get:
> >
> > 0000000 nul nul nul nul nul nul nul....etc
> >
> > If opened in Luke, the index opens successfully but has zero
> > documents.
> >
> > Why does this open correctly in luke, and is there a procedure in the
> > lucene code that can verify a segments file, e.g. check whether it
> > refers to any segments?
> >
> > Thanks
> >
> > Greg
> >





RE: Corrupt segments file full of zeros

Posted by "Tarr, Gregory" <Gr...@detica.com>.
Yes I have done that, and you just get "No problems were detected with
this index"

Surely there is a major problem with this index?

Also the check() procedure takes a long time - is there any way you can
just do a health check on the segments file?

Thanks

Greg 
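
For the specific failure described in this thread (a segments file consisting only of NUL bytes), a cheap pre-check that does not open the index at all could look like the sketch below. It is a minimal illustration using only the Java standard library (Java 7 or later), not a Lucene API: the class name SegmentsZeroCheck and the method isAllZeros are hypothetical, and the check only detects an empty or all-zero file. It does not validate the segments format or confirm that the referenced segments exist.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; this is not part of Lucene.
class SegmentsZeroCheck {

    // Returns true if the file is empty or every byte in it is 0x00.
    static boolean isAllZeros(Path file) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            int b;
            while ((b = in.read()) != -1) {
                if (b != 0) {
                    return false; // found real data, stop reading early
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SegmentsZeroCheck.java <path to segments file>");
            System.exit(2);
        }
        Path file = Paths.get(args[0]);
        if (isAllZeros(file)) {
            System.out.println("SUSPECT: " + file + " is empty or contains only zero bytes");
            System.exit(1);
        }
        System.out.println("OK: " + file + " contains non-zero bytes");
    }
}
```

Because the loop stops at the first non-zero byte, a healthy segments file is typically accepted after reading only a few bytes, so this is far cheaper than a full CheckIndex pass. It is a complement to CheckIndex, not a replacement for it.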

-----Original Message-----
From: Shai Erera [mailto:serera@gmail.com] 
Sent: 28 June 2011 10:36
To: java-user@lucene.apache.org
Subject: Re: Corrupt segments file full of zeros

You can try the CheckIndex tool. You feed it a directory and call
.check() and it reports the results.

Shai

On Tue, Jun 28, 2011 at 11:46 AM, Tarr, Gregory
<Gr...@detica.com> wrote:

> We have a problem with our fileserver where our indexes are hosted 
> remotely, using Lucene 2.9.3.
>
> This can mean that a segments file is written which is full of ASCII 
> zeros. Using the od -ah command, we get:
>
> 0000000 nul nul nul nul nul nul nul....etc
>
> If opened in Luke, the index opens successfully but has zero
> documents.
>
> Why does this open correctly in luke, and is there a procedure in the 
> lucene code that can verify a segments file, e.g. check whether it 
> refers to any segments?
>
> Thanks
>
> Greg
>


Re: Corrupt segments file full of zeros

Posted by Shai Erera <se...@gmail.com>.
You can try the CheckIndex tool. You feed it a directory and call .check()
and it reports the results.

Shai

On Tue, Jun 28, 2011 at 11:46 AM, Tarr, Gregory <Gr...@detica.com> wrote:

> We have a problem with our fileserver where our indexes are hosted
> remotely, using Lucene 2.9.3.
>
> This can mean that a segments file is written which is full of ASCII
> zeros. Using the od -ah command, we get:
>
> 0000000 nul nul nul nul nul nul nul....etc
>
> If opened in Luke, the index opens successfully but has zero documents.
>
> Why does this open correctly in luke, and is there a procedure in the
> lucene code that can verify a segments file, e.g. check whether it
> refers to any segments?
>
> Thanks
>
> Greg
>