Posted to dev@lucene.apache.org by "Adrian Hempel (JIRA)" <ji...@apache.org> on 2009/06/15 09:26:07 UTC

[jira] Created: (LUCENE-1691) An index copied over another index can result in corruption

An index copied over another index can result in corruption
-----------------------------------------------------------

                 Key: LUCENE-1691
                 URL: https://issues.apache.org/jira/browse/LUCENE-1691
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Store
            Reporter: Adrian Hempel
            Priority: Minor
             Fix For: 2.4.1


After restoring an older backup of an index over the top of a newer version of the index, attempts to open the index can result in CorruptIndexExceptions, such as:

{noformat}
Caused by: org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _ed: fieldsReader shows 1137 but segmentInfo shows 1389
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:362)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:306)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:228)
    at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55)
    at org.apache.lucene.index.ReadOnlyMultiSegmentReader.<init>(ReadOnlyMultiSegmentReader.java:27)
    at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:102)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
    at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:115)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:237)
{noformat}

The apparent cause is the strategy of taking the maximum of the generation recorded in the segments.gen file and the generations of the apparently valid segments_N files in the directory (see lines 523-593 [here|http://svn.apache.org/viewvc/lucene/java/tags/lucene_2_4_1/src/java/org/apache/lucene/index/SegmentInfos.java?annotate=751393]), and using this as the current generation of the index. This will include "stale" segments_N files that existed before the backup was restored.
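A minimal sketch of the selection rule described above, to show how a leftover commit file can win. It assumes commit files are named segments_N with N written in base 36; the class and method names are hypothetical, and this is not Lucene's actual SegmentInfos code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the generation-selection rule described above;
// not Lucene's actual SegmentInfos implementation.
public class GenerationSketch {
    static final String PREFIX = "segments_";

    // Assumed naming: commit files are segments_N, with N in base 36 (Character.MAX_RADIX).
    static long generationOf(String fileName) {
        return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
    }

    // Highest generation among the segments_N files in a directory listing, or -1 if none.
    // "segments.gen" does not match the "segments_" prefix, so it is skipped here.
    static long maxListedGeneration(List<String> files) {
        long max = -1;
        for (String f : files) {
            if (f.startsWith(PREFIX)) {
                max = Math.max(max, generationOf(f));
            }
        }
        return max;
    }

    // The rule as described: take the larger of the listing and the value read from segments.gen.
    static long chooseGeneration(List<String> files, long genFromGenFile) {
        return Math.max(maxListedGeneration(files), genFromGenFile);
    }

    public static void main(String[] args) {
        // The backup holds segments_5 and a segments.gen recording generation 5.
        // The newer index left segments_8 behind because the directory was not emptied first.
        List<String> afterRestore = Arrays.asList(
                "segments.gen", "segments_5", "segments_8", "_ed.fdt", "_ed.fdx");
        System.out.println(chooseGeneration(afterRestore, 5)); // prints 8
    }
}
```

Under these assumptions generation 8 wins, so the reader follows the stale commit, which still describes segment _ed as it was before the backup's older _ed files overwrote it. That is consistent with the doc-count mismatch in the stack trace above.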

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1691) An index copied over another index can result in corruption

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719513#action_12719513 ] 

Michael McCandless commented on LUCENE-1691:
--------------------------------------------

Copying over an existing index, without first removing all files in that index, is not a supported use case for Lucene.

I.e., to restore from backup you should make an empty dir and copy back your index files.
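A minimal sketch of that restore procedure using plain java.nio.file (not a Lucene API; the class and method names are hypothetical). It refuses to copy into a path that already exists, so no segments_N file from a newer index can survive alongside the restored files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: restores a backup only into a brand-new, empty directory.
public class RestoreSketch {
    static void restoreIntoEmptyDir(Path backupDir, Path targetDir) throws IOException {
        // Fails with an IOException (typically FileAlreadyExistsException) if targetDir exists,
        // so stale index files can never be mixed with the restored ones.
        Files.createDirectory(targetDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(backupDir)) {
            for (Path src : files) {
                if (Files.isRegularFile(src)) {
                    Files.copy(src, targetDir.resolve(src.getFileName()));
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path backup = Files.createTempDirectory("backup");
        Files.write(backup.resolve("segments_5"), new byte[] {1});
        Path restored = Files.createTempDirectory("work").resolve("index");
        restoreIntoEmptyDir(backup, restored);
        try (Stream<Path> s = Files.list(restored)) {
            System.out.println(s.count()); // prints 1
        }
    }
}
```

Failing on an existing target, rather than clearing it, is a deliberate choice in this sketch: the caller decides when to delete or swap out the old index directory.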



Re: [jira] Commented: (LUCENE-1691) An index copied over another index can result in corruption

Posted by Mark Miller <ma...@gmail.com>.
Adrian Hempel (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/LUCENE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719520#action_12719520 ] 
>
> Adrian Hempel commented on LUCENE-1691:
> ---------------------------------------
>
> I realised that would probably be the case, but in the real world, this will be a common occurrence.
>   
Delete the index you are copying over first?
> Hence my raising this issue as an "Improvement" rather than a "Bug".
>
>   




[jira] Commented: (LUCENE-1691) An index copied over another index can result in corruption

Posted by "Adrian Hempel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719520#action_12719520 ] 

Adrian Hempel commented on LUCENE-1691:
---------------------------------------

I realised that would probably be the case, but in the real world, this will be a common occurrence.

Hence my raising this issue as an "Improvement" rather than a "Bug".



[jira] Resolved: (LUCENE-1691) An index copied over another index can result in corruption

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller resolved LUCENE-1691.
---------------------------------

    Resolution: Won't Fix



[jira] Updated: (LUCENE-1691) An index copied over another index can result in corruption

Posted by "Adrian Hempel (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Hempel updated LUCENE-1691:
----------------------------------

        Fix Version/s:     (was: 2.4.1)
    Affects Version/s: 2.4.1
