Posted to dev@lucene.apache.org by "Nico Krijnen (JIRA)" <ji...@apache.org> on 2010/10/29 17:24:31 UTC

[jira] Created: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load

Index corruption after 'read past EOF' under heavy update load
--------------------------------------------------------------

                 Key: LUCENE-2729
                 URL: https://issues.apache.org/jira/browse/LUCENE-2729
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 3.0.2, 3.0.1
         Environment: Happens on both OS X 10.6 and Windows 2008 Server. Integrated with zoie (using a zoie snapshot from 2010-08-06: zoie-2.0.0-snapshot-20100806.jar).
            Reporter: Nico Krijnen


We have a system running lucene and zoie. We use lucene as a content store for a CMS/DAM system. We use the hot-backup feature of zoie to make scheduled backups of the index. This works fine for small indexes, and when there are not a lot of changes to the index while the backup is made.

On large indexes (about 5 GB to 19 GB), when a backup is made while the index is being changed a lot (lots of document additions and/or deletions), we almost always get a 'read past EOF' at some point, followed by lots of 'Lock obtain timed out'.
At that point we get lots of 0 kb files in the index, data gets lost, and the index is unusable.

When we stop our server, remove the 0kb files and restart our server, the index is operational again, but data has been lost.

I'm not sure if this is a zoie or a lucene issue, so I'm posting it to both. Hopefully someone has some ideas where to look to fix this.


Some more details...

Stack trace of the read past EOF and following Lock obtain timed out:

{code}
78307 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@31ca5085] ERROR proj.zoie.impl.indexing.internal.BaseSearchIndex - read past EOF
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
    at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:166)
    at org.apache.lucene.index.DirectoryReader.doCommit(DirectoryReader.java:725)
    at org.apache.lucene.index.IndexReader.commit(IndexReader.java:987)
    at org.apache.lucene.index.IndexReader.commit(IndexReader.java:973)
    at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:162)
    at org.apache.lucene.index.IndexReader.close(IndexReader.java:1003)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs(BaseSearchIndex.java:203)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:223)
    at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
    at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
    at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
    at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
579336 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@31ca5085] ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - Problem copying segments: Lock obtain timed out: org.apache.lucene.store.SingleInstanceLock@5ad0b895: write.lock
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: org.apache.lucene.store.SingleInstanceLock@5ad0b895: write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1060)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:957)
    at proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
    at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
    at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
    at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
    at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
{code}

We get exactly the same behaviour on both OS X and on Windows. On both, zoie is using a SimpleFSDirectory.
We also use a SingleInstanceLockFactory (since our process is the only one working with the index), but we get the same behaviour with a NativeFSLock.
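
For clarity, this is roughly how we construct the directory -- a sketch with a made-up index path, not our exact code:

{code}
import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.store.SingleInstanceLockFactory;

public class DirectorySetup {
    public static SimpleFSDirectory openIndexDir() throws IOException {
        // Made-up path, for illustration only.
        File indexDir = new File("/data/assetIndex");
        // Single-process setup: the write.lock is held in memory,
        // so no lock-file coordination happens on disk.
        return new SimpleFSDirectory(indexDir, new SingleInstanceLockFactory());
    }
}
{code}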

The snapshot backup is being made by calling:

*proj.zoie.impl.indexing.ZoieSystem.exportSnapshot(WritableByteChannel)*
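
In outline, the backup then looks something like this -- just a sketch: we assume exportSnapshot throws IOException, the generic type parameters are elided, and the target path is made up:

{code}
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

import proj.zoie.impl.indexing.ZoieSystem;

public class SnapshotBackup {
    // Streams a point-in-time snapshot of the disk index into the given file.
    public static void backup(ZoieSystem<?, ?> zoie, String targetPath) throws IOException {
        FileChannel channel = new FileOutputStream(targetPath).getChannel();
        try {
            zoie.exportSnapshot(channel);
        } finally {
            channel.close();
        }
    }
}
{code}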



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932719#action_12932719 ] 

Michael McCandless commented on LUCENE-2729:
--------------------------------------------

Can you print the full stack trace when files are opened for writing?  I.e., we need to know who wrote the zero-length segments_2q.
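
For example, something along these lines would capture that -- a sketch only (TracingFSDirectory is a made-up name, assuming the 3.0.x SimpleFSDirectory API):

{code}
import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.LockFactory;
import org.apache.lucene.store.SimpleFSDirectory;

// Logs a full stack trace for every file opened for writing, so the log
// shows exactly who created e.g. the zero-length segments_2q.
public class TracingFSDirectory extends SimpleFSDirectory {
    public TracingFSDirectory(File path, LockFactory lockFactory) throws IOException {
        super(path, lockFactory);
    }

    @Override
    public IndexOutput createOutput(String name) throws IOException {
        new Exception("createOutput: " + name).printStackTrace(System.out);
        return super.createOutput(name);
    }
}
{code}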

Also, can you post "svn diff" output against Lucene's sources?  This would help me understand the log file.

Did you instrument each file-copy being done by the backup process?  And also when the backup process starts?

On the next run, can you enable the static infoStream in SegmentInfos.java?  This will print details about how it's finding the segments file.

Also, can you not turn on infoStream for IndexWriters against the RAMDirs?  This should cut down drastically on the log file.
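
Concretely, something like this -- a sketch against the 3.0.x APIs:

{code}
import org.apache.lucene.index.SegmentInfos;

public class EnableSegmentsTracing {
    public static void enable() {
        // JVM-wide hook: logs how SegmentInfos.read locates (and retries) segments_N.
        SegmentInfos.setInfoStream(System.out);
        // For the IndexWriters on the RAMDirs, pass null to writer.setInfoStream
        // (or never enable it), so only disk-index activity ends up in the log.
    }
}
{code}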

Lucene normally should never re-use a segments file -- the generation only ever increases on each commit to the index.  Yes, SegmentInfos.read sets the generation in SegmentInfos, but this is only called once, upon creating a new SegmentInfos().  It should never be overwritten later by Lucene calling read again.

So long as the deletion policy is keeping that commit point alive (segments_2q), Lucene should not be deleting it.
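
To illustrate the mechanism -- a sketch of the general pattern only, not zoie's actual ZoieIndexDeletionPolicy (PinningDeletionPolicy and its pin/unpin methods are made up):

{code}
import java.util.List;

import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexDeletionPolicy;

// Keeps the newest commit plus one "pinned" commit point (e.g. segments_2q)
// alive while a backup runs; every other commit point is deleted.
public class PinningDeletionPolicy implements IndexDeletionPolicy {
    private volatile String pinnedSegmentsFile;  // e.g. "segments_2q", or null

    public void pin(String segmentsFileName) { pinnedSegmentsFile = segmentsFileName; }
    public void unpin() { pinnedSegmentsFile = null; }

    public void onInit(List<? extends IndexCommit> commits) {
        onCommit(commits);
    }

    public void onCommit(List<? extends IndexCommit> commits) {
        // Commits are ordered oldest to newest; always keep the newest.
        for (int i = 0; i < commits.size() - 1; i++) {
            IndexCommit commit = commits.get(i);
            if (!commit.getSegmentsFileName().equals(pinnedSegmentsFile)) {
                commit.delete();
            }
        }
    }
}
{code}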

I sometimes see "writing <filename>", eg:

{noformat}
13:27:02,554 [http-8080-2] TRACE com.ds.acm.engine.search.zoieimpl.core.DirectoryManager - Directory[lockID=lucene-23f3e69abbd639b3b4a1db928c1b323d, filename=assetIndex] writing _5t.tvd, directory contents: _2q.fdt(size=969134416, modified=16:41:57,000)
{noformat}

and other times "creating output":

{noformat}
13:26:51,732 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@39f46204] DEBUG org.apache.lucene.store.SimpleFSDirectory - Creating output for _66.tvx
{noformat}

Why the difference?  Like where in the code are you printing these?




[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Xiaoyang Gu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935130#action_12935130 ] 

Xiaoyang Gu commented on LUCENE-2729:
-------------------------------------

I have already reproduced the same exception here, and the above patch at least fixes it in the sense that my code for reproducing it no longer throws the exception.

I committed a patch on my master branch.

Thanks to everybody working on it.




[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930086#action_12930086 ] 

Nico Krijnen commented on LUCENE-2729:
--------------------------------------

Any ideas on what could be happening? It sounds like IndexWriter is the only one that is modifying these files; zoie only seems to be reading from them to make the backup. What should we look for in the IndexWriter's infoStream?




[jira] Updated: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2729:
---------------------------------------

    Attachment: LUCENE-2729-test1.patch


First, just to rule out any already-fixed-but-not-yet-released issues,
can you update your Lucene JAR to the tip of the 3.0.x branch?  I.e. do
this:

{noformat}
  svn checkout https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0 30x
  cd 30x
  ant jar
{noformat}

And then copy build/lucene-core-3.0.3-dev.jar to your CLASSPATH
(replacing old Lucene JAR).

Second, can you apply the patch I just attached
(LUCENE-2729-test1.patch) and then make this corruption happen again?
That patch throws an exception if we ever try to call
SimpleFSDir.createOutput on a file that already exists.  Lucene should
never do this under non-exceptional situations, yet somehow it looks
like it may be doing so (given all your 0-length files).
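
Conceptually, the check looks like this -- a sketch of the idea only, not the attached patch itself (StrictCreateFSDirectory is a made-up name):

{code}
import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.LockFactory;
import org.apache.lucene.store.SimpleFSDirectory;

// Fails fast if anything tries to re-create an index file that already
// exists; under normal operation createOutput should only see fresh names.
public class StrictCreateFSDirectory extends SimpleFSDirectory {
    public StrictCreateFSDirectory(File path, LockFactory lockFactory) throws IOException {
        super(path, lockFactory);
    }

    @Override
    public IndexOutput createOutput(String name) throws IOException {
        File file = new File(getDirectory(), name);
        // segments.gen is legitimately rewritten on every commit, so skip it.
        if (file.exists() && !name.equals("segments.gen")) {
            throw new IOException("file \"" + name + "\" already exists");
        }
        return super.createOutput(name);
    }
}
{code}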





[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935366#action_12935366 ] 

Nico Krijnen commented on LUCENE-2729:
--------------------------------------

Thanks for all the help guys! We greatly appreciate all the insights and thorough responses.

The fixes made to ZOIE by Xiaoyang Gu have indeed resolved the issue. For details see:

http://snaprojects.jira.com/browse/ZOIE-51




[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930094#action_12930094 ] 

Nico Krijnen commented on LUCENE-2729:
--------------------------------------

Thx! We will update, patch and re-run the test.




[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926907#action_12926907 ] 

Michael McCandless commented on LUCENE-2729:
--------------------------------------------

That long string of length 0 files is very bizarre.

Was there no original root cause here?  Eg disk full?

Or is the "read past EOF" on closing an IndexReader w/ pending deletes really the first exception you see?

Does zoie somehow touch the index files?  Taking a backup is fundamentally a read-only op on the index, so that process shouldn't by itself truncate index files.

Something is somehow reaching in and zero-ing out these files.  I don't think Lucene itself would do this.  For example, the series of _6i.XXX zero'd files... Lucene writes these
files roughly in sequence, so if something bad happened in writing the postings, then the .nrm file should not even exist.

So we need to figure out who is truncating these files...





[jira] Updated: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Krijnen updated LUCENE-2729:
---------------------------------

    Attachment: eof-extra-logging-4.log.zip
                eof-extra-logging-4-analysis.txt

We did a few more runs with lots more logging and we think we have pinpointed the cause of the issue.

We added logging to track files that are opened for read and write. At the moment the 'read past EOF' happens, no files are still open for writing, so it does not seem that a file is opened but never closed. However, we did find another interesting entry:

- Some time before the 'read past EOF' we see some activity in the disk index on the segments_... file that is 'locked' by the zoie snapshot.
-- 'our' segments_... file is deleted (the logged dir listings show that it does not exist anymore after that)
-- 'our' segments_... file is written to. At this point it is created again, but becomes 0 kb, which at a later point causes the 'read past EOF'.

We added more logging to find out more about when and who is 'setting' or 'changing' the 'generation' in SegmentInfos.

The logs from that run seem to indicate that *at some point SegmentInfos.generation 'wraps' to a lower value and earlier generation numbers get re-used*.

In ZoieIndexDeletionPolicy, a Snapshot is initialized using a certain IndexCommit. Zoie doesn't take into account that Lucene may re-use segments filenames; no locking whatsoever keeps Lucene from re-using this specific segments filename.

AFAICT by going through the code in SegmentInfos, the generation counter is reset in SegmentInfos.read(Directory). When looking at the log files and singling out the *value of generation, in our case it doesn't even reach a value above 118*. The segments filename of our Snapshot is 'segments_2q' which translates to a generation of 98. After a while, Lucene deletes this file on disk (because the name is re-used?) in a call to FSDirectory.deleteFile. However, our backup is still running at that point. Somehow this file is written to disk a moment later and on the next read, we get the 'read past EOF' error.
It's not completely clear to me, but why is the generation counter reset on read(Directory)? Is it because the file is determined through FindSegmentsFile(Directory).run()?
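
For reference, the generation/filename mapping we used when singling out the generation values -- Lucene encodes the generation in base 36 (Character.MAX_RADIX), e.g.:

{code}
public class SegmentsGeneration {
    public static void main(String[] args) {
        // "segments_2q" -> generation: base-36 "2q" = 2*36 + 26 = 98
        long gen = Long.parseLong("2q", Character.MAX_RADIX);
        System.out.println(gen);  // prints 98

        // generation -> filename, as Lucene builds it:
        System.out.println("segments_" + Long.toString(gen, Character.MAX_RADIX));  // segments_2q
    }
}
{code}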

Although this could be interpreted as a flaw in ZoieIndexDeletionPolicy, I think this should be worked around in Lucene (correct me if I'm wrong).

We could attempt some kind of 'fix' that skips generations corresponding to segments filenames held by the '_currentSnapshots' variable in ZoieIndexDeletionPolicy. But neither the IndexDeletionPolicy API nor the SegmentInfos class really provides a nice way of doing that, and it doesn't feel like a proper fix.

*Is there a specific reason for the generation to 'wrap around' or be re-used within hours?*
Looking at the code in SegmentInfos I do not see any explicit logic or reasons for re-using earlier generation numbers.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Xiaoyang Gu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935152#action_12935152 ] 

Xiaoyang Gu commented on LUCENE-2729:
-------------------------------------

Taking in Nico's patch for the file channel bug ZOIE-53; now using commit: e6c53ed


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Jan te Beest (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935680#action_12935680 ] 

Jan te Beest commented on LUCENE-2729:
--------------------------------------

OK, thanks for the info.


[jira] Updated: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Krijnen updated LUCENE-2729:
---------------------------------


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926859#action_12926859 ] 

Nico Krijnen commented on LUCENE-2729:
--------------------------------------

A second file listing from another test run, same result: read past EOF

{code}
jteb:assetIndex jteb$ ls -la
total 38739848
drwxr-xr-x  2 jteb  jteb        4964 26 okt 11:51 .
drwxr-xr-x  3 jteb  jteb         204 22 okt 11:42 ..
-rw-r--r--  1 jteb  jteb   969134416 18 okt 16:41 _2q.fdt
-rw-r--r--  1 jteb  jteb       36652 18 okt 16:41 _2q.fdx
-rw-r--r--  1 jteb  jteb         276 18 okt 16:41 _2q.fnm
-rw-r--r--  1 jteb  jteb     4685726 18 okt 16:41 _2q.frq
-rw-r--r--  1 jteb  jteb        9166 18 okt 16:41 _2q.nrm
-rw-r--r--  1 jteb  jteb   393230403 18 okt 16:42 _2q.prx
-rw-r--r--  1 jteb  jteb        7447 18 okt 16:42 _2q.tii
-rw-r--r--  1 jteb  jteb      746299 18 okt 16:42 _2q.tis
-rw-r--r--  1 jteb  jteb        8394 18 okt 16:42 _2q.tvd
-rw-r--r--  1 jteb  jteb   599185081 18 okt 16:42 _2q.tvf
-rw-r--r--  1 jteb  jteb       73300 18 okt 16:42 _2q.tvx
-rw-r--r--  1 jteb  jteb  2061261675 18 okt 16:44 _39.fdt
-rw-r--r--  1 jteb  jteb        1012 18 okt 16:44 _39.fdx
-rw-r--r--  1 jteb  jteb         276 18 okt 16:44 _39.fnm
-rw-r--r--  1 jteb  jteb    17754579 18 okt 16:44 _39.frq
-rw-r--r--  1 jteb  jteb         256 18 okt 16:44 _39.nrm
-rw-r--r--  1 jteb  jteb   121067407 18 okt 16:44 _39.prx
-rw-r--r--  1 jteb  jteb      137511 18 okt 16:44 _39.tii
-rw-r--r--  1 jteb  jteb    11726653 18 okt 16:44 _39.tis
-rw-r--r--  1 jteb  jteb         185 18 okt 16:44 _39.tvd
-rw-r--r--  1 jteb  jteb   233037042 18 okt 16:44 _39.tvf
-rw-r--r--  1 jteb  jteb        2020 18 okt 16:44 _39.tvx
-rw-r--r--  1 jteb  jteb  1595882722 18 okt 16:45 _3u.fdt
-rw-r--r--  1 jteb  jteb       63692 18 okt 16:45 _3u.fdx
-rw-r--r--  1 jteb  jteb         330 18 okt 16:45 _3u.fnm
-rw-r--r--  1 jteb  jteb     8001869 18 okt 16:45 _3u.frq
-rw-r--r--  1 jteb  jteb       15926 18 okt 16:45 _3u.nrm
-rw-r--r--  1 jteb  jteb   647374863 18 okt 16:45 _3u.prx
-rw-r--r--  1 jteb  jteb       11319 18 okt 16:45 _3u.tii
-rw-r--r--  1 jteb  jteb     1168399 18 okt 16:45 _3u.tis
-rw-r--r--  1 jteb  jteb       14209 18 okt 16:45 _3u.tvd
-rw-r--r--  1 jteb  jteb   986370136 18 okt 16:46 _3u.tvf
-rw-r--r--  1 jteb  jteb      127380 18 okt 16:46 _3u.tvx
-rw-r--r--  1 jteb  jteb  2057147455 18 okt 16:47 _3v.fdt
-rw-r--r--  1 jteb  jteb         476 18 okt 16:47 _3v.fdx
-rw-r--r--  1 jteb  jteb         384 18 okt 16:47 _3v.fnm
-rw-r--r--  1 jteb  jteb    15201111 18 okt 16:47 _3v.frq
-rw-r--r--  1 jteb  jteb         122 18 okt 16:47 _3v.nrm
-rw-r--r--  1 jteb  jteb   109724024 18 okt 16:47 _3v.prx
-rw-r--r--  1 jteb  jteb      132491 18 okt 16:47 _3v.tii
-rw-r--r--  1 jteb  jteb    11457688 18 okt 16:47 _3v.tis
-rw-r--r--  1 jteb  jteb         114 18 okt 16:47 _3v.tvd
-rw-r--r--  1 jteb  jteb   211902147 18 okt 16:48 _3v.tvf
-rw-r--r--  1 jteb  jteb         948 18 okt 16:48 _3v.tvx
-rw-r--r--  1 jteb  jteb  2691565961 18 okt 16:49 _4c.fdt
-rw-r--r--  1 jteb  jteb       39572 18 okt 16:49 _4c.fdx
-rw-r--r--  1 jteb  jteb         276 18 okt 16:49 _4c.fnm
-rw-r--r--  1 jteb  jteb    18724620 18 okt 16:49 _4c.frq
-rw-r--r--  1 jteb  jteb        9896 18 okt 16:49 _4c.nrm
-rw-r--r--  1 jteb  jteb   590255960 18 okt 16:50 _4c.prx
-rw-r--r--  1 jteb  jteb      141243 18 okt 16:50 _4c.tii
-rw-r--r--  1 jteb  jteb    12185869 18 okt 16:50 _4c.tis
-rw-r--r--  1 jteb  jteb        9894 18 okt 16:50 _4c.tvd
-rw-r--r--  1 jteb  jteb   932649779 18 okt 16:51 _4c.tvf
-rw-r--r--  1 jteb  jteb       79140 18 okt 16:51 _4c.tvx
-rw-r--r--  1 jteb  jteb  2398908136 18 okt 16:52 _4d.fdt
-rw-r--r--  1 jteb  jteb         548 18 okt 16:52 _4d.fdx
-rw-r--r--  1 jteb  jteb         354 18 okt 16:52 _4d.fnm
-rw-r--r--  1 jteb  jteb    24581614 18 okt 16:52 _4d.frq
-rw-r--r--  1 jteb  jteb         140 18 okt 16:52 _4d.nrm
-rw-r--r--  1 jteb  jteb   158243133 18 okt 16:52 _4d.prx
-rw-r--r--  1 jteb  jteb      141948 18 okt 16:52 _4d.tii
-rw-r--r--  1 jteb  jteb    12259425 18 okt 16:52 _4d.tis
-rw-r--r--  1 jteb  jteb         140 18 okt 16:52 _4d.tvd
-rw-r--r--  1 jteb  jteb   303769970 18 okt 16:53 _4d.tvf
-rw-r--r--  1 jteb  jteb        1092 18 okt 16:53 _4d.tvx
-rw-r--r--  1 jteb  jteb  1081212027 18 okt 16:53 _4p.fdt
-rw-r--r--  1 jteb  jteb         212 18 okt 16:53 _4p.fdx
-rw-r--r--  1 jteb  jteb         354 18 okt 16:53 _4p.fnm
-rw-r--r--  1 jteb  jteb     8294102 18 okt 16:53 _4p.frq
-rw-r--r--  1 jteb  jteb          56 18 okt 16:53 _4p.nrm
-rw-r--r--  1 jteb  jteb    60513257 18 okt 16:53 _4p.prx
-rw-r--r--  1 jteb  jteb      134898 18 okt 16:53 _4p.tii
-rw-r--r--  1 jteb  jteb    11376730 18 okt 16:53 _4p.tis
-rw-r--r--  1 jteb  jteb          56 18 okt 16:53 _4p.tvd
-rw-r--r--  1 jteb  jteb   116715012 18 okt 16:53 _4p.tvf
-rw-r--r--  1 jteb  jteb         420 18 okt 16:53 _4p.tvx
-rw-r--r--  1 jteb  jteb   787581180 18 okt 16:54 _55.fdt
-rw-r--r--  1 jteb  jteb         156 18 okt 16:54 _55.fdx
-rw-r--r--  1 jteb  jteb         354 18 okt 16:54 _55.fnm
-rw-r--r--  1 jteb  jteb     5376741 18 okt 16:54 _55.frq
-rw-r--r--  1 jteb  jteb          42 18 okt 16:54 _55.nrm
-rw-r--r--  1 jteb  jteb    39539985 18 okt 16:54 _55.prx
-rw-r--r--  1 jteb  jteb      133483 18 okt 16:54 _55.tii
-rw-r--r--  1 jteb  jteb    11351989 18 okt 16:54 _55.tis
-rw-r--r--  1 jteb  jteb          42 18 okt 16:54 _55.tvd
-rw-r--r--  1 jteb  jteb    75958725 18 okt 16:54 _55.tvf
-rw-r--r--  1 jteb  jteb         308 18 okt 16:54 _55.tvx
-rw-r--r--  1 jteb  jteb           0 22 okt 12:11 _6i.fdt
-rw-r--r--  1 jteb  jteb           0 22 okt 12:11 _6i.fdx
-rw-r--r--  1 jteb  jteb           0 22 okt 12:11 _6i.frq
-rw-r--r--  1 jteb  jteb           0 22 okt 12:05 _6i.nrm
-rw-r--r--  1 jteb  jteb           0 22 okt 12:12 _6i.tvd
-rw-r--r--  1 jteb  jteb           0 22 okt 12:12 _6i.tvf
-rw-r--r--  1 jteb  jteb           0 22 okt 12:11 _6i.tvx
-rw-r--r--  1 jteb  jteb           0 22 okt 12:07 _6j.nrm
-rw-r--r--  1 jteb  jteb           0 22 okt 12:08 _6k.fdt
-rw-r--r--  1 jteb  jteb           0 22 okt 12:08 _6k.fdx
-rw-r--r--  1 jteb  jteb           0 22 okt 12:11 _6k.fnm
-rw-r--r--  1 jteb  jteb           0 22 okt 12:05 _6k.tis
-rw-r--r--  1 jteb  jteb           0 22 okt 12:08 _6l.frq
-rw-r--r--  1 jteb  jteb           0 22 okt 12:11 _6l.nrm
-rw-r--r--  1 jteb  jteb           0 22 okt 12:11 _6l.prx
-rw-r--r--  1 jteb  jteb   356417086 22 okt 12:02 _6u.fdt
-rw-r--r--  1 jteb  jteb      172180 22 okt 12:02 _6u.fdx
-rw-r--r--  1 jteb  jteb         723 22 okt 12:02 _6u.fnm
-rw-r--r--  1 jteb  jteb     8585718 22 okt 12:02 _6u.frq
-rw-r--r--  1 jteb  jteb       43048 22 okt 12:02 _6u.nrm
-rw-r--r--  1 jteb  jteb    18540155 22 okt 12:02 _6u.prx
-rw-r--r--  1 jteb  jteb      185484 22 okt 12:02 _6u.tii
-rw-r--r--  1 jteb  jteb    15851802 22 okt 12:02 _6u.tis
-rw-r--r--  1 jteb  jteb       21534 22 okt 12:02 _6u.tvd
-rw-r--r--  1 jteb  jteb    34227796 22 okt 12:02 _6u.tvf
-rw-r--r--  1 jteb  jteb      344356 22 okt 12:02 _6u.tvx
-rw-r--r--  1 jteb  jteb          39 22 okt 12:05 _6u_1.del
-rw-r--r--  1 jteb  jteb      316687 22 okt 12:05 _6v.fdt
-rw-r--r--  1 jteb  jteb        2716 22 okt 12:05 _6v.fdx
-rw-r--r--  1 jteb  jteb         593 22 okt 12:05 _6v.fnm
-rw-r--r--  1 jteb  jteb       93284 22 okt 12:05 _6v.frq
-rw-r--r--  1 jteb  jteb         343 22 okt 12:05 _6v.nrm
-rw-r--r--  1 jteb  jteb       12416 22 okt 12:05 _6v.prx
-rw-r--r--  1 jteb  jteb        1679 22 okt 12:05 _6v.tii
-rw-r--r--  1 jteb  jteb      120258 22 okt 12:05 _6v.tis
-rw-r--r--  1 jteb  jteb      319766 22 okt 12:05 _6w.fdt
-rw-r--r--  1 jteb  jteb        2756 22 okt 12:05 _6w.fdx
-rw-r--r--  1 jteb  jteb         593 22 okt 12:05 _6w.fnm
-rw-r--r--  1 jteb  jteb       94316 22 okt 12:05 _6w.frq
-rw-r--r--  1 jteb  jteb         348 22 okt 12:05 _6w.nrm
-rw-r--r--  1 jteb  jteb       12593 22 okt 12:05 _6w.prx
-rw-r--r--  1 jteb  jteb        1695 22 okt 12:05 _6w.tii
-rw-r--r--  1 jteb  jteb      117984 22 okt 12:05 _6w.tis
-rw-r--r--  1 jteb  jteb      300004 22 okt 12:05 _6x.fdt
-rw-r--r--  1 jteb  jteb        2548 22 okt 12:05 _6x.fdx
-rw-r--r--  1 jteb  jteb         593 22 okt 12:05 _6x.fnm
-rw-r--r--  1 jteb  jteb       86935 22 okt 12:05 _6x.frq
-rw-r--r--  1 jteb  jteb         322 22 okt 12:05 _6x.nrm
-rw-r--r--  1 jteb  jteb       11811 22 okt 12:05 _6x.prx
-rw-r--r--  1 jteb  jteb        1578 22 okt 12:05 _6x.tii
-rw-r--r--  1 jteb  jteb      111238 22 okt 12:05 _6x.tis
-rw-r--r--  1 jteb  jteb         216 22 okt 13:24 index.diagnostic
-rw-r--r--  1 jteb  jteb           5 22 okt 12:05 index.directory
-rw-r--r--  1 jteb  jteb          20 22 okt 12:05 segments.gen
-rw-r--r--  1 jteb  jteb           0 22 okt 12:05 segments_36
-rw-r--r--  1 jteb  jteb        2877 22 okt 12:05 segments_3g
{code}
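
The zero-length files above are the signature of the corruption. A tiny sketch that flags them in an index directory; the default directory name is illustrative:

{code}
import java.io.File;

public class ZeroLengthFiles {
    public static void main(String[] args) {
        // Directory to scan; defaults to the index dir name from the listing above.
        File indexDir = new File(args.length > 0 ? args[0] : "assetIndex");
        File[] files = indexDir.listFiles();
        if (files == null) {
            System.err.println("Not a directory: " + indexDir);
            return;
        }
        for (File f : files) {
            if (f.isFile() && f.length() == 0) {
                System.out.println("0 kb: " + f.getName());
            }
        }
    }
}
{code}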


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930092#action_12930092 ] 

Simon Willnauer commented on LUCENE-2729:
-----------------------------------------

Ah, I think I have been on the wrong branch. Never mind!


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Xiaoyang Gu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935410#action_12935410 ] 

Xiaoyang Gu commented on LUCENE-2729:
-------------------------------------

regarding "I see a patch against RealtimeIndexDataLoader as well. Does it have anything to do with the 'load manager stopped' errors I encountered?"

This class hasn't been changed since august.

fatal exception during indexing would kill the indexing thread (load manager thread). After any such exception, you will see that error message.
At that point, we cannot recover and it makes no sense to continue indexing.

In Zoie's mbean, there's a HEALTH level value. When that value turns non-0, the system needs some serious attention.

You can use JMX to monitor the system health remotely.
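
A minimal remote JMX client sketch; the service URL, ObjectName, and attribute name below are illustrative placeholders, not zoie's actual MBean registration (look up the real names in jconsole):

{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ZoieHealthCheck {
    public static void main(String[] args) throws Exception {
        // Illustrative URL; the target JVM must be started with remote JMX enabled.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Hypothetical ObjectName and attribute; check jconsole for the real ones.
            ObjectName name = new ObjectName("zoie:name=zoie-system");
            Number health = (Number) conn.getAttribute(name, "Health");
            if (health.longValue() != 0) {
                System.err.println("Zoie HEALTH is non-zero: " + health);
            }
        } finally {
            connector.close();
        }
    }
}
{code}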


[jira] Issue Comment Edited: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932519#action_12932519 ] 

Nico Krijnen edited comment on LUCENE-2729 at 11/16/10 11:19 AM:
-----------------------------------------------------------------

We did a few more runs with lots more logging and *we think we have pinpointed the cause of the issue*. Logs and a short analysis are attached: eof-extra-logging-4-analysis.txt

We added logging to track files that are opened for read and write. At the moment the 'read past EOF' happens, no files are still open for writing, so it does not look like a file is being opened but never closed. However, we did find another interesting entry:

- Some time before the 'read past EOF' we see some activity in the disk index on the segments_... file that is 'locked' by the zoie snapshot.
-- 'our' segments_... file is deleted (the logged dir listings show that it no longer exists after that)
-- 'our' segments_... file is then written to again. At this point it is re-created, but as a 0 kb file, which at a later point causes the 'read past EOF'.

We added more logging to find out when, and by what code, the 'generation' in SegmentInfos is set or changed.

The logs from that run seem to indicate that *at some point SegmentInfos.generation 'wraps' to a lower value and earlier generation numbers get re-used*.

In ZoieIndexDeletionPolicy, a Snapshot is initialized using a certain IndexCommit. Zoie doesn't take into account that Lucene may re-use segments filenames: no locking whatsoever is applied to keep Lucene from re-using this specific segments filename.

AFAICT from going through the code in SegmentInfos, the generation counter is reset in SegmentInfos.read(Directory). Looking at the log files and singling out the value of generation, *in our case it never even reaches a value above 118*. The segments filename of our Snapshot is 'segments_2q', which translates to generation 98. After a while, Lucene deletes this file on disk (because the name is re-used?) in a call to FSDirectory.deleteFile. However, our backup is still running at that point. Somehow the file is written to disk again a moment later, and on the next read we get the 'read past EOF' error.
It's not completely clear to me why the generation counter is reset in read(Directory). Is it because the segments file is determined through FindSegmentsFile(Directory).run()?
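
For reference, here is a minimal standalone sketch of the filename/generation mapping (the radix-36 encoding matches what Lucene 3.x does in SegmentInfos and IndexFileNames; the helper class itself is ours, purely for illustration):

{code}
// Illustrative helper (not Lucene code): a segments_N filename encodes the
// generation N in radix 36, so "segments_2q" -> 2*36 + 26 = 98.
public class SegmentsGenerations {

    public static long generationFromName(String segmentsFileName) {
        String suffix = segmentsFileName.substring("segments_".length());
        return Long.parseLong(suffix, Character.MAX_RADIX); // MAX_RADIX == 36
    }

    public static String nameFromGeneration(long generation) {
        return "segments_" + Long.toString(generation, Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        System.out.println(generationFromName("segments_2q")); // 98
        System.out.println(nameFromGeneration(98));            // segments_2q
    }
}
{code}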

Although this could be interpreted as a flaw in ZoieIndexDeletionPolicy, I think this should be worked around in Lucene (correct me if I'm wrong).

We can attempt some kind of 'fix' that skips generations corresponding to segment filenames referenced by the '_currentSnapshots' variable in ZoieIndexDeletionPolicy (a rough sketch of the idea follows below). But neither the IndexDeletionPolicy API nor the SegmentInfos class really provides a nice way of doing that, and it doesn't feel like a proper fix.
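
To make the idea concrete, here is a rough sketch of such a deletion policy: it refuses to delete commits whose segments file is still referenced by a running snapshot. The set of protected filenames is a hypothetical stand-in for Zoie's '_currentSnapshots' bookkeeping, and note that this only protects the commit from deletion - it would not stop Lucene from re-using the same generation for a new commit once the counter has wrapped, which is the real problem:

{code}
import java.io.IOException;
import java.util.List;
import java.util.Set;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexDeletionPolicy;

// Hypothetical sketch, not a proper fix: keep the latest commit plus any
// commit whose segments_N file is currently being backed up.
public class SnapshotAwareDeletionPolicy implements IndexDeletionPolicy {

    private final Set<String> protectedSegmentsFiles; // e.g. {"segments_2q"}

    public SnapshotAwareDeletionPolicy(Set<String> protectedSegmentsFiles) {
        this.protectedSegmentsFiles = protectedSegmentsFiles;
    }

    public void onInit(List<? extends IndexCommit> commits) throws IOException {
        onCommit(commits);
    }

    public void onCommit(List<? extends IndexCommit> commits) throws IOException {
        // Delete all but the most recent commit, skipping protected ones.
        for (int i = 0; i < commits.size() - 1; i++) {
            IndexCommit commit = commits.get(i);
            if (!protectedSegmentsFiles.contains(commit.getSegmentsFileName())) {
                commit.delete();
            }
        }
    }
}
{code}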

*Is there a specific reason for the generation to 'wrap around' or be re-used within hours?*
Looking at the code in SegmentInfos, I do not see any explicit logic or reason for re-using earlier generation numbers.



[jira] Updated: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Krijnen updated LUCENE-2729:
---------------------------------

    Attachment: backup_force_failure2.log.zip



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927031#action_12927031 ] 

Jason Rutherglen commented on LUCENE-2729:
------------------------------------------

Using Solr 1.4.2, on disk-full, .del files were being written with a file length of zero; however, that is supposed to be fixed by https://issues.apache.org/jira/browse/LUCENE-2593. This doesn't appear to be the same issue, because more than just the .del files are of zero length.



[jira] Updated: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Krijnen updated LUCENE-2729:
---------------------------------

    Description: 
We have a system running lucene and zoie. We use lucene as a content store for a CMS/DAM system. We use the hot-backup feature of zoie to make scheduled backups of the index. This works fine for small indexes and when there are not a lot of changes to the index when the backup is made.

On large indexes (about 5 GB to 19 GB), when a backup is made while the index is being changed a lot (lots of document additions and/or deletions), we almost always get a 'read past EOF' at some point, followed by lots of 'Lock obtain timed out'.
At that point we get lots of 0 kb files in the index, data gets lost, and the index is unusable.

When we stop our server, remove the 0kb files and restart our server, the index is operational again, but data has been lost.

I'm not sure if this is a zoie or a lucene issue, so I'm posting it to both. Hopefully someone has some ideas where to look to fix this.


Some more details...

Stack trace of the read past EOF and following Lock obtain timed out:

{code}
78307 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@31ca5085] ERROR proj.zoie.impl.indexing.internal.BaseSearchIndex - read past EOF
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
    at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:166)
    at org.apache.lucene.index.DirectoryReader.doCommit(DirectoryReader.java:725)
    at org.apache.lucene.index.IndexReader.commit(IndexReader.java:987)
    at org.apache.lucene.index.IndexReader.commit(IndexReader.java:973)
    at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:162)
    at org.apache.lucene.index.IndexReader.close(IndexReader.java:1003)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs(BaseSearchIndex.java:203)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:223)
    at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
    at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
    at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
    at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
579336 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@31ca5085] ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - Problem copying segments: Lock obtain timed out: org.apache.lucene.store.SingleInstanceLock@5ad0b895: write.lock
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: org.apache.lucene.store.SingleInstanceLock@5ad0b895: write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1060)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:957)
    at proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
    at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
    at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
    at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
    at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
{code}

We get exactly the same behaviour on both OS X and on Windows. On both, zoie is using a SimpleFSDirectory.
We also use a SingleInstanceLockFactory (since our process is the only one working with the index), but we get the same behaviour with a NativeFSLock.

The snapshot backup is being made by calling:

*proj.zoie.impl.indexing.ZoieSystem.exportSnapshot(WritableByteChannel)*

Same issue in zoie JIRA:

http://snaprojects.jira.com/browse/ZOIE-51


    Environment: Happens on both OS X 10.6 and Windows 2008 Server. Integrated with zoie (using a zoie snapshot from 2010-08-06: zoie-2.0.0-snapshot-20100806.jar).  (was: Happens on both OS X 10.6 and Windows 2008 Server. Integrated with zoie.)
        Summary: Index corruption after 'read past EOF' under heavy update load and snapshot export  (was: Index corruption after 'read past EOF' under heavy update load)



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931081#action_12931081 ] 

Michael McCandless commented on LUCENE-2729:
--------------------------------------------

bq. I'll see if we can also add some logging to detect (including stack-trace) who opened a file but never closed it..

Actually you can use the MockDirectoryWrapper in Lucene's unit tests for this -- it tracks open inputs & outputs and then throws an exception on close if anything is still open.  You should modify it to print all open inputs/outputs instead of throwing an exception, for this usage.
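
If pulling MockDirectoryWrapper out of the test tree is awkward, a small tracking delegate of your own gets most of the way there. A minimal sketch against the Lucene 3.x Directory API (the tracking logic is illustrative only; a real version would also wrap the returned IndexInput/IndexOutput so entries are removed again on close, which this sketch does not do):

{code}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;

// Sketch: remember a stack trace for every file opened for read or write,
// and dump whatever was recorded on demand instead of throwing.
public class TrackingDirectory extends Directory {

    private final Directory delegate;
    private final Map<String, Exception> opened = new ConcurrentHashMap<String, Exception>();

    public TrackingDirectory(Directory delegate) { this.delegate = delegate; }

    public IndexOutput createOutput(String name) throws IOException {
        opened.put("write:" + name, new Exception("opened for write: " + name));
        return delegate.createOutput(name);
    }

    public IndexInput openInput(String name) throws IOException {
        opened.put("read:" + name, new Exception("opened for read: " + name));
        return delegate.openInput(name);
    }

    public void dumpOpened() {
        for (Exception e : opened.values()) e.printStackTrace();
    }

    // Plain delegation for the remaining abstract methods:
    public String[] listAll() throws IOException { return delegate.listAll(); }
    public boolean fileExists(String name) throws IOException { return delegate.fileExists(name); }
    public long fileModified(String name) throws IOException { return delegate.fileModified(name); }
    public void touchFile(String name) throws IOException { delegate.touchFile(name); }
    public void deleteFile(String name) throws IOException { delegate.deleteFile(name); }
    public long fileLength(String name) throws IOException { return delegate.fileLength(name); }
    public void close() throws IOException { delegate.close(); }
}
{code}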



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931049#action_12931049 ] 

Nico Krijnen commented on LUCENE-2729:
--------------------------------------

About 'segments_2j' occurring multiple times: I forgot to mention that we have two indexes operating at the same time... one for assets, one for relations.
So adding the directory + RAM/disk index in the logging sounds like a very good plan indeed.

About the 0-bytes-on-close detection logs: it seems we log this a few lines too early (before super.close is called instead of after) - whoops... will correct this and run again.

{quote}
It looks like these 0-length files are being opened (by something) but then never closed.
{quote}

I'll see if we can also add some logging to detect (including stack-trace) who opened a file but never closed it...

{quote}
EG are you sure you're running the JRE with assertions enabled?
{quote}

Pretty sure it is not; we'll turn assertions on.

{quote}
when you print out the full file listing, in addition to the size of each file, also print out the .fileModified result?
{quote}

Will do.
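
For the listing itself, something along these lines is what we plan to log (Directory.listAll, fileLength and fileModified are the Lucene 3.x calls; the formatting helper is ours):

{code}
import java.io.IOException;
import java.util.Date;
import org.apache.lucene.store.Directory;

// Planned logging helper (sketch): dump every file in the index directory
// with its size and last-modified time, so 0 kb files and their timestamps
// show up next to the rest of the index state.
public final class DirectoryListingLogger {

    public static void log(Directory dir) throws IOException {
        for (String name : dir.listAll()) {
            System.out.println(String.format("%-30s %12d bytes  modified %s",
                name, dir.fileLength(name), new Date(dir.fileModified(name))));
        }
    }

    private DirectoryListingLogger() {}
}
{code}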



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930130#action_12930130 ] 

Michael McCandless commented on LUCENE-2729:
--------------------------------------------


Thank you for attaching the IW infoStream output!  Sorry it took so
long for me to respond.

Aside: it is sad but there is no master TODO list in open source.  It
all comes down to our own email inboxes, todo lists, post-it notes all
over the place, etc., and (in my case anyway) things sometimes fall
past the event horizon.

So please, if I don't respond in a day or two on an active issue, bump
it again (put a comment on the issue)!  I'd much rather people
over-nag than under-nag, but unfortunately under-nag is far, far more
common and it causes important issues to languish unnecessarily.

OK back to the issue :)

I looked through the infoStream but I don't see a smoking gun.  Ie,
the logs indicate that nowhere did Lucene try to delete/overwrite
those zero-length files; I see other files being deleted, so, this is
what I'd expect given that ZoieDeletionPolicy is presumably protecting
the segments_3t commit point (to back up its files).

I do see some spooky "other" exceptions, though... these are the first
2 exceptions I see in the log:

{noformat}
14:27:41,290 [bigIndexBuilder_QueueProcessor_3] WARN  com.ds.acm.logic.impl.AssetManagerImpl -
  Ignoring AssetNotFoundException trying to make sure all metadata from index is loaded before updating an existing asset
Exception in thread "pool-5-thread-6" java.lang.NullPointerException
	at org.apache.coyote.http11.InternalNioOutputBuffer.writeToSocket(InternalNioOutputBuffer.java:430)
	at org.apache.coyote.http11.InternalNioOutputBuffer.flushBuffer(InternalNioOutputBuffer.java:784)
	at org.apache.coyote.http11.InternalNioOutputBuffer.flush(InternalNioOutputBuffer.java:300)
	at org.apache.coyote.http11.Http11NioProcessor.action(Http11NioProcessor.java:1060)
	at org.apache.coyote.Response.action(Response.java:183)
	at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:314)
	at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:288)
	at org.apache.catalina.connector.Response.flushBuffer(Response.java:548)
	at org.apache.catalina.connector.ResponseFacade.flushBuffer(ResponseFacade.java:279)
	at org.granite.gravity.AbstractChannel.runReceived(AbstractChannel.java:251)
	at org.granite.gravity.AbstractChannel.runReceive(AbstractChannel.java:199)
	at org.granite.gravity.AsyncReceiver.doRun(AsyncReceiver.java:34)
	at org.granite.gravity.AsyncChannelRunner.run(AsyncChannelRunner.java:52)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:680)
{noformat}

and

{noformat}
14:40:18,382 [Low Memory Detector] WARN  com.ds.acm.engine.search.zoieimpl.core.ZoieSystemManager -
  Ignoring timeout while attempting to flush zoie memory index to disk to free memory
proj.zoie.api.ZoieException: sync timed out
	at proj.zoie.impl.indexing.AsyncDataConsumer.syncWthVersion(AsyncDataConsumer.java:177)
	at proj.zoie.impl.indexing.AsyncDataConsumer.flushEvents(AsyncDataConsumer.java:155)
	at proj.zoie.impl.indexing.ZoieSystem.flushEvents(ZoieSystem.java:308)
	at com.ds.acm.engine.search.zoieimpl.core.ZoieSystemManager.onLowMemory(ZoieSystemManager.java:220)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.ds.util.event.BasicEventBroadcaster$Handler.invokeMethod(BasicEventBroadcaster.java:197)
	at com.ds.util.event.BasicEventBroadcaster$Handler.handle(BasicEventBroadcaster.java:190)
	at com.ds.util.event.BasicEventBroadcaster.fire(BasicEventBroadcaster.java:108)
	at com.ds.util.cache.LowMemoryWarningBroadcaster$1.handleNotification(LowMemoryWarningBroadcaster.java:135)
	at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
	at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
	at sun.management.MemoryPoolImpl$PoolSensor.triggerAction(MemoryPoolImpl.java:272)
	at sun.management.Sensor.trigger(Sensor.java:120)
{noformat}

That 2nd exception happens a total of 9 times... and is rather
spooky.  What does it mean?  Ie, why is Zoie timing out on flushing
the index to disk, and, what does it then do w/ its RAMDir?

I also see a lot of these:

{noformat}
15:50:18,856 [bigIndexBuilder_QueueProcessor_10] WARN  com.ds.acm.logic.impl.AssetManagerImpl -
  Ignoring AssetNotFoundException trying to make sure all metadata from index is loaded before
  updating an existing asset
{noformat}

Would it be possible to instrument the Zoie code so that, as the backup
process copies each file in the snapshot, it prints a listing of the
directory at that point?
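
Something along these lines, for example (the hook point and method are hypothetical; Zoie's actual copy loop differs):

{code}
import java.io.IOException;
import org.apache.lucene.store.Directory;

// Hypothetical instrumentation for the snapshot copy loop: before each file
// of the protected commit is copied, log the current on-disk length of the
// snapshot's own segments_N file, so the exact moment it gets truncated to
// 0 bytes is bracketed in the log.
public class SnapshotCopyLogger {

    public static void beforeCopy(Directory dir, String fileBeingCopied,
                                  String snapshotSegmentsFile) throws IOException {
        long len = dir.fileExists(snapshotSegmentsFile)
                 ? dir.fileLength(snapshotSegmentsFile) : -1;  // -1 = deleted
        System.out.println("backup: copying " + fileBeingCopied
            + "; " + snapshotSegmentsFile + " length=" + len);
    }
}
{code}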

Also, can you write to the log when Zoie applies deletes?  (Looks like
it happens in proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs).
It's on applying deletes that the corruption is first detected, so, if
we log this event we can better bracket the period of time when the
corruption happened.

Does Zoie ever open an IndexReader or IndexWriter passing in an
existing commit point?  Or does it always open the latest commit?

The timestamps on the zero-length files are particularly spooky -- the
earliest ones are 15:21 (when the first EOF is hit), but then also 15:47
and 15:49 on the others.  It seems like on 3 separate occasions
something truncated the files.



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927043#action_12927043 ] 

Michael McCandless commented on LUCENE-2729:
--------------------------------------------

Somehow we have to locate the event that causes the truncation of the files.

Can you enable IndexWriter's infoStream and then get the corruption to happen, and post the results?
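
Turning it on looks roughly like this in 3.0.x (dir and analyzer stand in for whatever Zoie already uses):

{code}
// With infoStream set, the writer logs every flush, merge, commit and
// file deletion, which should help bracket the truncation event.
IndexWriter writer = new IndexWriter(dir, analyzer,
    IndexWriter.MaxFieldLength.UNLIMITED);
writer.setInfoStream(System.out);  // or a PrintStream onto a log file
{code}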



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931034#action_12931034 ] 

Michael McCandless commented on LUCENE-2729:
--------------------------------------------

Also, I like that you detect 0-bytes on close!  However, I'm not sure it's working right.  E.g., line 1,017,585 has this:
{noformat}
java.lang.Exception: 11:02:18,639 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@7636d59a] FOUND ONE! closing 0kb IndexOutput, file: _65.fnm, file.length():0, bytesWritten: 0
{noformat}

But then shortly thereafter (line 1,017,604) I see that _65.fnm is 1749 bytes long.  Maybe the code you added for this warning is failing to take into account the buffered bytes?  I.e., instead of calling file.length() you should call IndexOutput.getFilePointer()?  Hmm, but if you call file.length() after the file really is closed... then that ought to be correct.
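
The two measurement points, as a sketch (illustrative, not the actual Zoie patch; dir, fileName and out are placeholders):

{code}
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexOutput;

// getFilePointer() includes bytes still sitting in the output buffer;
// the on-disk length only catches up once close() has flushed them.
static void closeAndCheck(Directory dir, String fileName, IndexOutput out)
        throws IOException {
    long written = out.getFilePointer();     // counts buffered bytes too
    out.close();                             // flushes, then closes
    long onDisk = dir.fileLength(fileName);  // reliable only after close()
    if (onDisk == 0 && written > 0) {
        System.err.println("FOUND ONE! 0 kb file after close: " + fileName
            + ", bytesWritten: " + written);
    }
}
{code}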



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931024#action_12931024 ] 

Nico Krijnen commented on LUCENE-2729:
--------------------------------------

We finished another run today and got some interesting results from the additional logging. The log times are searchable in the log file...

- 09:40:21,996: we are creating a snapshot for our backup; segments filename used for the snapshot: 'segments_2j'.
On the next line we can see that at that moment segments_2j is 2353 bytes. We log the filenames in the index together with their file sizes (in parentheses).

- 11:03:27,653: segments_2j is written to disk.
_How can this be, when this is the segments filename our snapshot was created from?_
On the next line we see that it has indeed been written to disk, but with a 0 byte file size.

- 11:08:02,731: first "read past EOF" exception.
_It looks like segments filenames are reused: at a point in time well after our snapshot is created, the name 'segments_2j' is used again. This somehow causes the segments file to end up 0 bytes in size. Perhaps it is being deleted and written at the same time, or because of the name clash there is no data left to write to the segments file?_

- 11:16:03,300: our backup is finished.

Here is an excerpt of some of the more interesting log entries:

{quote}
09:39:30,719 [http-8080-2] DEBUG com.ds.acm.logic.tools.BackupManager - Start asset index backup

09:40:21,996 [http-8080-2] DEBUG proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy - Ctor Snapshot. segments filename: *segments_2j*, commit filenames: [_5f.tvd, _4c.tvf, _5f.tvf, _4d.fdx, _4p.tis, _4c.tvd, _4p.frq, _5e.tvx, _39.tis, _3u.frq, _4d.fdt, _4p.tii, _5f.fnm, _4c.tvx, _39.tii, _55.tis, _2q.fnm, _4p.fdx, _2q.tvf, _5e.tvf, _5e.tvd, _2q.tvd, _5f.tvx, _4p.fdt, _55.tii, _55.nrm, _55.frq, _2q.tvx, _3u.tvx, _5f.fdx, _4c.frq, _5f.fdt, _4c.fdx, _2q.nrm, _4c.fdt, _4p.tvf, _4c.nrm, _3v.tvf, _4c.tii, _3u.prx, _4p.tvd, _4c.tis, _3u.fdt, _5e.fnm, _3v.tvd, _3u.fdx, _55.tvx, _39.fdt, _39.fnm, _5e.prx, _3u.tis, *segments_2j*, _3v.prx, _2q.fdx, _4c.prx, _4d.prx, _4p.tvx, _39.fdx, _3u.tii, _2q.fdt, _4d.frq, _55.fdt, _2q.prx, _5e.tii, _3u.fnm, _55.fdx, _5e.nrm, _5f.tii, _5e.tis, _4d.nrm, _4d.tii, _5e.fdt, _4d.tis, _5f.frq, _5e.fdx, _39.tvd, _39.tvf, _55.fnm, _3v.frq, _2q.frq, _2q.tii, _3v.tii, _5f.prx, _4p.nrm, _5f.nrm, _2q.tis, _3v.tvx, _4c.fnm, _3v.nrm, _55.tvf, _55.tvd, _3u.nrm, _5f.tis, _3v.tis, _3v.fdt, _4d.fnm, _4d.tvf, _3v.fdx, _4d.tvd, _4d.tvx, _4p.prx, _3v.fnm, _5e.frq, _39.prx, _39.tvx, _39.frq, _4p.fnm, _39.nrm, _3u.tvf, _55.prx, _3u.tvd]

11:03:27,653 [http-8080-2] TRACE com.ds.acm.engine.search.zoieimpl.core.DirectoryManager - Writing *segments_2j*, directory contents: _2q.fdt(969134416), _2q.fdx(36652), _2q.fnm(276), _2q.frq(4685726), _2q.nrm(9166), _2q.prx(393230403), _2q.tii(7447), _2q.tis(746299), _2q.tvd(8394), _2q.tvf(599185081), _2q.tvx(73300), _39.fdt(2061261675), _39.fdx(1012), _39.fnm(276), _39.frq(17754579), _39.nrm(256), _39.prx(121067407), _39.tii(137511), _39.tis(11726653), _39.tvd(185), _39.tvf(233037042), _39.tvx(2020), _3u.fdt(1595882722), _3u.fdx(63692), _3u.fnm(330), _3u.frq(8001869), _3u.nrm(15926), _3u.prx(647374863), _3u.tii(11319), _3u.tis(1168399), _3u.tvd(14209), _3u.tvf(986370136), _3u.tvx(127380), _3v.fdt(2057147455), _3v.fdx(476), _3v.fnm(384), _3v.frq(15201111), _3v.nrm(122), _3v.prx(109724024), _3v.tii(132491), _3v.tis(11457688), _3v.tvd(114), _3v.tvf(211902147), _3v.tvx(948), _4c.fdt(2691565961), _4c.fdx(39572), _4c.fnm(276), _4c.frq(18724620), _4c.nrm(9896), _4c.prx(590255960), _4c.tii(141243), _4c.tis(12185869), _4c.tvd(9894), _4c.tvf(932649779), _4c.tvx(79140), _4d.fdt(2398908136), _4d.fdx(548), _4d.fnm(354), _4d.frq(24581614), _4d.nrm(140), _4d.prx(158243133), _4d.tii(141948), _4d.tis(12259425), _4d.tvd(140), _4d.tvf(303769970), _4d.tvx(1092), _4p.fdt(1081212027), _4p.fdx(212), _4p.fnm(354), _4p.frq(8294102), _4p.nrm(56), _4p.prx(60513257), _4p.tii(134898), _4p.tis(11376730), _4p.tvd(56), _4p.tvf(116715012), _4p.tvx(420), _55.fdt(787581180), _55.fdx(156), _55.fnm(354), _55.frq(5376741), _55.nrm(42), _55.prx(39539985), _55.tii(133483), _55.tis(11351989), _55.tvd(42), _55.tvf(75958725), _55.tvx(308), _5e.prx(0), _5z.fdt(853614415), _5z.fdx(23700), _5z.fnm(1696), _5z.frq(9041984), _5z.nrm(5928), _5z.prx(32617823), _5z.tii(146115), _5z.tis(12387143), _5z.tvd(4142), _5z.tvf(74409815), _5z.tvx(47396), _63.fdt(212690811), _63.fdx(8188), _63.fnm(1652), _63.frq(2492889), _63.nrm(2050), _63.prx(5672093), _63.tii(8470), _63.tis(689993), _63.tvd(1442), _63.tvf(15371474), _63.tvx(16372), _64.fdt(4118409126), _64.fdx(1484), _64.fnm(384), _64.frq(35294399), _64.nrm(374), _64.prx(230791431), _64.tii(143860), _64.tis(12491845), _64.tvd(295), _64.tvf(444939185), _64.tvx(2964), _65.fdt(1369489408), _65.fdx(0), _65.fnm(1749), index.directory(5), segments.gen(20), segments_2x(2354)
11:03:27,706 [http-8080-2] TRACE com.ds.acm.engine.search.zoieimpl.core.DirectoryManager - Writing _3v.prx, directory contents: _2q.fdt(969134416), _2q.fdx(36652), _2q.fnm(276), _2q.frq(4685726), _2q.nrm(9166), _2q.prx(393230403), _2q.tii(7447), _2q.tis(746299), _2q.tvd(8394), _2q.tvf(599185081), _2q.tvx(73300), _39.fdt(2061261675), _39.fdx(1012), _39.fnm(276), _39.frq(17754579), _39.nrm(256), _39.prx(121067407), _39.tii(137511), _39.tis(11726653), _39.tvd(185), _39.tvf(233037042), _39.tvx(2020), _3u.fdt(1595882722), _3u.fdx(63692), _3u.fnm(330), _3u.frq(8001869), _3u.nrm(15926), _3u.prx(647374863), _3u.tii(11319), _3u.tis(1168399), _3u.tvd(14209), _3u.tvf(986370136), _3u.tvx(127380), _3v.fdt(2057147455), _3v.fdx(476), _3v.fnm(384), _3v.frq(15201111), _3v.nrm(122), _3v.prx(109724024), _3v.tii(132491), _3v.tis(11457688), _3v.tvd(114), _3v.tvf(211902147), _3v.tvx(948), _4c.fdt(2691565961), _4c.fdx(39572), _4c.fnm(276), _4c.frq(18724620), _4c.nrm(9896), _4c.prx(590255960), _4c.tii(141243), _4c.tis(12185869), _4c.tvd(9894), _4c.tvf(932649779), _4c.tvx(79140), _4d.fdt(2398908136), _4d.fdx(548), _4d.fnm(354), _4d.frq(24581614), _4d.nrm(140), _4d.prx(158243133), _4d.tii(141948), _4d.tis(12259425), _4d.tvd(140), _4d.tvf(303769970), _4d.tvx(1092), _4p.fdt(1081212027), _4p.fdx(212), _4p.fnm(354), _4p.frq(8294102), _4p.nrm(56), _4p.prx(60513257), _4p.tii(134898), _4p.tis(11376730), _4p.tvd(56), _4p.tvf(116715012), _4p.tvx(420), _55.fdt(787581180), _55.fdx(156), _55.fnm(354), _55.frq(5376741), _55.nrm(42), _55.prx(39539985), _55.tii(133483), _55.tis(11351989), _55.tvd(42), _55.tvf(75958725), _55.tvx(308), _5e.prx(0), _5z.fdt(853614415), _5z.fdx(23700), _5z.fnm(1696), _5z.frq(9041984), _5z.nrm(5928), _5z.prx(32617823), _5z.tii(146115), _5z.tis(12387143), _5z.tvd(4142), _5z.tvf(74409815), _5z.tvx(47396), _63.fdt(212690811), _63.fdx(8188), _63.fnm(1652), _63.frq(2492889), _63.nrm(2050), _63.prx(5672093), _63.tii(8470), _63.tis(689993), _63.tvd(1442), _63.tvf(15371474), _63.tvx(16372), _64.fdt(4118409126), _64.fdx(1484), _64.fnm(384), _64.frq(35294399), _64.nrm(374), _64.prx(230791431), _64.tii(143860), _64.tis(12491845), _64.tvd(295), _64.tvf(444939185), _64.tvx(2964), _65.fdt(1369554944), _65.fdx(0), _65.fnm(1749), index.directory(5), segments.gen(20), *segments_2j(0)*, segments_2x(2354)

IFD [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@7636d59a]: init: load commit "*segments_2j*"
IW 1110 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@7636d59a]: init: hit exception on init; releasing write lock
11:08:02,731 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@7636d59a] ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - Problem copying segments: *read past EOF*, start=0, end=0, bufferStart=0, bufferPosition=0, bufferSize=1024

11:16:03,300 [http-8080-2] DEBUG com.ds.acm.logic.tools.BackupManager - Asset index backup complete
{quote}

Looking in the logs there are *31 occurrences of 'delete "segments_2j"' and 32 of 'wrote segments file "segments_2j"'*. Very weird, since the ZoieIndexDeletionPolicy should prevent segments_2j from being deleted...

So just guessing here, but the problem seems to be that segments filenames are being 'rotated' and getting reused?
The ZoieIndexDeletionPolicy definitely does not seem to take this into account: it uses the segments filename as a unique id to track the references to a specific commit. But looking at it, segments filenames are not unique over a certain period of time, and that time period gets shorter when lots of modifications are being made to the index.
Reading the code it should still prevent any segments_2j commit from being deleted... but we do see deletes for that file... hmmm
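
To make the suspected weak spot concrete, here is a minimal sketch of a filename-keyed snapshot policy in the style described above (illustrative; not the actual ZoieIndexDeletionPolicy):

{code}
import java.util.List;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexDeletionPolicy;

class SnapshotByNamePolicy implements IndexDeletionPolicy {

    // Segments file name of the pinned commit, e.g. "segments_2j".
    // If that name is ever reused (e.g. by a commit in a different
    // Directory), this key no longer identifies a unique commit.
    private volatile String pinnedSegmentsFile;

    void pin(IndexCommit commit) { pinnedSegmentsFile = commit.getSegmentsFileName(); }
    void release()               { pinnedSegmentsFile = null; }

    public void onInit(List<? extends IndexCommit> commits) {
        onCommit(commits);
    }

    public void onCommit(List<? extends IndexCommit> commits) {
        String pinned = pinnedSegmentsFile;
        // Commits are ordered oldest to newest: keep the newest, plus
        // any commit whose segments file name matches the pin.
        for (int i = 0; i < commits.size() - 1; i++) {
            IndexCommit c = commits.get(i);
            if (pinned == null || !pinned.equals(c.getSegmentsFileName())) {
                c.delete();
            }
        }
    }
}
{code}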

And then something else still seems to be going wrong: somehow the segments file does become 0 bytes. It is not because it was written with no data; we added a check for that (look for 'closing 0kb IndexOutput'), but that never happens to a segments_* file.

Maybe some logic needs to be added to prevent certain segments_... numbers from being re-used while the commit is still 'referenced' by an IndexDeletionPolicy...?



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Jan te Beest (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935256#action_12935256 ] 

Jan te Beest commented on LUCENE-2729:
--------------------------------------

First of all, many thanks for your support (and patience)!

{quote}
Are you running with assertions enabled? Maybe we'll catch
something...
{quote}
Well, I had, but somehow the JVM option got lost when switching between git branches.
I'll turn it on again before testing with Xiaoyang's patches merged into my local git repository.

Looking at the patches, I have a good feeling they will solve the issue.

Xiaoyang, I see a patch against RealtimeIndexDataLoader as well. Does it have anything to do with the 'load manager stopped' errors I encountered?
I have to say though that I didn't see those errors when the backup process didn't fail.

{quote}
Also: can you try upgrading to the latest 3.0.x tip (svn checkout
https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0)?
This doesn't sound like any bug we've fixed on 3.0.x but it's worth a
shot 
{quote}
I'll do that anyway, I like being at the tip of the branch :)

{quote}
I see strange output for the backup, eg "Backup to file _7c.prx,
dataLen: 0" and then "Backup to file _7c.prx DONE, written: 8" - why
is that? (A number of files show this, but then a bunch of others
look like "correct" sizes). Are you sure the backup copy of
segments_5e is actually valid?
{quote}
We actually encounter this regularly for files other than segments_*; it seems harmless though.
As for segments_5e being valid at that point: yes, it is.

bq. You should probably mark ZoieIDP.lastSnapshotSegmentsFileName as volatile.
You're totally right, how could I have missed that?
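
For context, the visibility problem volatile fixes here, as a sketch (the field name is from the discussion above; the surrounding class is hypothetical):

{code}
class SnapshotState {
    // Written by the backup thread, read by the indexing thread inside
    // the deletion policy. Without volatile, the indexing thread may
    // keep seeing a stale value and delete the very commit the backup
    // is still copying.
    private volatile String lastSnapshotSegmentsFileName;

    void pin(String segmentsFileName) {          // backup thread
        lastSnapshotSegmentsFileName = segmentsFileName;
    }

    boolean isPinned(String segmentsFileName) {  // indexing thread
        return segmentsFileName.equals(lastSnapshotSegmentsFileName);
    }
}
{code}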

OK, I'll do some more tests and let you know the outcome.




[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931031#action_12931031 ] 

Michael McCandless commented on LUCENE-2729:
--------------------------------------------

I'll look at the new log, but quickly, one correction: Zoie writes first into a RAMDir, and then periodically uses addIndexesNoOptimize to fold that RAM index into the "real" one.

So you have to be careful when you "grep" because for segments_2j, all of the 'delete "segments_2j"' look like they are in the RAMDir (the thread is ConsumerThread), and all but one of the 'wrote segments file "segments_2j"' are also to the RAMDir.
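
Roughly, the fold looks like this (3.0.x sketch; variable names are illustrative). Note that each Directory keeps its own segments_N counter, which is why the same name, e.g. segments_2j, can legitimately show up in both the RAMDir and the disk index:

{code}
RAMDirectory ramDir = new RAMDirectory();
// ... a separate writer indexes incoming documents into ramDir ...

IndexWriter diskWriter = new IndexWriter(diskDir, analyzer,
    IndexWriter.MaxFieldLength.UNLIMITED);
diskWriter.addIndexesNoOptimize(new Directory[] { ramDir });  // fold RAM -> disk
diskWriter.commit();
{code}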



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Xiaoyang Gu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935134#action_12935134 ] 

Xiaoyang Gu commented on LUCENE-2729:
-------------------------------------

The patches are at git@github.com:xiaoyang/zoie.git, at commit af19217.



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926360#action_12926360 ] 

Jason Rutherglen commented on LUCENE-2729:
------------------------------------------

Post a listing of the index files with their lengths, i.e., ls -la.



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926908#action_12926908 ] 

Nico Krijnen commented on LUCENE-2729:
--------------------------------------

bq. Was there no original root cause here? E.g. disk full?

This was one of the first things I thought of, but the disk has more than enough free space: 200 GB. Also, for this test we write the backup to a different disk - both for better performance and to prevent the disk with the index on it from running out of free space.

bq. Or is the "read past EOF" on closing an IndexReader w/ pending deletes really the first exception you see?

It is the first exception we see. We turned on quite a bit of additional logging, but we have not been able to find anything weird happening before this error. I do expect something weird must have happened to cause the 'read past EOF'.

Do you have any clues as to what we could look for? That might narrow the search.
We are able to consistently reproduce this in our test environment, so if you have suggestions for specific debug logging to turn on, we can do another test run.

bq. Does zoie somehow touch the index files?

We'll try to find out. As far as I can see, the basic backup procedure is to grab the last 'commit snapshot', prevent it from being deleted (ZoieIndexDeletionPolicy), and write all the files mentioned in the commit snapshot to a NIO WritableByteChannel.
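
For reference, a minimal sketch of that kind of hot-backup loop using Lucene's stock SnapshotDeletionPolicy (not Zoie's actual code; the method and parameter names here are illustrative):

{code}
import java.io.File;
import java.io.FileInputStream;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.SnapshotDeletionPolicy;

// Pin the latest commit point so IndexFileDeleter cannot remove its
// files, stream each file (read-only) to the backup channel, then
// release the snapshot so normal deletion can resume.
void exportSnapshot(SnapshotDeletionPolicy sdp, File indexDir,
                    WritableByteChannel out) throws Exception {
  IndexCommit commit = sdp.snapshot();       // pin the current commit point
  try {
    for (String name : commit.getFileNames()) {
      FileChannel in = new FileInputStream(new File(indexDir, name)).getChannel();
      try {
        long pos = 0, size = in.size();
        while (pos < size) {                 // transferTo may copy partially
          pos += in.transferTo(pos, size - pos, out);
        }
      } finally {
        in.close();
      }
    }
  } finally {
    sdp.release();                           // unpin; files may be deleted again
  }
}
{code}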



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930579#action_12930579 ] 

Michael McCandless commented on LUCENE-2729:
--------------------------------------------

bq. Will do, but we also got the error while zoie was opening a new IndexWriter:

You're right... the next attempt to open an IndexWriter, or an IndexReader that's trying to commit pending deletes, will hit the exception due to the zero-length segments file.

So in fact this already helps us further bracket when the truncation happened: IW 115 was able to open successfully on the assetIndex, but then the attempt to write deletions hits the exception, as does the next attempt to open a writer on the index.

So sometime after IW 115 is opened and before the deletions are applied, something truncates these index files.

bq. My guess is that the 'read past EOF' is not really specific to applying deletes, but just happens when the SegmentInfos is loaded from a 0kb file.

That's right.  What's really odd is that these 0-length files are very old, i.e., indexing has long ago moved on to newer segments.  I suspect they are in fact the files protected by the snapshot for backups... so I think it's important we instrument Zoie's backup process next.

{quote}
We ran a test with your patch to throw a RuntimeException when an output already exists.
We did get a 'read past EOF', but the additional RuntimeException is never thrown.
{quote}
Hmmmm.  So it sounds like it's not Lucene's SimpleFSDir.createOutput that's truncating these files.
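
(For context, the check in that test patch amounts to a guard of roughly this shape at the top of SimpleFSDirectory.createOutput; this is a reconstruction from the description above, not the literal LUCENE-2729-test1.patch, and dirPath is a stand-in for the directory's path field:)

{code}
// Hypothetical reconstruction of the debug guard: fail loudly if
// createOutput() is asked to (re)create a file that already exists,
// so the resulting stack trace identifies who truncates it.
File file = new File(dirPath, name);  // dirPath: stand-in for the FSDirectory's folder
if (file.exists()) {
  throw new RuntimeException("createOutput called on existing file: " + file
      + " (length=" + file.length() + ")");
}
{code}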



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935082#action_12935082 ] 

Michael McCandless commented on LUCENE-2729:
--------------------------------------------

bq. We accidentally opened the file in "rw" mode. 

OK, phew :)  So that explains how they became 0 length.
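
(That mechanism is easy to reproduce in isolation: java.io.RandomAccessFile opened in "rw" mode silently creates a missing file as a 0-byte file, whereas "r" would throw FileNotFoundException. A standalone demo:)

{code}
import java.io.File;
import java.io.RandomAccessFile;

public class RwRecreateDemo {
  public static void main(String[] args) throws Exception {
    File f = new File("segments_5e");        // pretend the deleter already removed it
    System.out.println("exists before: " + f.exists());  // false
    new RandomAccessFile(f, "rw").close();   // "rw" quietly recreates it, 0 bytes
    System.out.println("exists after: " + f.exists()
        + ", length=" + f.length());         // true, length=0
  }
}
{code}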

But: can you also check where Zoie opens the IndexReader to apply deletions?  I suspect that IndexReader is not being provided the snapshot deletion policy...
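
(In Lucene 3.0.x that would mean opening the reader via IndexReader.open(Directory, IndexDeletionPolicy, boolean); a sketch, with zoieDeletionPolicy standing in for whatever instance Zoie actually holds:)

{code}
// A non-readOnly reader commits its pending deletes on close, and that
// commit consults the deletion policy. If it gets the default policy
// instead of the snapshot-aware one, it may delete files that a
// running backup still needs.
IndexReader reader = IndexReader.open(directory, zoieDeletionPolicy,
                                      false /* readOnly */);
{code}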



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Jan te Beest (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934857#action_12934857 ] 

Jan te Beest commented on LUCENE-2729:
--------------------------------------

Hi Michael,

I'm one of Nico's colleagues and I've taken over where he left off.

After a lot of testing and debugging, I have found a workaround for the issue and I've collected more logging.
Hopefully this will help to find a real fix (be it in Zoie or Lucene) instead of the dirty hack I've put in the code right now.

Attached is a zip file with a full log (eof-logging-9.log) and the same log stripped down to the period where the backup is happening, i.e. between creating the Snapshot and the moment the 'read past EOF' occurs (stripped-log.log). The other two files are an svn diff against Lucene and a git patch against Zoie.

First let me try to explain what I think is happening.

{noformat}11:50:07,449 [http-8080-exec-4] DEBUG proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy - Init Snapshot segmentsFileName=segments_5e{noformat}
Snapshot is initialized with IndexCommit(segments_5e)

It is not really apparent from this log, as I had to restart the server due to those "load manager has stopped" ZoieExceptions, but the behaviour is the same as in all the other test runs I did.
All files are properly written to the .snapshot file, including the segments_5e file.

After the backup completes, the Snapshot is dereferenced.
{noformat}
12:43:06,988 [Finalizer] DEBUG proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy - Snapshot(IndexFileDeleter.CommitPoint(segments_5e)).refCount=0
java.lang.Exception: 12:43:06.988[Finalizer]

	at com.ds.util.log.TraceUtil.stackTrace(TraceUtil.java:20)
	at proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy$Snapshot.decRef(ZoieIndexDeletionPolicy.java:139)
	at proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy$Snapshot.close(ZoieIndexDeletionPolicy.java:123)
	at proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy$Snapshot.finalize(ZoieIndexDeletionPolicy.java:148)
	at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
	at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
	at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)

12:43:06,991 [Finalizer] DEBUG proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy - Snapshot(IndexFileDeleter.CommitPoint(segments_5e)), refcount is now -1
{noformat}

Now something is making the segments_5e file 0 bytes; I've not been able to figure out why this is happening, though. I'll try to get some proper logging about this.

Where the 'read past EOF' happens is quite clear at this point, however.

When a new IndexFileDeleter is initialized and the directory is scanned for segments_* files, i.e.
{noformat}
        if (fileName.startsWith(IndexFileNames.SEGMENTS)) {

          // This is a commit (segments or segments_N), and
          // it's valid (<= the max gen).  Load it, then
          // incref all files it refers to:
{noformat}

This piece of code yields the EOF error, which is obvious as the segments_ file is 0 bytes.
{noformat}
          SegmentInfos sis = new SegmentInfos();
          try {
            sis.read(directory, fileName);
          } catch (FileNotFoundException e) {
{noformat}

I added another check to the {noformat}if (fileName.startsWith(IndexFileNames.SEGMENTS)){noformat} condition. However, that's not solving the problem, merely working around it.
{noformat}
        long fileLength = directory.fileLength(fileName);
        if (fileName.startsWith(IndexFileNames.SEGMENTS) && fileLength <= 0) {
          message("init: file " + fileName + " is invalid (has no length), skipping further handling");
        }

        if (fileName.startsWith(IndexFileNames.SEGMENTS) && fileLength > 0) {

          // This is a commit (segments or segments_N), and
          // it's valid (<= the max gen).  Load it, then
          // incref all files it refers to:
{noformat}

We don't see any more errors with this hack (in fact I was able to do 10 backups of a 21GB index in a row). The segments_ files are deleted from the index directory after a while as well, so it's not like they have a positive refcount anywhere else (if I understand correctly).

So, somewhere between the Snapshot being closed and a new IndexFileDeleter being initialized, something is writing 0 bytes to that segments_* file.
As said before, I have not been able to find the place where this happens. Since it only happens under heavy load (I had 25 threads reading and writing the index while backing it up to the snapshot file), I'm guessing it may be some synchronization issue?

Perhaps you get an idea now; if you want me to put in more logging or do something else to clear it up, just give me a hint.

Cheers,
Jan te Beest




[jira] Updated: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Jan te Beest (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan te Beest updated LUCENE-2729:
---------------------------------

    Attachment: eof-extra-logging-4.log.zip

Full log, stripped log and diffs against Lucene/Zoie.



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926858#action_12926858 ] 

Nico Krijnen commented on LUCENE-2729:
--------------------------------------

{code}
jteb:assetIndex jteb$ ls -la
total 41550832
drwxr-xr-x  2 jteb  jteb        4862  1 nov 08:52 .
drwxr-xr-x  4 jteb  jteb         238 29 okt 14:10 ..
-rw-r--r--@ 1 jteb  jteb       21508  1 nov 08:52 .DS_Store
-rw-r--r--  1 jteb  jteb   969134416 18 okt 16:41 _2q.fdt
-rw-r--r--  1 jteb  jteb       36652 18 okt 16:41 _2q.fdx
-rw-r--r--  1 jteb  jteb         276 18 okt 16:41 _2q.fnm
-rw-r--r--  1 jteb  jteb     4685726 18 okt 16:41 _2q.frq
-rw-r--r--  1 jteb  jteb        9166 18 okt 16:41 _2q.nrm
-rw-r--r--  1 jteb  jteb   393230403 18 okt 16:42 _2q.prx
-rw-r--r--  1 jteb  jteb        7447 18 okt 16:42 _2q.tii
-rw-r--r--  1 jteb  jteb      746299 18 okt 16:42 _2q.tis
-rw-r--r--  1 jteb  jteb        8394 18 okt 16:42 _2q.tvd
-rw-r--r--  1 jteb  jteb   599185081 18 okt 16:42 _2q.tvf
-rw-r--r--  1 jteb  jteb       73300 18 okt 16:42 _2q.tvx
-rw-r--r--  1 jteb  jteb  1595882722 18 okt 16:45 _3u.fdt
-rw-r--r--  1 jteb  jteb       63692 18 okt 16:45 _3u.fdx
-rw-r--r--  1 jteb  jteb         330 18 okt 16:45 _3u.fnm
-rw-r--r--  1 jteb  jteb     8001869 18 okt 16:45 _3u.frq
-rw-r--r--  1 jteb  jteb       15926 18 okt 16:45 _3u.nrm
-rw-r--r--  1 jteb  jteb   647374863 18 okt 16:45 _3u.prx
-rw-r--r--  1 jteb  jteb       11319 18 okt 16:45 _3u.tii
-rw-r--r--  1 jteb  jteb     1168399 18 okt 16:45 _3u.tis
-rw-r--r--  1 jteb  jteb       14209 18 okt 16:45 _3u.tvd
-rw-r--r--  1 jteb  jteb   986370136 18 okt 16:46 _3u.tvf
-rw-r--r--  1 jteb  jteb      127380 18 okt 16:46 _3u.tvx
-rw-r--r--  1 jteb  jteb  2691565961 18 okt 16:49 _4c.fdt
-rw-r--r--  1 jteb  jteb       39572 18 okt 16:49 _4c.fdx
-rw-r--r--  1 jteb  jteb         276 18 okt 16:49 _4c.fnm
-rw-r--r--  1 jteb  jteb    18724620 18 okt 16:49 _4c.frq
-rw-r--r--  1 jteb  jteb        9896 18 okt 16:49 _4c.nrm
-rw-r--r--  1 jteb  jteb   590255960 18 okt 16:50 _4c.prx
-rw-r--r--  1 jteb  jteb      141243 18 okt 16:50 _4c.tii
-rw-r--r--  1 jteb  jteb    12185869 18 okt 16:50 _4c.tis
-rw-r--r--  1 jteb  jteb        9894 18 okt 16:50 _4c.tvd
-rw-r--r--  1 jteb  jteb   932649779 18 okt 16:51 _4c.tvf
-rw-r--r--  1 jteb  jteb       79140 18 okt 16:51 _4c.tvx
-rw-r--r--  1 jteb  jteb  2398908136 18 okt 16:52 _4d.fdt
-rw-r--r--  1 jteb  jteb         548 18 okt 16:52 _4d.fdx
-rw-r--r--  1 jteb  jteb         354 18 okt 16:52 _4d.fnm
-rw-r--r--  1 jteb  jteb    24581614 18 okt 16:52 _4d.frq
-rw-r--r--  1 jteb  jteb         140 18 okt 16:52 _4d.nrm
-rw-r--r--  1 jteb  jteb   158243133 18 okt 16:52 _4d.prx
-rw-r--r--  1 jteb  jteb      141948 18 okt 16:52 _4d.tii
-rw-r--r--  1 jteb  jteb    12259425 18 okt 16:52 _4d.tis
-rw-r--r--  1 jteb  jteb         140 18 okt 16:52 _4d.tvd
-rw-r--r--  1 jteb  jteb   303769970 18 okt 16:53 _4d.tvf
-rw-r--r--  1 jteb  jteb        1092 18 okt 16:53 _4d.tvx
-rw-r--r--  1 jteb  jteb  4118409126 29 okt 16:26 _6g.fdt
-rw-r--r--  1 jteb  jteb        1484 29 okt 16:26 _6g.fdx
-rw-r--r--  1 jteb  jteb         384 29 okt 16:17 _6g.fnm
-rw-r--r--  1 jteb  jteb    35294399 29 okt 16:27 _6g.frq
-rw-r--r--  1 jteb  jteb         374 29 okt 16:27 _6g.nrm
-rw-r--r--  1 jteb  jteb   230791431 29 okt 16:27 _6g.prx
-rw-r--r--  1 jteb  jteb      143860 29 okt 16:27 _6g.tii
-rw-r--r--  1 jteb  jteb    12491845 29 okt 16:27 _6g.tis
-rw-r--r--  1 jteb  jteb         295 29 okt 16:28 _6g.tvd
-rw-r--r--  1 jteb  jteb   444939185 29 okt 16:28 _6g.tvf
-rw-r--r--  1 jteb  jteb        2964 29 okt 16:28 _6g.tvx
-rw-r--r--  1 jteb  jteb  2758122671 29 okt 16:31 _6h.fdt
-rw-r--r--  1 jteb  jteb       96388 29 okt 16:31 _6h.fdx
-rw-r--r--  1 jteb  jteb         723 29 okt 16:29 _6h.fnm
-rw-r--r--  1 jteb  jteb    51142700 29 okt 16:31 _6h.frq
-rw-r--r--  1 jteb  jteb       24100 29 okt 16:31 _6h.nrm
-rw-r--r--  1 jteb  jteb   189178767 29 okt 16:31 _6h.prx
-rw-r--r--  1 jteb  jteb      270472 29 okt 16:31 _6h.tii
-rw-r--r--  1 jteb  jteb    21710405 29 okt 16:31 _6h.tis
-rw-r--r--  1 jteb  jteb       23873 29 okt 16:31 _6h.tvd
-rw-r--r--  1 jteb  jteb   394088075 29 okt 16:31 _6h.tvf
-rw-r--r--  1 jteb  jteb      192772 29 okt 16:31 _6h.tvx
-rw-r--r--  1 jteb  jteb           0 29 okt 20:22 _8b.fnm
-rw-r--r--  1 jteb  jteb           0 29 okt 20:26 _8b.tvd
-rw-r--r--  1 jteb  jteb           0 29 okt 20:26 _8b.tvf
-rw-r--r--  1 jteb  jteb           0 29 okt 20:22 _8c.fdt
-rw-r--r--  1 jteb  jteb           0 29 okt 20:22 _8c.fdx
-rw-r--r--  1 jteb  jteb           0 29 okt 20:26 _8c.frq
-rw-r--r--  1 jteb  jteb           0 29 okt 20:24 _8c.tii
-rw-r--r--  1 jteb  jteb           0 29 okt 20:24 _8c.tis
-rw-r--r--  1 jteb  jteb           0 29 okt 20:28 _8c.tvf
-rw-r--r--  1 jteb  jteb           0 29 okt 20:30 _8c.tvx
-rw-r--r--  1 jteb  jteb           0 29 okt 20:24 _8d.fdt
-rw-r--r--  1 jteb  jteb           0 29 okt 20:25 _8d.fdx
-rw-r--r--  1 jteb  jteb           0 29 okt 20:24 _8d.frq
-rw-r--r--  1 jteb  jteb           0 29 okt 20:26 _8d.nrm
-rw-r--r--  1 jteb  jteb           0 29 okt 20:26 _8d.prx
-rw-r--r--  1 jteb  jteb           0 29 okt 20:26 _8d.tii
-rw-r--r--  1 jteb  jteb           0 29 okt 20:26 _8d.tis
-rw-r--r--  1 jteb  jteb           0 29 okt 20:30 _8d.tvd
-rw-r--r--  1 jteb  jteb           0 29 okt 20:22 _8e.fdt
-rw-r--r--  1 jteb  jteb           0 29 okt 20:22 _8e.fdx
-rw-r--r--  1 jteb  jteb           0 29 okt 20:26 _8e.frq
-rw-r--r--  1 jteb  jteb           0 29 okt 20:24 _8e.nrm
-rw-r--r--  1 jteb  jteb           0 29 okt 20:07 _8e.tis
-rw-r--r--  1 jteb  jteb           0 29 okt 21:09 _8e.tvd
-rw-r--r--  1 jteb  jteb           0 29 okt 21:09 _8e.tvf
-rw-r--r--  1 jteb  jteb           0 29 okt 21:04 _8e.tvx
-rw-r--r--  1 jteb  jteb   331348323 29 okt 19:49 _9a.fdt
-rw-r--r--  1 jteb  jteb       97020 29 okt 19:49 _9a.fdx
-rw-r--r--  1 jteb  jteb        1480 29 okt 19:49 _9a.fnm
-rw-r--r--  1 jteb  jteb    38868223 29 okt 19:50 _9a.frq
-rw-r--r--  1 jteb  jteb       24258 29 okt 19:50 _9a.nrm
-rw-r--r--  1 jteb  jteb    88673922 29 okt 19:50 _9a.prx
-rw-r--r--  1 jteb  jteb      147173 29 okt 19:50 _9a.tii
-rw-r--r--  1 jteb  jteb    11155683 29 okt 19:50 _9a.tis
-rw-r--r--  1 jteb  jteb       23837 29 okt 19:50 _9a.tvd
-rw-r--r--  1 jteb  jteb   194739895 29 okt 19:50 _9a.tvf
-rw-r--r--  1 jteb  jteb      194036 29 okt 19:50 _9a.tvx
-rw-r--r--  1 jteb  jteb          27 29 okt 19:54 _9a_1.del
-rw-r--r--  1 jteb  jteb      562391 29 okt 19:59 _9e.fdt
-rw-r--r--  1 jteb  jteb        9868 29 okt 19:59 _9e.fdx
-rw-r--r--  1 jteb  jteb        1271 29 okt 19:59 _9e.fnm
-rw-r--r--  1 jteb  jteb      325115 29 okt 19:59 _9e.frq
-rw-r--r--  1 jteb  jteb        1237 29 okt 19:59 _9e.nrm
-rw-r--r--  1 jteb  jteb       29378 29 okt 19:59 _9e.prx
-rw-r--r--  1 jteb  jteb        3367 29 okt 19:59 _9e.tii
-rw-r--r--  1 jteb  jteb      308103 29 okt 19:59 _9e.tis
-rw-r--r--  1 jteb  jteb      117116 29 okt 20:04 _9f.fdt
-rw-r--r--  1 jteb  jteb        2124 29 okt 20:04 _9f.fdx
-rw-r--r--  1 jteb  jteb         416 29 okt 20:04 _9f.fnm
-rw-r--r--  1 jteb  jteb       67935 29 okt 20:04 _9f.frq
-rw-r--r--  1 jteb  jteb         269 29 okt 20:04 _9f.nrm
-rw-r--r--  1 jteb  jteb        6165 29 okt 20:04 _9f.prx
-rw-r--r--  1 jteb  jteb         903 29 okt 20:04 _9f.tii
-rw-r--r--  1 jteb  jteb       75060 29 okt 20:04 _9f.tis
-rw-r--r--  1 jteb  jteb      120877 29 okt 20:04 _9g.fdt
-rw-r--r--  1 jteb  jteb        2188 29 okt 20:04 _9g.fdx
-rw-r--r--  1 jteb  jteb         416 29 okt 20:04 _9g.fnm
-rw-r--r--  1 jteb  jteb       70021 29 okt 20:04 _9g.frq
-rw-r--r--  1 jteb  jteb         277 29 okt 20:04 _9g.nrm
-rw-r--r--  1 jteb  jteb        6315 29 okt 20:04 _9g.prx
-rw-r--r--  1 jteb  jteb         842 29 okt 20:04 _9g.tii
-rw-r--r--  1 jteb  jteb       72269 29 okt 20:04 _9g.tis
-rw-r--r--  1 jteb  jteb       11704 29 okt 20:04 _9h.fdt
-rw-r--r--  1 jteb  jteb         212 29 okt 20:04 _9h.fdx
-rw-r--r--  1 jteb  jteb         416 29 okt 20:04 _9h.fnm
-rw-r--r--  1 jteb  jteb        6063 29 okt 20:04 _9h.frq
-rw-r--r--  1 jteb  jteb          30 29 okt 20:04 _9h.nrm
-rw-r--r--  1 jteb  jteb         627 29 okt 20:04 _9h.prx
-rw-r--r--  1 jteb  jteb         146 29 okt 20:04 _9h.tii
-rw-r--r--  1 jteb  jteb        9576 29 okt 20:04 _9h.tis
-rw-r--r--  1 jteb  jteb           5 29 okt 20:04 index.directory
-rw-r--r--  1 jteb  jteb          20 29 okt 20:04 segments.gen
-rw-r--r--  1 jteb  jteb           0 29 okt 20:07 segments_44
-rw-r--r--  1 jteb  jteb        2641 29 okt 20:04 segments_4m
{code}



[jira] Issue Comment Edited: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927481#action_12927481 ] 

Nico Krijnen edited comment on LUCENE-2729 at 11/4/10 4:57 AM:
---------------------------------------------------------------

Attached IndexWriter's infoStream log and a directory listing of the failed index. 

The log file contains logging from 14:22:29 (start of our backup process) to 15:57:58 (application termination). The index gets corrupted around 15:21:39 (first "read past EOF"). These log times are searchable in the log file.

  


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931044#action_12931044 ] 

Michael McCandless commented on LUCENE-2729:
--------------------------------------------

When you print the TRACE output saying that a file is being written, can you also print which dir instance we are "in"? I can see that segments_2j is suddenly opened for writing at 11:03:27,653, but it's not clear which directory that is. Maybe print the thread's stack trace at that point?

It looks like these 0-length files are being opened (by something) but then never closed.

Also: I see lines with "Writing _2q.fdx" where the directory contents on that line (and on the lines above it) show that _2q.fdx clearly exists. Yet you had added an assert to check this very case, and the assert never tripped? So something is wrong. E.g., are you sure you're running the JRE with assertions enabled?

Maybe when you print the full file listing, in addition to the size of each file, also print the .fileModified result? This can help us confirm that files are getting overwritten.
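(A sketch of what such a listing could look like. Directory.listAll, fileLength and fileModified are the Lucene 3.0 Directory APIs this relies on; the helper itself and where it logs to are made up:)

{code}
import java.io.IOException;
import java.util.Date;
import org.apache.lucene.store.Directory;

public class DirListingLogger {
    // Hypothetical helper: dump every file in the Directory together with
    // its size in bytes and its last-modified timestamp, so overwrites of
    // existing files become visible in the log.
    public static void logDirectory(String label, Directory dir) throws IOException {
        StringBuilder sb = new StringBuilder(label).append(": ");
        for (String name : dir.listAll()) {
            sb.append(name)
              .append('(').append(dir.fileLength(name))
              .append(", mod=").append(new Date(dir.fileModified(name)))
              .append(") ");
        }
        System.out.println(sb);  // or route to the application log
    }
}
{code}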



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930142#action_12930142 ] 

Nico Krijnen commented on LUCENE-2729:
--------------------------------------

{quote}
...spooky "other" exceptions...
{quote}

These are all 'WARN' level, and all of them are caused by non-critical timeouts in our code, triggered by the very heavy load needed to reproduce the bug.


{quote}
Would it be possible to instrument the Zoie code so that, as the backup
process copies each file in the snapshot, it prints a listing of the
directory at that point?
{quote}

Will do, that is a good one. Then we know which files are being 'held' by the Zoie deletion policy for the backup.


{quote}
Also, can you write to the log when Zoie applies deletes? (Looks like
it happens in proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs).
It's on applying deletes that the corruption is first detected, so, if
we log this event we can better bracket the period of time when the
corruption happened.
{quote}

Will do, but we also got the error while zoie was opening a new IndexWriter:

{code}
15:25:03,453 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@3d9e7719] 
    ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - 
    Problem copying segments: read past EOF
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
    at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:170)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1127)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:960)
    at proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
    at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
    at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
    at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:172)
    at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:377)
{code}

I'll add a log there too.
My guess is that the 'read past EOF' is not really specific to applying deletes, but simply happens whenever SegmentInfos is read from a 0 KB file.
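(If that guess is right, a cheap pre-flight check along these lines would confirm it. A hedged sketch using only the 3.0 Directory API; the helper name and warning text are invented:)

{code}
import java.io.IOException;
import org.apache.lucene.store.Directory;

public class SegmentsFileCheck {
    // Hypothetical pre-flight check: warn if any segments_N file in the
    // directory is 0 bytes before anything tries to read it, since
    // SegmentInfos.read() on such a file fails with 'read past EOF'.
    public static void warnOnTruncatedSegmentsFiles(Directory dir) throws IOException {
        for (String name : dir.listAll()) {
            if (name.startsWith("segments_") && dir.fileLength(name) == 0) {
                System.err.println("WARN: " + name + " is 0 bytes");
            }
        }
    }
}
{code}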


{quote}
Does Zoie ever open an IndexReader or IndexWriter passing in an
existing commit point? Or does it always open the latest commit?
{quote}

I'll try to find out.
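(For reference while digging: in Lucene 3.0 a specific commit point can be opened read-only via IndexReader.listCommits plus IndexReader.open(IndexCommit, boolean); an IndexWriter can similarly be handed an IndexCommit. A minimal reader-side sketch; the target segments name is just an example:)

{code}
import java.io.File;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;

public class OpenCommitPoint {
    public static void main(String[] args) throws Exception {
        Directory dir = new SimpleFSDirectory(new File(args[0]));
        // Walk all commits still present in the directory and open one
        // of them read-only, instead of the latest commit.
        for (IndexCommit commit : IndexReader.listCommits(dir)) {
            if ("segments_2j".equals(commit.getSegmentsFileName())) {  // example name
                IndexReader reader = IndexReader.open(commit, true);
                System.out.println("numDocs=" + reader.numDocs());
                reader.close();
            }
        }
        dir.close();
    }
}
{code}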


{quote}
The timestamps on the zero length files are particularly spooky - the
earliest ones are 15:21 (when first EOF is hit), but then also 15:47
and 15:49 on the others. It seems like on 3 separate occasions
something truncated the files.
{quote}

Indeed, I thought this was weird too.



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927040#action_12927040 ] 

Nico Krijnen commented on LUCENE-2729:
--------------------------------------

In the meantime, we also did a test with a checkout of the latest lucene_3_0 branch (as of 2010-11-01), which should include the fix that Jason mentions.

It does not seem to make a difference though. We still get a 'read past EOF'.

On the last run we got a slightly different stack trace. This time the 'read past EOF' happens when the zoie RAM index is written to the zoie disk index. Last time it occurred a little earlier, in BaseSearchIndex#loadFromIndex, while committing deletes to the disk IndexReader. This could just be a coincidence though. My feeling is still that the 'read past EOF' is a result/symptom of something else that happened just before it - still trying to figure out what that could be...

{code}
15:25:03,453 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@3d9e7719] 
    ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - 
    Problem copying segments: read past EOF
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
    at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:170)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1127)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:960)
    at proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
    at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
    at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
    at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:172)
    at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:377)
{code}



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Xiaoyang Gu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935121#action_12935121 ] 

Xiaoyang Gu commented on LUCENE-2729:
-------------------------------------

The "rw" mode shouldn't explain anything, since we don't write anything, it doesn't affect the state of the file at least for MAC OS.

But the missed deletion policy could cause deletion of files that shouldn't be deleted and then the "rw" mode could create some 0 length files.
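(This is standard java.io behavior and matches the 0-byte files observed: opening a RandomAccessFile in "rw" mode silently creates an empty file when the target does not exist. A self-contained demo; the file name is arbitrary:)

{code}
import java.io.File;
import java.io.RandomAccessFile;

public class RwModeDemo {
    public static void main(String[] args) throws Exception {
        File f = new File("segments_demo");  // arbitrary name, does not exist yet
        System.out.println("before: exists=" + f.exists());
        // "rw" creates the file as 0 bytes if it is missing - so an open
        // that races a delete can leave behind exactly these 0 kb files.
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        System.out.println("after:  exists=" + f.exists() + " length=" + raf.length());
        raf.close();
    }
}
{code}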



[jira] Issue Comment Edited: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931024#action_12931024 ] 

Nico Krijnen edited comment on LUCENE-2729 at 11/11/10 9:45 AM:
----------------------------------------------------------------

We finished another run today and got some interesting results from the additional logging. The log times are searchable in the log file...

- 09:40:21,996: we are creating a snapshot for our backup; segments filename used for the snapshot: 'segments_2j'.
On the next line we can see that at that moment segments_2j is 2353 bytes. We log the filenames in the index plus their file sizes (in parentheses).

- 11:03:27,653: segments_2j is written to disk.
_How can this be, when this is the segments filename that was used to create our snapshot?_
On the next line we see that it has indeed been written to disk, but with a 0-byte file size.

- 11:08:02,731: First "read past EOF" exception.
_It looks like segments filenames are reused: at some point well after our snapshot is created, the name 'segments_2j' is used again.
This somehow causes the segments file to end up 0 bytes in size. Perhaps it is being deleted and written at the same time, or because of the name clash there is no data to be written to the segments file?_

- 11:16:03,300: our backup is finished.

Here is an excerpt of some of the more interesting log entries (complete log is attached as backup_force_failure2.log.zip):

{quote}
09:39:30,719 [http-8080-2] DEBUG com.ds.acm.logic.tools.BackupManager - Start asset index backup

09:40:21,996 [http-8080-2] DEBUG proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy - Ctor Snapshot. segments filename: *segments_2j*, commit filenames: [_5f.tvd, _4c.tvf, _5f.tvf, _4d.fdx, _4p.tis, _4c.tvd, _4p.frq, _5e.tvx, _39.tis, _3u.frq, _4d.fdt, _4p.tii, _5f.fnm, _4c.tvx, _39.tii, _55.tis, _2q.fnm, _4p.fdx, _2q.tvf, _5e.tvf, _5e.tvd, _2q.tvd, _5f.tvx, _4p.fdt, _55.tii, _55.nrm, _55.frq, _2q.tvx, _3u.tvx, _5f.fdx, _4c.frq, _5f.fdt, _4c.fdx, _2q.nrm, _4c.fdt, _4p.tvf, _4c.nrm, _3v.tvf, _4c.tii, _3u.prx, _4p.tvd, _4c.tis, _3u.fdt, _5e.fnm, _3v.tvd, _3u.fdx, _55.tvx, _39.fdt, _39.fnm, _5e.prx, _3u.tis, *segments_2j*, _3v.prx, _2q.fdx, _4c.prx, _4d.prx, _4p.tvx, _39.fdx, _3u.tii, _2q.fdt, _4d.frq, _55.fdt, _2q.prx, _5e.tii, _3u.fnm, _55.fdx, _5e.nrm, _5f.tii, _5e.tis, _4d.nrm, _4d.tii, _5e.fdt, _4d.tis, _5f.frq, _5e.fdx, _39.tvd, _39.tvf, _55.fnm, _3v.frq, _2q.frq, _2q.tii, _3v.tii, _5f.prx, _4p.nrm, _5f.nrm, _2q.tis, _3v.tvx, _4c.fnm, _3v.nrm, _55.tvf, _55.tvd, _3u.nrm, _5f.tis, _3v.tis, _3v.fdt, _4d.fnm, _4d.tvf, _3v.fdx, _4d.tvd, _4d.tvx, _4p.prx, _3v.fnm, _5e.frq, _39.prx, _39.tvx, _39.frq, _4p.fnm, _39.nrm, _3u.tvf, _55.prx, _3u.tvd]

11:03:27,653 [http-8080-2] TRACE com.ds.acm.engine.search.zoieimpl.core.DirectoryManager - Writing *segments_2j*, directory contents: _2q.fdt(969134416), _2q.fdx(36652), _2q.fnm(276), _2q.frq(4685726), _2q.nrm(9166), _2q.prx(393230403), _2q.tii(7447), _2q.tis(746299), _2q.tvd(8394), _2q.tvf(599185081), _2q.tvx(73300), _39.fdt(2061261675), _39.fdx(1012), _39.fnm(276), _39.frq(17754579), _39.nrm(256), _39.prx(121067407), _39.tii(137511), _39.tis(11726653), _39.tvd(185), _39.tvf(233037042), _39.tvx(2020), _3u.fdt(1595882722), _3u.fdx(63692), _3u.fnm(330), _3u.frq(8001869), _3u.nrm(15926), _3u.prx(647374863), _3u.tii(11319), _3u.tis(1168399), _3u.tvd(14209), _3u.tvf(986370136), _3u.tvx(127380), _3v.fdt(2057147455), _3v.fdx(476), _3v.fnm(384), _3v.frq(15201111), _3v.nrm(122), _3v.prx(109724024), _3v.tii(132491), _3v.tis(11457688), _3v.tvd(114), _3v.tvf(211902147), _3v.tvx(948), _4c.fdt(2691565961), _4c.fdx(39572), _4c.fnm(276), _4c.frq(18724620), _4c.nrm(9896), _4c.prx(590255960), _4c.tii(141243), _4c.tis(12185869), _4c.tvd(9894), _4c.tvf(932649779), _4c.tvx(79140), _4d.fdt(2398908136), _4d.fdx(548), _4d.fnm(354), _4d.frq(24581614), _4d.nrm(140), _4d.prx(158243133), _4d.tii(141948), _4d.tis(12259425), _4d.tvd(140), _4d.tvf(303769970), _4d.tvx(1092), _4p.fdt(1081212027), _4p.fdx(212), _4p.fnm(354), _4p.frq(8294102), _4p.nrm(56), _4p.prx(60513257), _4p.tii(134898), _4p.tis(11376730), _4p.tvd(56), _4p.tvf(116715012), _4p.tvx(420), _55.fdt(787581180), _55.fdx(156), _55.fnm(354), _55.frq(5376741), _55.nrm(42), _55.prx(39539985), _55.tii(133483), _55.tis(11351989), _55.tvd(42), _55.tvf(75958725), _55.tvx(308), _5e.prx(0), _5z.fdt(853614415), _5z.fdx(23700), _5z.fnm(1696), _5z.frq(9041984), _5z.nrm(5928), _5z.prx(32617823), _5z.tii(146115), _5z.tis(12387143), _5z.tvd(4142), _5z.tvf(74409815), _5z.tvx(47396), _63.fdt(212690811), _63.fdx(8188), _63.fnm(1652), _63.frq(2492889), _63.nrm(2050), _63.prx(5672093), _63.tii(8470), _63.tis(689993), _63.tvd(1442), _63.tvf(15371474), _63.tvx(16372), _64.fdt(4118409126), _64.fdx(1484), _64.fnm(384), _64.frq(35294399), _64.nrm(374), _64.prx(230791431), _64.tii(143860), _64.tis(12491845), _64.tvd(295), _64.tvf(444939185), _64.tvx(2964), _65.fdt(1369489408), _65.fdx(0), _65.fnm(1749), index.directory(5), segments.gen(20), segments_2x(2354)
11:03:27,706 [http-8080-2] TRACE com.ds.acm.engine.search.zoieimpl.core.DirectoryManager - Writing _3v.prx, directory contents: _2q.fdt(969134416), _2q.fdx(36652), _2q.fnm(276), _2q.frq(4685726), _2q.nrm(9166), _2q.prx(393230403), _2q.tii(7447), _2q.tis(746299), _2q.tvd(8394), _2q.tvf(599185081), _2q.tvx(73300), _39.fdt(2061261675), _39.fdx(1012), _39.fnm(276), _39.frq(17754579), _39.nrm(256), _39.prx(121067407), _39.tii(137511), _39.tis(11726653), _39.tvd(185), _39.tvf(233037042), _39.tvx(2020), _3u.fdt(1595882722), _3u.fdx(63692), _3u.fnm(330), _3u.frq(8001869), _3u.nrm(15926), _3u.prx(647374863), _3u.tii(11319), _3u.tis(1168399), _3u.tvd(14209), _3u.tvf(986370136), _3u.tvx(127380), _3v.fdt(2057147455), _3v.fdx(476), _3v.fnm(384), _3v.frq(15201111), _3v.nrm(122), _3v.prx(109724024), _3v.tii(132491), _3v.tis(11457688), _3v.tvd(114), _3v.tvf(211902147), _3v.tvx(948), _4c.fdt(2691565961), _4c.fdx(39572), _4c.fnm(276), _4c.frq(18724620), _4c.nrm(9896), _4c.prx(590255960), _4c.tii(141243), _4c.tis(12185869), _4c.tvd(9894), _4c.tvf(932649779), _4c.tvx(79140), _4d.fdt(2398908136), _4d.fdx(548), _4d.fnm(354), _4d.frq(24581614), _4d.nrm(140), _4d.prx(158243133), _4d.tii(141948), _4d.tis(12259425), _4d.tvd(140), _4d.tvf(303769970), _4d.tvx(1092), _4p.fdt(1081212027), _4p.fdx(212), _4p.fnm(354), _4p.frq(8294102), _4p.nrm(56), _4p.prx(60513257), _4p.tii(134898), _4p.tis(11376730), _4p.tvd(56), _4p.tvf(116715012), _4p.tvx(420), _55.fdt(787581180), _55.fdx(156), _55.fnm(354), _55.frq(5376741), _55.nrm(42), _55.prx(39539985), _55.tii(133483), _55.tis(11351989), _55.tvd(42), _55.tvf(75958725), _55.tvx(308), _5e.prx(0), _5z.fdt(853614415), _5z.fdx(23700), _5z.fnm(1696), _5z.frq(9041984), _5z.nrm(5928), _5z.prx(32617823), _5z.tii(146115), _5z.tis(12387143), _5z.tvd(4142), _5z.tvf(74409815), _5z.tvx(47396), _63.fdt(212690811), _63.fdx(8188), _63.fnm(1652), _63.frq(2492889), _63.nrm(2050), _63.prx(5672093), _63.tii(8470), _63.tis(689993), _63.tvd(1442), _63.tvf(15371474), _63.tvx(16372), _64.fdt(4118409126), _64.fdx(1484), _64.fnm(384), _64.frq(35294399), _64.nrm(374), _64.prx(230791431), _64.tii(143860), _64.tis(12491845), _64.tvd(295), _64.tvf(444939185), _64.tvx(2964), _65.fdt(1369554944), _65.fdx(0), _65.fnm(1749), index.directory(5), segments.gen(20), *segments_2j(0)*, segments_2x(2354)

IFD [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@7636d59a]: init: load commit "*segments_2j*"
IW 1110 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@7636d59a]: init: hit exception on init; releasing write lock
11:08:02,731 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@7636d59a] ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - Problem copying segments: *read past EOF*, start=0, end=0, bufferStart=0, bufferPosition=0, bufferSize=1024

11:16:03,300 [http-8080-2] DEBUG com.ds.acm.logic.tools.BackupManager - Asset index backup complete
{quote}

Looking in the logs, there are *31 occurrences of 'delete "segments_2j"' and 32 of 'wrote segments file "segments_2j"'*. Very weird, since the ZoieIndexDeletionPolicy should prevent segments_2j from being deleted...

So, just guessing here, but the problem seems to be that segments filenames are being 'rotated' and reused?
The ZoieIndexDeletionPolicy definitely does not seem to take this into account: it uses the segments filename as a unique id to track the references to a specific commit. But looking at it, segments filenames are only unique over a certain period of time, and that period gets shorter when lots of modifications are being made to the index.
Reading the code, it should still prevent any segments_2j commit from being deleted... but we do see deletes for that file... hmmm.

And then something else still seems to be going wrong: somehow the segments file does become 0 bytes. It is not because it was written with no data; we added a check for that (look for 'closing 0kb IndexOutput'), but that never happens to a segments_* file.

Maybe some logic needs to be added to prevent certain segments_... names from being reused while the commit is still 'referenced' by an IndexDeletionPolicy...?
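(To make the hazard concrete, here is a minimal sketch of a snapshot-style IndexDeletionPolicy that, like the Zoie one apparently does, keys its reference counts on the segments file name. Only the IndexDeletionPolicy/IndexCommit API is Lucene 3.0; the class and method names are made up:)

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexDeletionPolicy;

public class SegmentsNameSnapshotPolicy implements IndexDeletionPolicy {
    // Reference counts keyed by segments file name - fragile if names recur.
    private final Map<String, Integer> refCounts = new HashMap<String, Integer>();

    public synchronized void snapshot(String segmentsFileName) {
        Integer n = refCounts.get(segmentsFileName);
        refCounts.put(segmentsFileName, n == null ? 1 : n + 1);
    }

    public synchronized void release(String segmentsFileName) {
        Integer n = refCounts.get(segmentsFileName);
        if (n != null && n > 1) refCounts.put(segmentsFileName, n - 1);
        else refCounts.remove(segmentsFileName);
    }

    public synchronized void onInit(List<? extends IndexCommit> commits) {
        onCommit(commits);
    }

    public synchronized void onCommit(List<? extends IndexCommit> commits) {
        // Keep the newest commit; delete older ones unless a snapshot
        // still references their segments file name.
        for (int i = 0; i < commits.size() - 1; i++) {
            IndexCommit c = commits.get(i);
            if (!refCounts.containsKey(c.getSegmentsFileName())) {
                c.delete();
            }
        }
    }
}
{code}

With this keying, once a segments_N name is recycled the old snapshot's reference can pin (or fail to pin) a completely different commit, which would line up with the 31 deletes versus 32 writes of segments_2j seen above.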



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930186#action_12930186 ] 

Nico Krijnen commented on LUCENE-2729:
--------------------------------------

We really appreciate the help and understand that this is not the only thing you are working on ;-)
Collecting post-it notes sounds so familiar :)

We ran a test with your patch to throw a RuntimeException when an output already exists.
We did get a 'read past EOF', but the additional RuntimeException was never thrown.

We'll add the other log points and do another test run with those. If you have more suggestions for logging, let us know; we won't start the next run until tomorrow anyway...



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934913#action_12934913 ] 

Michael McCandless commented on LUCENE-2729:
--------------------------------------------

On your workaround, I think that should work -- you're having IFD
disregard zero length segments files.
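
As a minimal standalone sketch of that workaround idea (the real change lives inside Lucene's IndexFileDeleter; the class and method names here are purely illustrative):

{code}
import java.io.File;

// Illustrative only: treat a zero-byte segments_N file as a corrupt leftover
// and skip it, instead of handing it to SegmentInfos.read().
public class ZeroLengthSegmentsCheck {

    /** Returns true if this segments_N file should be disregarded. */
    public static boolean shouldIgnore(File indexDir, String fileName) {
        if (!fileName.startsWith("segments") || "segments.gen".equals(fileName)) {
            return false;  // not a segments_N file
        }
        File f = new File(indexDir, fileName);
        return f.exists() && f.length() == 0;  // zero bytes => skip it
    }
}
{code}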

Are you running with assertions enabled?  Maybe we'll catch
something...

I really don't like that "refcount is now -1" -- such a thing should
never happen to a refCount!  But looking at ZoieIDP's sources I think
that's harmless.  It has a finalizer that first sets refCount to 0 and
then calls close...

So the backup completes successfully, after which the segments_5e
exists as a (correct) non-zero file.  Then the snapshot is released,
and at that point, instead of the file being deleted (as we'd expect),
something is somehow making it 0 bytes.

Can you try modifying FSDir.fsync?  First, check if the file
about-to-be-sync'd in fact exists (throw an exception or warn if not);
second, if we are syncing a segments_N file, print its name & full
stack trace?  I wonder if somehow we are syncing these files after
they got deleted...
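
A rough sketch of that instrumentation, assuming it is wired in via a SimpleFSDirectory subclass and the Lucene 3.0.x sync(String) entry point (the exact method patched inside FSDirectory may differ):

{code}
import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.SimpleFSDirectory;

// Debugging wrapper: warns when a file is fsync'd after it disappeared, and
// prints a full stack trace for every sync of a segments_N file.
public class SyncLoggingDirectory extends SimpleFSDirectory {

    public SyncLoggingDirectory(File path) throws IOException {
        super(path);
    }

    @Override
    public void sync(String name) throws IOException {
        File f = new File(getDirectory(), name);
        if (!f.exists()) {
            // first check: the file about to be sync'd no longer exists
            new IOException("sync() on missing file: " + name).printStackTrace();
        }
        if (name.startsWith("segments")) {
            // second check: who is syncing a segments_N file, and from where?
            new Exception("sync(" + name + "), length=" + f.length()).printStackTrace();
        }
        super.sync(name);
    }
}
{code}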

Also: can you try upgrading to the latest 3.0.x tip (svn checkout
https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0)?
This doesn't sound like any bug we've fixed on 3.0.x but it's worth a
shot :)

I see strange output for the backup, e.g. "Backup to file _7c.prx,
dataLen: 0" and then "Backup to file _7c.prx DONE, written: 8" -- why
is that?  (A number of files show this, but a bunch of others
look like "correct" sizes.)  Are you sure the backup copy of
segments_5e is actually valid?

You should probably mark ZoieIDP.lastSnapshotSegmentsFileName as
volatile.
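
That is, something along these lines (the field name comes from this thread; the surrounding class is a hypothetical sketch):

{code}
// Without 'volatile', the thread that releases the snapshot may read a stale
// value of the field written by the thread that created the snapshot.
class SnapshotTracker {

    private volatile String lastSnapshotSegmentsFileName;

    void onSnapshotCreated(String segmentsFileName) {  // backup thread
        lastSnapshotSegmentsFileName = segmentsFileName;
    }

    boolean isProtected(String fileName) {             // deletion-policy thread
        return fileName.equals(lastSnapshotSegmentsFileName);
    }
}
{code}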

OK, digging through the log... it looks like IW is closed, committing
segments_5f and protecting segments_5e from deletion.  Then an
IndexReader is opened (to do deletes), and it looks like that deletes
segments_5e, because the next IW that's opened no longer sees
segments_5e (even though the backup process is still going).

Can you double check that when Zoie opens an IndexReader for deletions
it's passing the ZoieIDP instance that's protecting segments_5e?

That's a serious problem, but it cannot explain the zero length
files... unless: does Zoie open a read/write file (against the file it
needs to copy) when it does its backups?




[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Jan te Beest (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935351#action_12935351 ] 

Jan te Beest commented on LUCENE-2729:
--------------------------------------

Splendid, it's working like a charm now!

We're using lucene+zoie as a backend to our DAM system (http://elvisready.com). Most of our clients have
multiple wire-feeds coming in 24/7, and twice this issue left a corrupted index behind. That was
easily fixed by deleting the 0 KB segments file, but it was really scary nonetheless.

I won't attach the log file as it's rather boring now, but if you want to take a look let me know and I'll post it anyway.

Thanks again for your help!



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930091#action_12930091 ] 

Simon Willnauer commented on LUCENE-2729:
-----------------------------------------

bq.     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
I just had a quick look at it and I wonder what revision you are using for that: SegmentInfos.java does not contain a read call at that line since [revision 892992 | https://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/index/SegmentInfos.java?revision=892992&view=markup].

Can you clarify?



[jira] Issue Comment Edited: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926908#action_12926908 ] 

Nico Krijnen edited comment on LUCENE-2729 at 11/1/10 6:52 AM:
---------------------------------------------------------------

bq. Was there no original root cause here? Eg disk full?

This was one of the first things I thought of, but the disk has more than enough free space: 200 GB. Also, for this test we write the backup to a different disk - both for better performance and to prevent the disk with the index on it from running out of free space.

bq. Or is the "read past EOF" on closing an IndexReader w/ pending deletes really the first exception you see?

It is the first exception we see. We turned on quite a bit of additional logging but we have not been able to find anything weird happening before this error. I do expect something weird must have happened to cause the 'read past EOF'.

Do you have any clues as to what we could look for? That might narrow the search.
We are able to consistently reproduce this in our test environment, so if you can point to specific debug logging that should be turned on, we can do another test run.

bq. Does zoie somehow touch the index files?

We'll try to find out. As far as I can see, the basic backup procedure is to grab the last 'commit snapshot', prevent it from being deleted (ZoieIndexDeletionPolicy), and write all the files mentioned in the commit snapshot to a NIO WritableByteChannel (proj.zoie.impl.indexing.internal.DiskIndexSnapshot#writeTo) - we call proj.zoie.impl.indexing.ZoieSystem.exportSnapshot(WritableByteChannel) ourselves.
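
For reference, a minimal sketch of how we drive it (the ZoieSystem setup is elided, and the raw type is used to stay agnostic of zoie's generic signature; only the exportSnapshot call itself comes from zoie's API):

{code}
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.WritableByteChannel;

import proj.zoie.impl.indexing.ZoieSystem;

public class SnapshotBackup {

    /** Streams a hot-backup snapshot of the index to the given file. */
    public static void backup(ZoieSystem zoie, String targetPath) throws IOException {
        FileOutputStream out = new FileOutputStream(targetPath);
        WritableByteChannel channel = out.getChannel();
        try {
            zoie.exportSnapshot(channel);  // writes every file in the commit snapshot
        } finally {
            channel.close();
        }
    }
}
{code}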



[jira] Updated: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Jan te Beest (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan te Beest updated LUCENE-2729:
---------------------------------

    Attachment: read-past-eof-debugging.zip

The proper zip file, containing the full log, the stripped log, an svn diff against lucene, and a git patch against zoie.



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "John Wang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929702#action_12929702 ] 

John Wang commented on LUCENE-2729:
-----------------------------------

Zoie does not touch index files; it only adds an index.directory file containing version information.



[jira] Updated: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Krijnen updated LUCENE-2729:
---------------------------------

    Description: 
We have a system running lucene and zoie. We use lucene as a content store for a CMS/DAM system. We use the hot-backup feature of zoie to make scheduled backups of the index. This works fine for small indexes and when there are not a lot of changes to the index when the backup is made.

On large indexes (about 5 GB to 19 GB), when a backup is made while the index is being changed a lot (lots of document additions and/or deletions), we almost always get a 'read past EOF' at some point, followed by lots of 'Lock obtain timed out'.
At that point we get lots of 0 KB files in the index, data gets lost, and the index is unusable.

When we stop our server, remove the 0 KB files and restart our server, the index is operational again, but data has been lost.

I'm not sure if this is a zoie or a lucene issue, so I'm posting it to both. Hopefully someone has some ideas where to look to fix this.


Some more details...

Stack trace of the read past EOF and following Lock obtain timed out:

{code}
78307 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@31ca5085] ERROR proj.zoie.impl.indexing.internal.BaseSearchIndex - read past EOF
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
    at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:166)
    at org.apache.lucene.index.DirectoryReader.doCommit(DirectoryReader.java:725)
    at org.apache.lucene.index.IndexReader.commit(IndexReader.java:987)
    at org.apache.lucene.index.IndexReader.commit(IndexReader.java:973)
    at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:162)
    at org.apache.lucene.index.IndexReader.close(IndexReader.java:1003)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs(BaseSearchIndex.java:203)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:223)
    at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
    at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
    at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
    at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
579336 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@31ca5085] ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - 
    Problem copying segments: Lock obtain timed out: org.apache.lucene.store.SingleInstanceLock@5ad0b895: write.lock
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: org.apache.lucene.store.SingleInstanceLock@5ad0b895: write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1060)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:957)
    at proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
    at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
    at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
    at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
    at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
{code}

We get exactly the same behaviour on both OS X and on Windows. On both, zoie is using a SimpleFSDirectory.
We also use a SingleInstanceLockFactory (since our process is the only one working with the index), but we get the same behaviour with a NativeFSLock.

The snapshot backup is being made by calling:

*proj.zoie.impl.indexing.ZoieSystem.exportSnapshot(WritableByteChannel)*

Same issue in zoie JIRA:

http://snaprojects.jira.com/browse/ZOIE-51





[jira] Updated: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Jan te Beest (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan te Beest updated LUCENE-2729:
---------------------------------

    Attachment:     (was: eof-extra-logging-4.log.zip)




[jira] Resolved: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2729.
----------------------------------------

    Resolution: Invalid

OK I'm glad to hear it's solved!

Resolving as Invalid since this wasn't a Lucene problem.





[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Xiaoyang Gu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935118#action_12935118 ] 

Xiaoyang Gu commented on LUCENE-2729:
-------------------------------------

Yes, I was just looking at that part.

Can you test with the following patched method in the DiskSearchIndex class?

{code}
  @Override
  protected IndexReader openIndexReaderForDelete() throws IOException {
    Directory directory = _dirMgr.getDirectory(true);
    if (IndexReader.indexExists(directory)) {
      // Open a writable reader with the configured deletion policy.
      return IndexReader.open(directory, _deletionPolicy, false);
    } else {
      return null;
    }
  }
{code}
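
For context: the difference from the current code is that _deletionPolicy is passed when opening the reader, so the commit the reader performs when it is closed goes through the same deletion policy that tracks snapshots, instead of Lucene's default policy that keeps only the last commit. If I understand the failure correctly, that should keep a segments file alive while a backup still references it.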





[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Xiaoyang Gu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935025#action_12935025 ] 

Xiaoyang Gu commented on LUCENE-2729:
-------------------------------------

Here's the code in zoie that exports the index, in the DefaultDirectoryManager class. We accidentally opened the file in "rw" mode, even though there are no write operations on the file. Can you test whether changing it to "r" mode makes any difference?

Thanks!

{code}
  public long transferFromFileToChannel(String fileName, WritableByteChannel channel) throws IOException
  {
    long amount = 0;
    File file = new File(_location, fileName);
    RandomAccessFile raf = null;
    FileChannel fc = null;
    try
    {
      // NOTE: "rw" creates the file if it does not already exist.
      raf = new RandomAccessFile(file, "rw");
      fc = raf.getChannel();
      long dataLen = fc.size();
      amount += ChannelUtil.writeLong(channel, dataLen);
      amount += fc.transferTo(0, dataLen, channel);
    }
    finally
    {
      try
      {
        if (fc != null) fc.close();
      }
      finally
      {
        if (raf != null) raf.close();
      }
    }
    return amount;
  }
{code}
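
To illustrate why the mode matters: opening a missing file with "rw" silently creates it as a 0-byte file - which looks a lot like the empty segments files reported here - while "r" fails fast. A small self-contained sketch (the file name is just a placeholder):

{code}
import java.io.File;
import java.io.FileNotFoundException;
import java.io.RandomAccessFile;

public class RwModeDemo {
    public static void main(String[] args) throws Exception {
        File f = new File("segments_demo"); // placeholder, not a real index file
        f.delete();

        // "rw" creates the file when it does not exist -> a 0-byte file appears.
        new RandomAccessFile(f, "rw").close();
        System.out.println("exists=" + f.exists() + ", length=" + f.length()); // exists=true, length=0

        f.delete();
        try {
            new RandomAccessFile(f, "r"); // "r" refuses to create a missing file
        } catch (FileNotFoundException e) {
            System.out.println("read-only open failed: " + e.getMessage());
        }
    }
}
{code}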





[jira] Issue Comment Edited: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927040#action_12927040 ] 

Nico Krijnen edited comment on LUCENE-2729 at 11/1/10 12:53 PM:
----------------------------------------------------------------

In the meantime, we also did a test with a checkout of the latest lucene_3_0 branch (@2010-11-01), which should include the fix that Jason mentions.

It does not seem to make a difference though; we still get a 'read past EOF'.

On the last run we did get a slightly different stack trace. This time the 'read past EOF' happens when the zoie RAM index is written to the zoie disk index; last time it occurred a little earlier, in BaseSearchIndex#loadFromIndex, while committing deletes to the disk IndexReader. This could just be a coincidence though. My feeling is still that the 'read past EOF' is a result/symptom of something else that happened just before it - still trying to figure out what that could be... any suggestions are welcome.

{code}
15:25:03,453 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@3d9e7719] 
    ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - 
    Problem copying segments: read past EOF
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
    at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:170)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1127)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:960)
    at proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
    at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
    at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
    at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
    at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:172)
    at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:377)
{code}




[jira] Issue Comment Edited: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

Posted by "Nico Krijnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932519#action_12932519 ] 

Nico Krijnen edited comment on LUCENE-2729 at 11/16/10 11:18 AM:
-----------------------------------------------------------------

We did a few more runs with lots more logging and *we think we have pinpointed the cause of the issue*. The logs and a short analysis are attached: eof-extra-logging-4-analysis.txt

We added logging to track files that are opened for reading and writing. At the moment the 'read past EOF' happens, no files are still open for writing, so it does not seem as if a file is opened but never closed. However, we did find another interesting entry:

- Some time before the 'read past EOF' we see some activity in the disk index on the segments_... file that is 'locked' by the zoie snapshot.
-- 'our' segments_... file is deleted (the logged dir listings show that it does not exist anymore after that)
-- 'our' segments_... file is written to. At this point it is created again, but becomes 0 kb, which at a later point causes the 'read past EOF'.

We added more logging to find out when, and by whom, the 'generation' in SegmentInfos is set or changed.

The logs from that run seem to indicate that *at some point SegmentInfos.generation 'wraps' to a lower value and earlier generation numbers get re-used*.

In ZoieIndexDeletionPolicy, a Snapshot is initialized using a certain IndexCommit. Zoie doesn't take into account that Lucene may re-use segments filenames; no locking whatsoever is applied to keep Lucene from re-using the specific segments filename a snapshot refers to.

AFAICT from going through the code in SegmentInfos, the generation counter is reset in SegmentInfos.read(Directory). Looking at the log files and singling out the value of generation, *in our case it does not even reach a value above 118*. The segments filename of our Snapshot is 'segments_2q', which translates to a generation of 98. After a while, Lucene deletes this file on disk (because the name is re-used?) in a call to FSDirectory.deleteFile. However, our backup is still running at that point. Somehow this file is written to disk a moment later, and on the next read we get the 'read past EOF' error.
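
As a quick sanity check on that mapping, a self-contained snippet (the helper names are ours, not Lucene's) showing the base-36 encoding of the generation in the segments_N filename:

{code}
public class SegmentsGenerationDemo {
    // segments_N encodes the generation N in base 36 after the underscore.
    static String fileNameFromGeneration(long gen) {
        return "segments_" + Long.toString(gen, Character.MAX_RADIX);
    }

    static long generationFromFileName(String name) {
        return Long.parseLong(name.substring("segments_".length()), Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        System.out.println(fileNameFromGeneration(98));            // prints: segments_2q
        System.out.println(generationFromFileName("segments_2q")); // prints: 98
    }
}
{code}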
It's not completely clear to me why the generation counter is reset on read(Directory). Is it because the file is determined through FindSegmentsFile(Directory).run()?

Although this could be interpreted as a flaw in ZoieIndexDeletionPolicy, I think this should be worked around in Lucene (correct me if I'm wrong).

We could attempt some kind of 'fix' that skips generations corresponding to segments filenames held by the '_currentSnapshots' variable in ZoieIndexDeletionPolicy, but neither the IndexDeletionPolicy API nor the SegmentInfos class really provides a nice way of doing that, and it doesn't feel like a proper fix.

*Is there a specific reason for the generation to 'wrap around' or be re-used within hours?*
Looking at the code in SegmentInfos, I do not see any explicit logic or reason for re-using earlier generation numbers.

