Posted to dev@lucene.apache.org by "Mark Miller (JIRA)" <ji...@apache.org> on 2013/03/01 14:59:12 UTC

[jira] [Commented] (SOLR-4519) corrupt tlog causes fullCopy download index files every time reboot a node

    [ https://issues.apache.org/jira/browse/SOLR-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590526#comment-13590526 ] 

Mark Miller commented on SOLR-4519:
-----------------------------------

bq. the tlog should be fixed.

Currently, when you replicate, you get nothing in the tlog. Yonik has suggested a small trick to populate the tlog a bit during replication, but no work has started on that front. So once you replicate, unless some docs are added afterwards, the next failure will require another replication.
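
To make the point above concrete, here is a rough model of why an empty tlog leads straight back to replication. It is a sketch under stated assumptions, not Solr's RecoveryStrategy code: the class, enum, and method names are all hypothetical, and the only behaviour taken from this thread is that recovery reads recent versions from the update log and has to replicate when there is nothing there to sync from.

```java
import java.util.List;

// Hypothetical model, not Solr code: picks a recovery path from the
// update versions found in the local transaction log (tlog).
final class RecoveryPathSketch {
    enum Path { SYNC_FROM_TLOG, REPLICATE }

    // recentVersions: versions read back from the tlog. It is empty when the
    // tlog is missing, unreadable, or was never populated after a replication.
    static Path choose(List<Long> recentVersions) {
        if (recentVersions == null || recentVersions.isEmpty()) {
            // Nothing local to compare against the leader, so copy the index.
            return Path.REPLICATE;
        }
        return Path.SYNC_FROM_TLOG;
    }
}
```

In this model a node that has just replicated and received no new docs calls `choose` with an empty list on its next restart, so it replicates again.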

However, we may be able to take advantage of replication itself noticing that it doesn't need to do a full replicate. Currently, SolrCloud forces a replication every time we call replicate, no matter what. Now that standard replication has had some bugs fixed and has better tests, we may not have to force it anymore, in which case the next full replication would not actually move any files.
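
The idea of dropping the forced replication can be sketched as follows. This is an illustration only, not SnapPuller's code: the class, enum, and `force` parameter are hypothetical names, and comparing just the generation is a simplifying assumption (a real check may compare more than that). The generation values in the usage note come from the log quoted in the issue.

```java
// Hypothetical model, not SnapPuller code: decides whether a replication
// call needs to move any index files.
final class ReplicationDecisionSketch {
    enum Action { SKIP, FETCH }

    static Action decide(long leaderGeneration, long localGeneration, boolean force) {
        if (force) {
            // Models the current SolrCloud behaviour described above:
            // replicate no matter what the generations say.
            return Action.FETCH;
        }
        // Without forcing, identical generations mean there is nothing to copy.
        return leaderGeneration == localGeneration ? Action.SKIP : Action.FETCH;
    }
}
```

With the values from the reporter's log, `decide(6993, 6993, true)` models the forced full copy that was observed, while `decide(6993, 6993, false)` would skip the download.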
                
> corrupt tlog causes fullCopy download index files every time reboot a node
> --------------------------------------------------------------------------
>
>                 Key: SOLR-4519
>                 URL: https://issues.apache.org/jira/browse/SOLR-4519
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.0
>         Environment: The SolrCloud cluster runs on three servers, with three Solr instances on each server. The collection has three shards, and every shard has three replicas. Replicas of the same shard run in Solr instances on different servers.
>            Reporter: Simon Scofield
>
> There are two questions:
> 1. The tlog of one replica of shard1 is damaged for some reason. We are still looking for the cause. Please give us some clues if you are familiar with this problem.
> 2. The failed replica recovered successfully by downloading the index files from the leader with a full copy. Then I killed the instance and started it again, and the recovery process was still a full-copy download. In my opinion, after the first full-copy recovery, the tlog should be fixed. Here are some logs:
> 2013-02-28 15:04:58,622 INFO org.apache.solr.cloud.ZkController:757 - Core needs to recover:metadata
> 2013-02-28 15:04:58,622 INFO org.apache.solr.update.DefaultSolrCoreState:214 - Running recovery - first canceling any ongoing recovery
> 2013-02-28 15:04:58,625 INFO org.apache.solr.cloud.RecoveryStrategy:217 - Starting recovery process.  core=metadata recoveringAfterStartup=true
> 2013-02-28 15:04:58,626 INFO org.apache.solr.common.cloud.ZkStateReader:295 - Updating cloud state from ZooKeeper...
> 2013-02-28 15:04:58,628 ERROR org.apache.solr.update.UpdateLog:957 - Exception reading versions from log
> java.io.EOFException
>         at org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:72)
>         at org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:206)
>         at org.apache.solr.update.TransactionLog$ReverseReader.next(TransactionLog.java:705)
>         at org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:906)
>         at org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:846)
>         at org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:996)
>         at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:256)
>         at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
> 2013-02-28 15:05:01,857 INFO org.apache.solr.cloud.RecoveryStrategy:399 - Begin buffering updates. core=metadata
> 2013-02-28 15:05:01,857 INFO org.apache.solr.update.UpdateLog:1015 - Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
> 2013-02-28 15:05:01,857 INFO org.apache.solr.cloud.RecoveryStrategy:126 - Attempting to replicate from http://23.61.21.121:65201/solr/metadata/. core=metadata
> 2013-02-28 15:05:02,882 INFO org.apache.solr.handler.SnapPuller:305 - Master's generation: 6993
> 2013-02-28 15:05:02,882 INFO org.apache.solr.handler.SnapPuller:306 - Slave's generation: 6993
> 2013-02-28 15:05:02,882 INFO org.apache.solr.handler.SnapPuller:307 - Starting replication process
> 2013-02-28 15:05:02,893 INFO org.apache.solr.handler.SnapPuller:312 - Number of files in latest index in master: 422
> 2013-02-28 15:05:02,897 INFO org.apache.solr.handler.SnapPuller:325 - Starting download to /solr/nodes/node1/bin/../solr/metadata/data/index.20130228150502893 fullCopy=true
> 2013-02-28 15:33:55,848 INFO org.apache.solr.handler.SnapPuller:334 - Total time taken for download : 1732 secs (The size of index files is 94G)
