Posted to solr-user@lucene.apache.org by adm1n <ev...@gmail.com> on 2013/03/12 10:58:28 UTC

index "got lost"

Hi,

This weekend I saw some very strange behavior from Solr (4.0). Both nodes of
one shard (master and slave) "dropped" their index folders and started to use
new ones.

Before the incident, the index folder was:
solr/collection/data/index.20130128144430264

and afterwards it became:
solr/collection/data/index.20130309055741370

The old folder, solr/collection/data/index.20130128144430264, is 5 GB (about
2.2M docs), while the new one, solr/collection/data/index.20130309055741370,
is only 57 MB (about 17K docs).
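
For anyone who wants to check the same thing on their own node, here is a minimal Python sketch that lists the index.* folders under a core's data directory with their sizes and marks the one Solr currently points at. It assumes the active folder is recorded as an index=... line in data/index.properties (the log below mentions "Updating index properties"); the function names are my own and purely illustrative.

```python
import sys
from pathlib import Path


def dir_size_bytes(path):
    """Total size of all regular files below path."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def active_index_name(data_dir):
    """Folder name from index.properties; falls back to plain 'index'."""
    props = Path(data_dir) / "index.properties"
    if not props.is_file():
        return "index"
    for line in props.read_text().splitlines():
        line = line.strip()
        if line.startswith("index="):
            return line.split("=", 1)[1].strip()
    return "index"


def summarize_index_dirs(data_dir):
    """Return (folder name, size in bytes, is_active) for each index folder."""
    active = active_index_name(data_dir)
    rows = []
    for entry in sorted(Path(data_dir).iterdir()):
        if entry.is_dir() and (entry.name == "index" or entry.name.startswith("index.")):
            rows.append((entry.name, dir_size_bytes(entry), entry.name == active))
    return rows


if __name__ == "__main__":
    # Usage: python index_dirs.py solr/collection/data
    for name, size, is_active in summarize_index_dirs(sys.argv[1]):
        marker = "*" if is_active else " "
        print(f"{marker} {name}  {size / 1024 / 1024:.1f} MB")
```

A folder that is large but not marked as active would be the old index that Solr stopped using.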

In the Solr log I can see the following:


Mar 09, 2013 5:57:14 AM org.apache.solr.common.cloud.ZkStateReader$2 process
INFO: A cluster state change has occurred - updating...
Mar 09, 2013 5:57:39 AM org.apache.solr.update.PeerSync handleResponse
WARNING: PeerSync: core=collection url=http://prd-solr-01b:7501/solr exception talking to http://prd-solr-01a:7501/solr/collection/, failed
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://prd-solr-01a:7501/solr/collection
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
        at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
        at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to prd-solr-01a:7501 timed out
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:125)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
        at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
        at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
        ... 11 more
Mar 09, 2013 5:57:39 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=collection url=http://prd-solr-01b:7501/solr DONE. sync failed
Mar 09, 2013 5:57:39 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: PeerSync Recovery was not successful - trying replication. core=collection
Mar 09, 2013 5:57:39 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Starting Replication Recovery. core=collection
Mar 09, 2013 5:57:39 AM org.apache.solr.client.solrj.impl.HttpClientUtil createClient
INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Mar 09, 2013 5:57:41 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Begin buffering updates. core=collection
Mar 09, 2013 5:57:41 AM org.apache.solr.update.UpdateLog bufferUpdates
INFO: Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
Mar 09, 2013 5:57:41 AM org.apache.solr.cloud.RecoveryStrategy replicate
INFO: Attempting to replicate from http://prd-solr-01a:7501/solr/collection/. core=collection
Mar 09, 2013 5:57:41 AM org.apache.solr.client.solrj.impl.HttpClientUtil createClient
INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Mar 09, 2013 5:57:41 AM org.apache.solr.handler.SnapPuller <init>
INFO:  No value set for 'pollInterval'. Timer Task not started.
Mar 09, 2013 5:57:41 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's generation: 100229
Mar 09, 2013 5:57:41 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's generation: 100234
Mar 09, 2013 5:57:41 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
Mar 09, 2013 5:57:41 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Number of files in latest index in master: 77
Mar 09, 2013 5:57:41 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting download to /usr/local/pkg/apache-solr/prod1/solr/collection/data/index.20130309055741370 fullCopy=true
Mar 09, 2013 5:57:42 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Total time taken for download : 0 secs
Mar 09, 2013 5:57:45 AM org.apache.solr.handler.SnapPuller modifyIndexProps
INFO: New index installed. Updating index properties... index=index.20130309055741370
Mar 09, 2013 5:57:45 AM org.apache.solr.update.DefaultSolrCoreState newIndexWriter
INFO: Creating new IndexWriter...
Mar 09, 2013 5:57:45 AM org.apache.solr.update.DefaultSolrCoreState newIndexWriter
INFO: Waiting until IndexWriter is unused... core=collection
Mar 09, 2013 5:57:45 AM org.apache.solr.update.DefaultSolrCoreState newIndexWriter
INFO: Rollback old IndexWriter... core=collection
Mar 09, 2013 5:57:45 AM org.apache.solr.core.SolrCore getNewIndexDir
WARNING: New index directory detected: old=/usr/local/pkg/apache-solr/prod1/solr/collection/data/index.20130309052611025 new=/usr/local/pkg/apache-solr/prod1/solr/collection/data/index.20130309055741370
Mar 09, 2013 5:57:45 AM org.apache.solr.core.CachingDirectoryFactory get
INFO: return new directory for /usr/local/pkg/apache-solr/prod1/solr/collection/data/index.20130309055741370 forceNew:true



The shard is located on hosts prd-solr-01[a,b]:7501, and ZooKeeper manages
the master-slave roles.
The log is from the prd-solr-01b:7501 machine.
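
One thing that stands out in the log above is that the replicating node reports a higher generation (100234) than the node it copies from (100229), and the download then runs with fullCopy=true. Below is a rough Python sketch for scanning a log file for that pattern. It relies only on the message texts visible in the log above; the function names are made up for illustration, and whether this pattern actually explains the missing documents is an assumption on my part, not something I have confirmed.

```python
import re
import sys

MASTER_RE = re.compile(r"Master's generation: (\d+)")
SLAVE_RE = re.compile(r"Slave's generation: (\d+)")
FULLCOPY_RE = re.compile(r"fullCopy=(true|false)")


def find_replication_events(log_text):
    """Collect one dict per replication attempt found in the log text."""
    events = []
    current = None
    for line in log_text.splitlines():
        m = MASTER_RE.search(line)
        if m:
            # A "Master's generation" line starts a new replication attempt.
            current = {"master_gen": int(m.group(1)), "slave_gen": None, "full_copy": None}
            events.append(current)
            continue
        if current is None:
            continue
        m = SLAVE_RE.search(line)
        if m:
            current["slave_gen"] = int(m.group(1))
            continue
        m = FULLCOPY_RE.search(line)
        if m:
            current["full_copy"] = m.group(1) == "true"
    return events


def is_suspicious(event):
    """True when the local index was ahead of the source yet fully replaced."""
    return (
        event["slave_gen"] is not None
        and event["slave_gen"] > event["master_gen"]
        and event["full_copy"] is True
    )


if __name__ == "__main__":
    # Usage: python scan_log.py /path/to/solr.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for ev in find_replication_events(fh.read()):
            print("SUSPICIOUS" if is_suspicious(ev) else "ok", ev)
```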

The hosts are in AWS, and our IT guys told me there were some issues with
AWS around that time.
I assume that Solr tried to save some changes to the index, failed to write
them (hence the error in the log), and created a new index because of that.

Is my assumption correct?
If it is, is there a way to change this behavior? If it isn't, what could
be the reason for it?

Thanks.








