Posted to dev@lucene.apache.org by "Timothy Potter (JIRA)" <ji...@apache.org> on 2016/04/29 09:55:12 UTC
[jira] [Commented] (SOLR-9050) IndexFetcher not retrying after SocketTimeoutException correctly, which leads to trying a full download again
[ https://issues.apache.org/jira/browse/SOLR-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263708#comment-15263708 ]
Timothy Potter commented on SOLR-9050:
--------------------------------------
What's not clear to me is how we're getting into the cleanup method. I'm not seeing multiple retries to download the file after the first WARN is logged, and there are about 4 minutes between the SocketTimeoutException and the "Unable to download" ERROR.
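For context, the behavior one would expect from a retry path is to resume the fetch from the last good offset after a read timeout, rather than abandoning the file. A minimal sketch of that idea (all names here are hypothetical and illustrative, not the actual IndexFetcher API):

```java
import java.net.SocketTimeoutException;

public class ResumableFetch {
    // Illustrative retry budget, not a Solr setting.
    static final int MAX_RETRIES = 3;

    interface Fetcher {
        // Fetches a chunk starting at offset; returns the new offset,
        // or throws if the socket read times out.
        long fetchFrom(long offset) throws SocketTimeoutException;
    }

    // Retries from the last good offset; only gives up after
    // MAX_RETRIES consecutive timeouts with no forward progress.
    static long fetchWithRetry(Fetcher f, long totalBytes)
            throws SocketTimeoutException {
        long offset = 0;
        int attempts = 0;
        while (offset < totalBytes) {
            try {
                offset = f.fetchFrom(offset);
                attempts = 0; // progress resets the retry budget
            } catch (SocketTimeoutException e) {
                if (++attempts > MAX_RETRIES) {
                    throw e; // repeated stalls: surface the failure
                }
            }
        }
        return offset;
    }
}
```

The point of the sketch is that a single transient timeout should cost one retry from the current offset, not a full re-download of the index.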
> IndexFetcher not retrying after SocketTimeoutException correctly, which leads to trying a full download again
> -------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-9050
> URL: https://issues.apache.org/jira/browse/SOLR-9050
> Project: Solr
> Issue Type: Bug
> Components: replication (java)
> Affects Versions: 5.3.1
> Reporter: Timothy Potter
> Assignee: Timothy Potter
>
> I'm seeing a problem where reading a large file from the leader (in SolrCloud mode) during index replication leads to a SocketTimeoutException:
> {code}
> 2016-04-28 16:22:23.568 WARN (RecoveryThread-foo_shard11_replica2) [c:foo s:shard11 r:core_node139 x:foo_shard11_replica2] o.a.s.h.IndexFetcher Error in fetching file: _405k.cfs (downloaded 7314866176 of 9990844536 bytes)
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:150)
> at java.net.SocketInputStream.read(SocketInputStream.java:121)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
> at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
> at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:253)
> at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
> at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
> at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
> at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80)
> at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
> at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:140)
> at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:167)
> at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:161)
> at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1312)
> at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1275)
> at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:800)
> {code}
> and this leads to the following error in cleanup:
> {code}
> 2016-04-28 16:26:04.332 ERROR (RecoveryThread-foo_shard11_replica2) [c:foo s:shard11 r:core_node139 x:foo_shard11_replica2] o.a.s.h.ReplicationHandler Index fetch failed :org.apache.solr.common.SolrException: Unable to download _405k.cfs completely. Downloaded 7314866176!=9990844536
> at org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1406)
> at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1286)
> at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:800)
> at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:423)
> at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:254)
> at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:380)
> at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:162)
> at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
> 2016-04-28 16:26:04.332 ERROR (RecoveryThread-foo_shard11_replica2) [c:foo s:shard11 r:core_node139 x:foo_shard11_replica2] o.a.s.c.RecoveryStrategy Error while trying to recover:org.apache.solr.common.SolrException: Replication for recovery failed.
> at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:165)
> at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
> {code}
> So a simple read timeout exception leads to re-downloading the whole index again, and again, and again ...
> It also looks like any exception raised in fetchPackets would be squelched if an exception is raised in cleanup (which is called in the finally block).
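The squelching mentioned above is standard Java finally-block behavior: if both the try body and the finally block throw, the finally block's exception propagates and the original is lost. A minimal sketch (method and message names are simulated, not the actual IndexFetcher code), along with one possible fix using Throwable.addSuppressed:

```java
public class FinallyMasking {
    static void fetchPackets() {
        // simulates the SocketTimeoutException from the stack trace
        throw new RuntimeException("Read timed out");
    }

    static void cleanup() {
        // simulates the SolrException thrown when the byte counts mismatch
        throw new RuntimeException("Unable to download file completely");
    }

    // Mirrors the fetchFile() shape: the finally-block exception wins,
    // and the original timeout is silently discarded.
    static String fetchFileMasked() {
        try {
            try {
                fetchPackets();
            } finally {
                cleanup();
            }
        } catch (RuntimeException e) {
            return e.getMessage(); // only the cleanup message survives
        }
        return null;
    }

    // One possible fix: attach the cleanup failure as a suppressed
    // exception so the primary cause still propagates.
    static String fetchFilePreserved() {
        try {
            try {
                fetchPackets();
            } catch (RuntimeException primary) {
                try {
                    cleanup();
                } catch (RuntimeException c) {
                    primary.addSuppressed(c);
                }
                throw primary;
            }
        } catch (RuntimeException e) {
            return e.getMessage(); // original timeout message is kept
        }
        return null;
    }
}
```

With the masked variant, the log shows only the "Unable to download" error seen above; with suppression, the read timeout would remain the reported cause and the cleanup failure would still be visible as a suppressed exception.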
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org