Posted to issues@spark.apache.org by "Dmitry Kravchuk (Jira)" <ji...@apache.org> on 2022/11/16 19:50:00 UTC
[jira] [Updated] (SPARK-41163) Spark 3.2.2 ShuffleBlockFetcherIterator issue
[ https://issues.apache.org/jira/browse/SPARK-41163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitry Kravchuk updated SPARK-41163:
------------------------------------
Summary: Spark 3.2.2 ShuffleBlockFetcherIterator issue (was: Spark 3.2.2 )
> Spark 3.2.2 ShuffleBlockFetcherIterator issue
> ---------------------------------------------
>
> Key: SPARK-41163
> URL: https://issues.apache.org/jira/browse/SPARK-41163
> Project: Spark
> Issue Type: Bug
> Components: Build, Deploy
> Affects Versions: 3.2.2
> Environment: * spark 3.2.2
> * hadoop 3.1.2
> * hive 3.1.1
> * scala 2.12
> Reporter: Dmitry Kravchuk
> Priority: Major
> Fix For: 3.2.3
>
>
> Hello there.
> I've built Spark 3.2.2 for my cluster, which runs Hadoop 3.1.2 and Scala 2.12 (pom.xml is attached).
> Build script:
>
> {code:java}
> cd spark && \
> ./build/mvn -Pyarn -Dhadoop.version=3.1.2 -Pscala-2.12 -Phive -Phive-thriftserver -DskipTests clean package {code}
>
> It was working fine, but a few applications get a strange error and warning from time to time.
> It always looks like a lost datanode connection together with shuffle read issues.
> {code:java}
> 2022-11-16 22:18:25,423 ERROR server.TransportChannelHandler: Connection to s00abd02node9.company.com/10.x.y.163:35143 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.shuffle.io.connectionTimeout if this is wrong.
> 2022-11-16 22:18:25,423 ERROR client.TransportResponseHandler: Still have 5 requests outstanding when connection from s00abd02node9.company.com/10.x.y.163:35143 is closed
> 2022-11-16 22:18:25,423 WARN netty.NettyBlockTransferService: Error while trying to get the host local dirs for [16]
> 2022-11-16 22:18:25,425 ERROR storage.ShuffleBlockFetcherIterator: Error occurred while fetching host local blocks {code}
> When this happens, the application goes into a retry and fails after the second start.
> Can anybody help?
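
The first log line above suggests adjusting spark.shuffle.io.connectionTimeout, and the last two lines concern reading host-local shuffle blocks. As a starting point for diagnosis only, the sketch below assembles the related `--conf` options for spark-submit. It is not a confirmed fix for this issue. The helper name `build_shuffle_conf`, the `SHUFFLE_IO_TIMEOUT` variable, and every value shown are illustrative assumptions; only the Spark property names come from Spark's configuration. Disabling spark.shuffle.readHostLocalDisk is a guess at narrowing down the "host local" errors, not something the report establishes.

```shell
# Sketch under stated assumptions: values are illustrative, not recommendations.
# The log shows the connection timing out after 120000 ms, so try a larger value.
SHUFFLE_IO_TIMEOUT="${SHUFFLE_IO_TIMEOUT:-300s}"

# Hypothetical helper: prints one spark-submit argument per line.
build_shuffle_conf() {
  printf '%s\n' \
    "--conf" "spark.shuffle.io.connectionTimeout=${SHUFFLE_IO_TIMEOUT}" \
    "--conf" "spark.shuffle.io.maxRetries=10" \
    "--conf" "spark.shuffle.io.retryWait=30s" \
    "--conf" "spark.shuffle.readHostLocalDisk=false"
}

# Example use (your-app.jar is a placeholder, not from the report):
#   spark-submit $(build_shuffle_conf) your-app.jar
```

Changing one setting at a time makes it easier to tell whether the timeout or the host-local read path is involved.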
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org