Posted to issues@spark.apache.org by "Dmitry Kravchuk (Jira)" <ji...@apache.org> on 2022/11/16 19:50:00 UTC

[jira] [Updated] (SPARK-41163) Spark 3.2.2 ShuffleBlockFetcherIterator issue

     [ https://issues.apache.org/jira/browse/SPARK-41163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Kravchuk updated SPARK-41163:
------------------------------------
    Summary: Spark 3.2.2 ShuffleBlockFetcherIterator issue  (was: Spark 3.2.2 )

> Spark 3.2.2 ShuffleBlockFetcherIterator issue
> ---------------------------------------------
>
>                 Key: SPARK-41163
>                 URL: https://issues.apache.org/jira/browse/SPARK-41163
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Deploy
>    Affects Versions: 3.2.2
>         Environment: * spark 3.2.2
>  * hadoop 3.1.2
>  * hive 3.1.1
>  * scala 2.12
>            Reporter: Dmitry Kravchuk
>            Priority: Major
>             Fix For: 3.2.3
>
>
> Hello there.
> I've built Spark 3.2.2 for my cluster, which runs Hadoop 3.1.2 and Scala 2.12 (pom.xml is attached).
> Build script:
>  
> {code:java}
> cd spark && \
> ./build/mvn -Pyarn -Dhadoop.version=3.1.2 -Pscala-2.12 -Phive -Phive-thriftserver -DskipTests clean package {code}
>  
> It was working fine, but a few applications hit strange errors and warnings from time to time.
> It always looks like a lost connection to a datanode followed by shuffle read failures.
> {code:java}
> 2022-11-16 22:18:25,423 ERROR server.TransportChannelHandler: Connection to s00abd02node9.company.com/10.x.y.163:35143 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.shuffle.io.connectionTimeout if this is wrong.
> 2022-11-16 22:18:25,423 ERROR client.TransportResponseHandler: Still have 5 requests outstanding when connection from s00abd02node9.company.com/10.x.y.163:35143 is closed
> 2022-11-16 22:18:25,423 WARN netty.NettyBlockTransferService: Error while trying to get the host local dirs for [16]
> 2022-11-16 22:18:25,425 ERROR storage.ShuffleBlockFetcherIterator: Error occurred while fetching host local blocks {code}
> When this happens, the application retries and then fails after the second attempt.
> Can anybody help?
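> The first ERROR line names spark.shuffle.io.connectionTimeout as the setting to adjust if the 120000 ms idle cutoff is wrong. The following is a minimal sketch of raising it in spark-defaults.conf. It assumes the remote executor was merely slow (for example, stalled in GC) rather than actually dead. The 300s value is an arbitrary illustration, and this is not a confirmed fix for the issue:
> {code}
> # spark-defaults.conf (sketch only; 300s is an illustrative value, not a tested recommendation)
> # Idle time before an outstanding shuffle connection is assumed dead (the log shows 120000 ms)
> spark.shuffle.io.connectionTimeout  300s
> {code}
> The same property can also be passed per application with spark-submit --conf spark.shuffle.io.connectionTimeout=300s. If the executor on the remote node really is gone, a longer timeout only delays the same failure.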



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
