Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2019/03/25 04:52:00 UTC

[jira] [Resolved] (SPARK-27219) Misleading exceptions in transport code's SASL fallback path

     [ https://issues.apache.org/jira/browse/SPARK-27219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-27219.
-----------------------------------
       Resolution: Fixed
         Assignee: Marcelo Vanzin
    Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/24160 .

> Misleading exceptions in transport code's SASL fallback path
> ------------------------------------------------------------
>
>                 Key: SPARK-27219
>                 URL: https://issues.apache.org/jira/browse/SPARK-27219
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> There are a couple of code paths in the SASL fallback handling that result in misleading exceptions being printed to the logs. One of them is hit when a timeout occurs during authentication; for example:
> {noformat}
> 19/03/15 11:21:37 WARN crypto.AuthClientBootstrap: New auth protocol failed, trying SASL.
> java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timeout waiting for task.
>         at org.spark_project.guava.base.Throwables.propagate(Throwables.java:160)
>         at org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:258)
>         at org.apache.spark.network.crypto.AuthClientBootstrap.doSparkAuth(AuthClientBootstrap.java:105)
>         at org.apache.spark.network.crypto.AuthClientBootstrap.doBootstrap(AuthClientBootstrap.java:79)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:262)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:192)
>         at org.apache.spark.network.shuffle.ExternalShuffleClient.lambda$fetchBlocks$0(ExternalShuffleClient.java:100)
>         at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
> ...
> Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
>         at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:276)
>         at org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
>         at org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:254)
>         ... 38 more
> 19/03/15 11:21:38 WARN server.TransportChannelHandler: Exception in connection from vc1033.halxg.cloudera.com/10.17.216.43:7337
> java.lang.IllegalArgumentException: Frame length should be positive: -3702202170875367528
>         at org.spark_project.guava.base.Preconditions.checkArgument(Preconditions.java:119)
> {noformat}
> The IllegalArgumentException should not happen at all. It appears only because the code ignores the timeout and retries, at which point the remote side is in a different state and does not expect the message.
> The same line that prints that exception can also produce a noisy log message when the remote side (e.g. an old shuffle service) does not understand the new auth protocol. Because it is logged as a warning, it looks like something is wrong when the code is just doing what is expected.
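
The behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the linked pull request: the class AuthFallbackSketch, the AuthAttempt interface, and the authenticate and causedByTimeout helpers are hypothetical names invented for this example. The idea is that a timeout during the first attempt leaves the remote side in an unknown state, so it is rethrown instead of triggering the fallback, while any other failure (such as a peer that does not understand the new protocol) falls back quietly.

```java
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names come from Spark itself.
public class AuthFallbackSketch {

  /** Stand-in for one authentication attempt over an existing connection. */
  interface AuthAttempt {
    void run() throws Exception;
  }

  /** Returns true if the throwable or any of its causes is a TimeoutException. */
  static boolean causedByTimeout(Throwable t) {
    while (t != null) {
      if (t instanceof TimeoutException) {
        return true;
      }
      t = t.getCause();
    }
    return false;
  }

  /**
   * Tries the new protocol first and falls back to SASL only when the
   * failure is not a timeout. Returns the name of the protocol that succeeded.
   */
  static String authenticate(AuthAttempt newProtocol, AuthAttempt sasl) throws Exception {
    try {
      newProtocol.run();
      return "new-protocol";
    } catch (Exception e) {
      if (causedByTimeout(e)) {
        // The remote side may still be processing the first message, so
        // sending a SASL message on the same connection is not safe.
        throw e;
      }
      // Assumed to be the expected case for an old peer: fall back without
      // printing a stack trace at warning level.
      sasl.run();
      return "sasl";
    }
  }

  public static void main(String[] args) throws Exception {
    // A peer that rejects the new protocol: the fallback is used.
    String used = authenticate(
        () -> { throw new IllegalStateException("unknown message type"); },
        () -> { });
    System.out.println("authenticated with: " + used);

    // A timeout: the error is propagated and no fallback is attempted.
    try {
      authenticate(
          () -> { throw new RuntimeException(new TimeoutException("Timeout waiting for task.")); },
          () -> { });
    } catch (RuntimeException e) {
      System.out.println("timeout propagated: " + e.getCause().getMessage());
    }
  }
}
```

Walking the cause chain matters here because, as the stack trace above shows, the TimeoutException arrives wrapped in a RuntimeException.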


