Posted to user@spark.apache.org by "Sinha, Breeta (Nokia - IN/Bangalore)" <br...@nokia.com> on 2019/03/26 09:29:07 UTC

RPC timeout error for AES based encryption between driver and executor

Hi All,

We are trying to enable RPC encryption between driver and executor. Currently we're working on Spark 2.4 on Kubernetes.

According to the Apache Spark security documentation (https://spark.apache.org/docs/latest/security.html), Spark supports AES-based encryption for RPC connections. SASL-based encryption is also supported, although it should be considered deprecated.

Setting spark.network.crypto.enabled to true enables AES-based RPC encryption.
However, when we enable AES-based encryption between driver and executor, the logs show that communication between driver and executor fails sporadically.

The following are the options, and the values we set for them, to enable encryption:-

spark.authenticate true
spark.authenticate.secret <some-value>
spark.network.crypto.enabled true
spark.network.crypto.keyLength 256
spark.network.crypto.saslFallback false
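
For reference, the settings listed above could be passed to spark-submit as --conf flags. The following is a minimal Python sketch that only assembles such a command line. The helper name build_submit_args, the application file name app.py, and the placeholder secret are assumptions made for illustration and are not taken from the original setup.

```python
import shlex

# Settings as listed above; the secret is a placeholder, not a real value.
AES_RPC_CONF = {
    "spark.authenticate": "true",
    "spark.authenticate.secret": "<some-value>",
    "spark.network.crypto.enabled": "true",
    "spark.network.crypto.keyLength": "256",
    "spark.network.crypto.saslFallback": "false",
}


def build_submit_args(conf, app="app.py"):
    """Return a spark-submit argument list with one --conf flag per setting.

    `app` is a hypothetical application file used only for illustration.
    """
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # Print a shell-quoted command line (shlex.join needs Python 3.8+).
    print(shlex.join(build_submit_args(AES_RPC_CONF)))
```

Keeping the settings in one dictionary makes it easy to toggle a single option, such as spark.network.crypto.saslFallback, between runs while leaving the rest unchanged.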

A snippet of the executor log is provided below:-
Exception in thread "main" 19/02/26 07:27:08 ERROR RpcOutboxMessage: Ask timeout before connecting successfully
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from sts-spark-thrift-server-1551165767426-driver-svc.default.svc:7078 in 120 seconds
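
Because the failure is intermittent, it can help to scan executor logs for this message automatically. The following is a minimal Python sketch, assuming the log text contains the "Cannot receive any reply from ... in N seconds" line in the form shown above. The function name find_rpc_timeouts is hypothetical.

```python
import re

# Matches "Cannot receive any reply from <host>:<port> in <N> seconds".
TIMEOUT_RE = re.compile(
    r"Cannot receive any reply from "
    r"(?P<host>[^\s:]+):(?P<port>\d+) in (?P<seconds>\d+) seconds"
)


def find_rpc_timeouts(log_text):
    """Return a (host, port, seconds) tuple for each RPC timeout message."""
    return [
        (m.group("host"), int(m.group("port")), int(m.group("seconds")))
        for m in TIMEOUT_RE.finditer(log_text)
    ]


# Sample line copied from the executor log snippet above.
SAMPLE_LOG = (
    "Caused by: java.util.concurrent.TimeoutException: Cannot receive any "
    "reply from sts-spark-thrift-server-1551165767426-driver-svc.default.svc:7078 "
    "in 120 seconds"
)

if __name__ == "__main__":
    for host, port, seconds in find_rpc_timeouts(SAMPLE_LOG):
        print(f"timeout after {seconds}s waiting for {host}:{port}")
```

Running this over the logs of each executor pod would show how many of them hit the timeout and which driver endpoint they were waiting on.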

However, the driver log shows no error message, or any message at all from the executor, for the same timestamp.

We also tried increasing spark.network.timeout, but no luck.

The issue is sporadic. We noted the following observations:-
1) Sometimes, enabling AES encryption works completely fine.
2) Sometimes, enabling AES encryption works fine for around 10 consecutive spark-submits, but the next spark-submit hangs with the above-mentioned error in the executor log.
3) At other times, enabling AES encryption does not work at all: the application keeps spawning executors (more than 50), and each of them fails with the above-mentioned error.
Even setting spark.network.crypto.saslFallback to true didn't help.

Things work fine when we enable SASL encryption, that is, when we set only the following parameters:-
spark.authenticate true
spark.authenticate.secret <some-value>

I have attached the log file containing the detailed error message. Please let us know if any configuration is missing or if anyone has faced the same issue.

Any leads would be highly appreciated!!

Kind Regards,
Breeta Sinha


RE: RPC timeout error for AES based encryption between driver and executor

Posted by "Sinha, Breeta (Nokia - IN/Bangalore)" <br...@nokia.com>.
Hi Vanzin,

"spark.authenticate" is working properly for our environment (Spark 2.4 on Kubernetes).
We have made a few code changes, through which secure communication between driver and executor works fine using the shared spark.authenticate.secret.

Even SASL encryption works, but when we set
spark.network.crypto.enabled true
to enable AES-based encryption, we see the RPC timeout error message sporadically.

Kind Regards,
Breeta


-----Original Message-----
From: Marcelo Vanzin <va...@cloudera.com> 
Sent: Tuesday, March 26, 2019 9:10 PM
To: Sinha, Breeta (Nokia - IN/Bangalore) <br...@nokia.com>
Cc: user@spark.apache.org
Subject: Re: RPC timeout error for AES based encryption between driver and executor

I don't think "spark.authenticate" works properly with k8s in 2.4 (which would make it impossible to enable encryption since it requires authentication). I'm pretty sure I fixed it in master, though.

--
Marcelo

Re: RPC timeout error for AES based encryption between driver and executor

Posted by Marcelo Vanzin <va...@cloudera.com.INVALID>.
I don't think "spark.authenticate" works properly with k8s in 2.4
(which would make it impossible to enable encryption since it requires
authentication). I'm pretty sure I fixed it in master, though.

-- 
Marcelo
