Posted to issues@spark.apache.org by "Jean-Yves STEPHAN (Jira)" <ji...@apache.org> on 2021/02/02 15:02:00 UTC

[jira] [Commented] (SPARK-34115) Long runtime on many environment variables

    [ https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277182#comment-17277182 ] 

Jean-Yves STEPHAN commented on SPARK-34115:
-------------------------------------------

Hello - thanks for this fix [~nob13], we discovered it had a significant impact on our workloads at Data Mechanics.

Two questions for [~hyukjin.kwon]:
 * Could this be backported to 2.4?
 * Is there an upcoming release of Spark 3.0.2 planned? I know Spark 3.1.1 is coming out soon, but according to the Jira, this fix will not be included in it.

Thanks for this work!

> Long runtime on many environment variables
> ------------------------------------------
>
>                 Key: SPARK-34115
>                 URL: https://issues.apache.org/jira/browse/SPARK-34115
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core, SQL
>    Affects Versions: 2.4.0, 2.4.7, 3.0.1
>         Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>            Reporter: Norbert Schultz
>            Assignee: Norbert Schultz
>            Priority: Major
>             Fix For: 3.0.2, 3.1.2
>
>         Attachments: spark-bug-34115.tar.gz
>
>
> I am not sure if this is a bug report or a feature request. The code is the same in current versions of Spark, and maybe this ticket saves someone some debugging time.
> We migrated some older code to Spark 2.4.0, and suddenly the integration tests on our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame analysis in the following functions:
>  * AnalysisHelper.assertNotAnalysisRule, which calls
>  * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed all services as environment variables, so it had more than 3000 of them.
> Utils.isTesting is called very often through AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and transformUp), so this traversal was repeated on every tree transformation.
> Of course we will restrict the number of environment variables on our side; on the other hand, Utils.isTesting could also use a lazy val for
>
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
>
> so that the check is not that expensive.
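For illustration, the lazy-val caching suggested in the quoted description could look roughly like this. This is a minimal sketch in plain Scala, not the actual Spark source; the object and method names simply mirror those mentioned above:

```scala
object Utils {
  // scala.sys.env materializes a fresh immutable Map from System.getenv()
  // on every access, copying all environment variables. With 3000+ of them,
  // that copy is costly when isTesting runs on every tree transformation.
  // A lazy val pays that cost exactly once, on first access.
  private lazy val testing: Boolean = sys.env.contains("SPARK_TESTING")

  def isTesting: Boolean = testing
}
```

Subsequent calls then return the cached Boolean without touching the environment again. The trade-off is that a change to SPARK_TESTING after the first access is no longer observed, which should be acceptable for a flag that is set before the JVM starts.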



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org