Posted to issues@spark.apache.org by "Jean-Yves STEPHAN (Jira)" <ji...@apache.org> on 2021/02/02 15:02:00 UTC
[jira] [Commented] (SPARK-34115) Long runtime on many environment variables
[ https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277182#comment-17277182 ]
Jean-Yves STEPHAN commented on SPARK-34115:
-------------------------------------------
Hello - thanks for this fix [~nob13]; we found it had a significant impact on our workloads at Data Mechanics.
Two questions for [~hyukjin.kwon]:
* Could this be backported to 2.4?
* Is there an upcoming release of Spark 3.0.2 planned? I know Spark 3.1.1 is coming out soon, but according to the Jira this fix will not be included.
Thanks for this work!
> Long runtime on many environment variables
> ------------------------------------------
>
> Key: SPARK-34115
> URL: https://issues.apache.org/jira/browse/SPARK-34115
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core, SQL
> Affects Versions: 2.4.0, 2.4.7, 3.0.1
> Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
> Reporter: Norbert Schultz
> Assignee: Norbert Schultz
> Priority: Major
> Fix For: 3.0.2, 3.1.2
>
> Attachments: spark-bug-34115.tar.gz
>
>
> I am not sure if this is a bug report or a feature request. The code is the same in current versions of Spark, and maybe this ticket will save someone some debugging time.
> We migrated some older code to Spark 2.4.0, and suddenly the integration tests on our build machine were much slower than expected.
> On local machines it ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame analysis in the following functions:
> * AnalysisHelper.assertNotAnalysisRule, which calls
> * Utils.isTesting
> Utils.isTesting is traversing all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed all services as environment variables, so it had more than 3000 environment variables.
> Since Utils.isTesting is called very often through AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and transformUp), this cost adds up quickly.
>
> Of course we will restrict the number of environment variables; on the other hand, Utils.isTesting could also use a lazy val for
>
> {code:scala}
> sys.env.contains("SPARK_TESTING") {code}
>
> to not make it that expensive.
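A minimal sketch of the suggested fix, using a simplified stand-in for org.apache.spark.util.Utils (the object here is illustrative, not Spark's actual source):

```scala
object Utils {
  // Before: each call re-scans every environment variable, e.g.
  //   def isTesting: Boolean = sys.env.contains("SPARK_TESTING")

  // After: a lazy val evaluates the lookup once, on first access, and
  // caches the result. Environment variables cannot change after JVM
  // startup, so the cached value stays correct for the process lifetime.
  lazy val isTesting: Boolean = sys.env.contains("SPARK_TESTING")
}
```

With thousands of environment variables on the Pod and isTesting invoked on every transformDown/transformUp, caching turns a per-call scan of the whole environment into a one-time cost.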
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org