You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/23 15:54:15 UTC

[GitHub] [spark] ScrapCodes opened a new pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

ScrapCodes opened a new pull request #30472:
URL: https://github.com/apache/spark/pull/30472


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   Skip files if they are binary or very large to fit the configMap's max size.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   Config map cannot hold binary files and there is also a limit on how much data a configMap can hold.
   This limit can be configured by the k8s cluster admin. This PR, skips such files instead of failing with weird runtime errors.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   yes, avoids possible errors.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   Added relevant tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r537291078



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -51,6 +54,29 @@ private[spark] object KubernetesClientUtils extends Logging {
     propertiesWriter.toString
   }
 
+  object StringLengthOrdering extends Ordering[(String, String)] {
+    override def compare(x: (String, String), y: (String, String)): Int = {
+      // compare based on file length and break the tie with string comparison of keys.
+      (x._1.length + x._2.length).compare(y._1.length + y._2.length) * 10 +
+        x._1.compareTo(y._1)
+    }

Review comment:
       Good questions, and hoping that I have understood correctly, attempting to answer.
   
   1) We need to compare the file sizes _including their names_, because that is how they will occupy space in a config map. 
   2) We would like to give priority to as many config files as possible we can mount (or store in the configMap), so we would like to sort them by size.
   3) If the two files are equal in lengths, then we do not want to declare them equal, so in the compare equation, if two files are equal in length - we check their string compare results as well. This is done to break the tie.  When two files have equal lengths, we check if the two files have same name as well ? Then we say they are truly equal - in principle this should not happen, because they are files in the same directory i.e. `SPARK_CONF_DIR`. So, each file will get some ordering value as a result of computing comparison equation, and no data will be lost.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r535759310



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -94,25 +120,41 @@ private[spark] object KubernetesClientUtils extends Logging {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles = listConfFiles(confDir.get, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      val filesSeq: Seq[Option[(String, String)]] = confFiles.map { file =>
+        try {
+          val source = Source.fromFile(file)(Codec.UTF8)
+          val mapping = Some(file.getName -> source.mkString)
+          source.close()
+          mapping
+        } catch {
+          case e: MalformedInputException =>
+            logWarning(s"skipped a non UTF-8 encoded file ${file.getAbsolutePath}.", e)
+            None
+        }
+      }
+      val truncatedMap = truncateToSize(filesSeq.flatten, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      logInfo(s"Spark configuration files loaded from $confDir :" +
+        s" ${truncatedMap.keys.mkString(",")}")
+      truncatedMap
     } else {
       Map.empty[String, String]
     }
   }
 
-  private def listConfFiles(confDir: String): Seq[File] = {
-    // We exclude all the template files and user provided spark conf or properties.
-    // As spark properties are resolved in a different step.
+  private def listConfFiles(confDir: String, maxSize: Long): Seq[File] = {
+    // At the moment configmaps does not support storing binary content.
+    // configMaps do not allow for size greater than 1.5 MiB(configurable).
+    // https://etcd.io/docs/v3.4.0/dev-guide/limit/
+    // We exclude all the template files and user provided spark conf or properties,
+    // and binary files (e.g. jars and zip). Spark properties are resolved in a different step.
+    def test(f: File): Boolean = (f.length() + f.getName.length > maxSize) ||

Review comment:
       BTW, why `f.length() + f.getName.length > 0` is here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752869110


   **[Test build #133558 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133558/testReport)** for PR 30472 at commit [`a893fd8`](https://github.com/apache/spark/commit/a893fd8ae1476ff487ef635cace6707fd09210aa).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r535755593



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -90,6 +90,14 @@ private[spark] object Config extends Logging {
       .toSequence
       .createWithDefault(Nil)
 
+  val CONFIG_MAP_MAXSIZE =
+    ConfigBuilder("spark.kubernetes.configMap.maxsize")

Review comment:
       `maxsize` -> `maxSize`? (e.g. `spark.rpc.message.maxSize`, `spark.executor.logs.rolling.maxSize`, `spark.reducer.maxSizeInFlight`.)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739764791


   **[Test build #132349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132349/testReport)** for PR 30472 at commit [`7b9436c`](https://github.com/apache/spark/commit/7b9436ce2bd9c0407351a3b189bc5a6327eb5eb7).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r546152007



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,66 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)
+  }
+
+  // exposed for testing
+  private[submit] def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
+    val maxSize = conf.get(Config.CONFIG_MAP_MAXSIZE)
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles: Seq[File] = listConfFiles(confDir.get, maxSize)
+      val orderedConfFiles = orderFilesBySize(confFiles)
+      var i: Long = 0
+      val truncatedMap = mutable.HashMap[String, String]()
+      for (file <- orderedConfFiles) {
+        var source: Source = Source.fromString("") // init with empty source.
+        try {
+          source = Source.fromFile(file)(Codec.UTF8)
+          val (fileName, fileContent) = file.getName -> source.mkString
+          if (fileContent.length > 0) {
+            i = i + (fileName.length + fileContent.length)
+            if (i < maxSize) {

Review comment:
       Do we need to add an `else` statement with `logWarning` in this case?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752863133


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133556/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739954398


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36969/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-754594269


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38266/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751672658






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755105187


   @dongjoon-hyun, yes 
    Thanks !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755149787


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38305/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751656050


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133442/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-749345172


   Please let me know if this is ready back, @ScrapCodes .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-737782039






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-754541391


   **[Test build #133677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133677/testReport)** for PR 30472 at commit [`8a5e387`](https://github.com/apache/spark/commit/8a5e3871a3cd20c54a2ceafacc3114ad0cc3290c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732250856


   **[Test build #131554 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131554/testReport)** for PR 30472 at commit [`c99b2bc`](https://github.com/apache/spark/commit/c99b2bca50b2386067bdf09ffed5aec181deadb5).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752863646


   Jenkins, build is failing due to `no space left on device`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-748625395


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37705/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-734083543


   Hi @dongjoon-hyun, I have tried to improve test coverage and also some corner cases. Would you like to take a look here as well ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732507016






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r540706293



##########
File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala
##########
@@ -191,25 +191,32 @@ class ClientSuite extends SparkFunSuite with BeforeAndAfter {
     assert(configMap.getData.get(SPARK_CONF_FILE_NAME).contains("conf2key=conf2value"))
   }
 
-  test("All files from SPARK_CONF_DIR, except templates and spark config " +
+  test("All files from SPARK_CONF_DIR, " +
+    "except templates, spark config, binary files and are within size limit, " +
     "should be populated to pod's configMap.") {
     def testSetup: (SparkConf, Seq[String]) = {
       val tempDir = Utils.createTempDir()
-      val sparkConf = new SparkConf(loadDefaults = false).setSparkHome(tempDir.getAbsolutePath)
+      val sparkConf = new SparkConf(loadDefaults = false)
+        .setSparkHome(tempDir.getAbsolutePath)
 
       val tempConfDir = new File(s"${tempDir.getAbsolutePath}/conf")
       tempConfDir.mkdir()
       // File names - which should not get mounted on the resultant config map.
       val filteredConfFileNames =
-        Set("spark-env.sh.template", "spark.properties", "spark-defaults.conf")
-      val confFileNames = for (i <- 1 to 5) yield s"testConf.$i" ++
+        Set("spark-env.sh.template", "spark.properties", "spark-defaults.conf",
+          "test.gz", "test2.jar", "non_utf8.txt")

Review comment:
       For example, if a user needs to pass his hadoop's core-site.xml. or some other framework he is trying to use, has a configuration file and for that matter, he has a custom application which can be configured.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755149787


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38305/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755080860


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133709/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739921021


   @srowen and @dongjoon-hyun Thanks a lot for taking a look, I have hopefully addressed all the comments. Please take a look again. Thanks !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [WIP][SPARK-32221] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732960228


   **[Test build #131660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131660/testReport)** for PR 30472 at commit [`8838aba`](https://github.com/apache/spark/commit/8838abafc9c659a67e912651279d2706a4a503fd).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [WIP][SPARK-32221] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732967004






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733577201


   **[Test build #131761 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131761/testReport)** for PR 30472 at commit [`2d676cd`](https://github.com/apache/spark/commit/2d676cdeaed89ffe89f00de41350e51122b559d1).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751676576


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38036/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r536112344



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -51,6 +54,29 @@ private[spark] object KubernetesClientUtils extends Logging {
     propertiesWriter.toString
   }
 
+  object StringLengthOrdering extends Ordering[(String, String)] {
+    override def compare(x: (String, String), y: (String, String)): Int = {
+      // compare based on file length and break the tie with string comparison of keys.
+      (x._1.length + x._2.length).compare(y._1.length + y._2.length) * 10 +
+        x._1.compareTo(y._1)
+    }
+  }
+
+  def truncateToSize(seq: Seq[(String, String)], maxSize: Long): Map[String, String] = {
+    // First order the entries in order of their size.
+    // Skip entries if the resulting Map size exceeds maxSize.
+    val ordering: Ordering[(String, String)] = StringLengthOrdering
+    val sortedSet = SortedSet[(String, String)](seq : _*)(ordering)

Review comment:
       `seq: _*`

##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -90,6 +90,14 @@ private[spark] object Config extends Logging {
       .toSequence
       .createWithDefault(Nil)
 
+  val CONFIG_MAP_MAXSIZE =
+    ConfigBuilder("spark.kubernetes.configMap.maxsize")
+      .doc("Max size limit for a config map. This is configurable as per" +
+        " https://etcd.io/docs/v3.4.0/dev-guide/limit/ on k8s server end.")
+      .version("3.1.0")

Review comment:
       This will probably be for 3.2.0, note

##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -94,25 +120,41 @@ private[spark] object KubernetesClientUtils extends Logging {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles = listConfFiles(confDir.get, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      val filesSeq: Seq[Option[(String, String)]] = confFiles.map { file =>
+        try {
+          val source = Source.fromFile(file)(Codec.UTF8)
+          val mapping = Some(file.getName -> source.mkString)
+          source.close()
+          mapping
+        } catch {
+          case e: MalformedInputException =>
+            logWarning(s"skipped a non UTF-8 encoded file ${file.getAbsolutePath}.", e)
+            None
+        }
+      }
+      val truncatedMap = truncateToSize(filesSeq.flatten, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      logInfo(s"Spark configuration files loaded from $confDir :" +

Review comment:
       Or do you want to truncate earlier before reading them all into memory?

##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -51,6 +54,29 @@ private[spark] object KubernetesClientUtils extends Logging {
     propertiesWriter.toString
   }
 
+  object StringLengthOrdering extends Ordering[(String, String)] {
+    override def compare(x: (String, String), y: (String, String)): Int = {
+      // compare based on file length and break the tie with string comparison of keys.
+      (x._1.length + x._2.length).compare(y._1.length + y._2.length) * 10 +
+        x._1.compareTo(y._1)
+    }
+  }
+
+  def truncateToSize(seq: Seq[(String, String)], maxSize: Long): Map[String, String] = {
+    // First order the entries in order of their size.
+    // Skip entries if the resulting Map size exceeds maxSize.
+    val ordering: Ordering[(String, String)] = StringLengthOrdering
+    val sortedSet = SortedSet[(String, String)](seq : _*)(ordering)
+    var i: Int = 0
+    val map = mutable.HashMap[String, String]()
+    for (item <- sortedSet) {
+      i += item._1.length + item._2.length
+      if ( i < maxSize) {

Review comment:
       And no big deal but if they're sorted by sum, you can stop after you hit something of maxSize

##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -51,6 +54,29 @@ private[spark] object KubernetesClientUtils extends Logging {
     propertiesWriter.toString
   }
 
+  object StringLengthOrdering extends Ordering[(String, String)] {
+    override def compare(x: (String, String), y: (String, String)): Int = {
+      // compare based on file length and break the tie with string comparison of keys.
+      (x._1.length + x._2.length).compare(y._1.length + y._2.length) * 10 +
+        x._1.compareTo(y._1)
+    }

Review comment:
       Yeah don't you want to compare the sum and only compare the first one if they're equal? do you really even need compare()? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [WIP][SPARK-32221] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732967004






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-754611772


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38266/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r550364684



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,67 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)

Review comment:
       This has a bug because we will consider file name later by `truncatedMapSize = truncatedMapSize + (fileName.length + fileContent.length)`.
   ```scala
   scala> Seq("b" -> 1, "abcdef" -> 1).sortBy(f => f._1).sortBy(f => f._2)
   res14: Seq[(String, Int)] = List((abcdef,1), (b,1))
   
   scala> Seq("b" -> 1, "abcdef" -> 1).sortBy(f => f._1).sortBy(f => f._2).map(f => f._1.length + f._2)
   res15: Seq[Int] = List(7, 2)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739804188


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132355/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-748616608






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739796546


   **[Test build #132355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132355/testReport)** for PR 30472 at commit [`73c2aa7`](https://github.com/apache/spark/commit/73c2aa79597a8a8e862fc3256767270bb12613ba).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-748621163


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37705/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755144719


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38305/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r537291078



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -51,6 +54,29 @@ private[spark] object KubernetesClientUtils extends Logging {
     propertiesWriter.toString
   }
 
+  object StringLengthOrdering extends Ordering[(String, String)] {
+    override def compare(x: (String, String), y: (String, String)): Int = {
+      // compare based on file length and break the tie with string comparison of keys.
+      (x._1.length + x._2.length).compare(y._1.length + y._2.length) * 10 +
+        x._1.compareTo(y._1)
+    }

Review comment:
       Good questions, and hoping that I have understood correctly, attempting to answer.
   
   1) We need to compare the file sizes _including their names_, because that is how they will occupy space in a config map. 
   2) We would like to give priority to as many config files as possible we can mount (or store in the configMap), so we would like to sort them by size.
   3) If the two files are equal in lengths, then we do not want to declare them equal, so in the compare equation we add their string compare results as well. This is done to break the tie.  When two files have equal lengths, we check if the two files have same name as well ? Then we say they are truly equal - in principle this should not happen, because they are files in the same directory i.e. `SPARK_CONF_DIR`. So, each file will get some ordering value as a result of computing comparison equation, and no data will be lost.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732501424


   **[Test build #131581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131581/testReport)** for PR 30472 at commit [`e8cfadd`](https://github.com/apache/spark/commit/e8cfaddfcdcfe15ed1a1dfd8e5fcf5b78621ee25).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733568741


   **[Test build #131761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131761/testReport)** for PR 30472 at commit [`2d676cd`](https://github.com/apache/spark/commit/2d676cdeaed89ffe89f00de41350e51122b559d1).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739804188


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132355/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] asfgit closed pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #30472:
URL: https://github.com/apache/spark/pull/30472


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739786400


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36949/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r535757680



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -94,25 +120,41 @@ private[spark] object KubernetesClientUtils extends Logging {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles = listConfFiles(confDir.get, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      val filesSeq: Seq[Option[(String, String)]] = confFiles.map { file =>
+        try {
+          val source = Source.fromFile(file)(Codec.UTF8)
+          val mapping = Some(file.getName -> source.mkString)
+          source.close()
+          mapping
+        } catch {
+          case e: MalformedInputException =>
+            logWarning(s"skipped a non UTF-8 encoded file ${file.getAbsolutePath}.", e)
+            None
+        }
+      }
+      val truncatedMap = truncateToSize(filesSeq.flatten, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      logInfo(s"Spark configuration files loaded from $confDir :" +

Review comment:
       Shall we skip this if `truncatedMap` is empty?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739803982


   **[Test build #132355 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132355/testReport)** for PR 30472 at commit [`73c2aa7`](https://github.com/apache/spark/commit/73c2aa79597a8a8e862fc3256767270bb12613ba).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752881280


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38147/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] lyssg commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
lyssg commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-973642087


   Hi @ScrapCodes ,should we make the same restriction when mounting file in hadoop conf in HadoopConfDriverFeatureStep?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-734244332






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739764791


   **[Test build #132349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132349/testReport)** for PR 30472 at commit [`7b9436c`](https://github.com/apache/spark/commit/7b9436ce2bd9c0407351a3b189bc5a6327eb5eb7).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r535757939



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -94,25 +120,41 @@ private[spark] object KubernetesClientUtils extends Logging {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles = listConfFiles(confDir.get, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      val filesSeq: Seq[Option[(String, String)]] = confFiles.map { file =>
+        try {
+          val source = Source.fromFile(file)(Codec.UTF8)
+          val mapping = Some(file.getName -> source.mkString)
+          source.close()
+          mapping
+        } catch {
+          case e: MalformedInputException =>
+            logWarning(s"skipped a non UTF-8 encoded file ${file.getAbsolutePath}.", e)
+            None
+        }
+      }
+      val truncatedMap = truncateToSize(filesSeq.flatten, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      logInfo(s"Spark configuration files loaded from $confDir :" +
+        s" ${truncatedMap.keys.mkString(",")}")
+      truncatedMap
     } else {
       Map.empty[String, String]
     }
   }
 
-  private def listConfFiles(confDir: String): Seq[File] = {
-    // We exclude all the template files and user provided spark conf or properties.
-    // As spark properties are resolved in a different step.
+  private def listConfFiles(confDir: String, maxSize: Long): Seq[File] = {
+    // At the moment configmaps does not support storing binary content.

Review comment:
       `does not` -> `do not`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r550406649



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,67 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)

Review comment:
       Values are actually sum total of the size of filename and it's content. Two items with same size (i.e. `"b" ->10 and "abcdef" ->10` ) would mean, that if first one did not fit, next one won't fit either. Anyway, I have addressed it, by - update the `truncateMapSize` iff it is updated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-748614874


   > And,
   > 
   > 1. Could you rebase to the master to bring the latest CI fixes please?
   > 2. For versioning, I'm also okay to have this in 3.1.0 with the same reason with Sean.
   
   Thanks @dongjoon-hyun, I have updated it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732601885






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739796546


   **[Test build #132355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132355/testReport)** for PR 30472 at commit [`73c2aa7`](https://github.com/apache/spark/commit/73c2aa79597a8a8e862fc3256767270bb12613ba).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-737854515


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36709/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739800810


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36949/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732261201






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r546151251



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,66 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)
+  }
+
+  // exposed for testing
+  private[submit] def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
+    val maxSize = conf.get(Config.CONFIG_MAP_MAXSIZE)
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles: Seq[File] = listConfFiles(confDir.get, maxSize)
+      val orderedConfFiles = orderFilesBySize(confFiles)
+      var i: Long = 0

Review comment:
       Could you use more meaningful name?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-754566862


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133677/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r537291078



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -51,6 +54,29 @@ private[spark] object KubernetesClientUtils extends Logging {
     propertiesWriter.toString
   }
 
+  object StringLengthOrdering extends Ordering[(String, String)] {
+    override def compare(x: (String, String), y: (String, String)): Int = {
+      // compare based on file length and break the tie with string comparison of keys.
+      (x._1.length + x._2.length).compare(y._1.length + y._2.length) * 10 +
+        x._1.compareTo(y._1)
+    }

Review comment:
       Good questions, and hoping that I have understood correctly, attempting to answer.
   
   1) We need to compare the file sizes _including their names_, because that is how they will occupy space in a config map. 
   2) We would like to give priority to as many config files as possible we can mount (or store in the configMap), so we would like to sort them by size.
   3) If the two files are equal in lengths, then we do not want to declare them equal, so in the compare equation we add their string compare results as well. This is done to break the tie. So `*10` is done to give more priority to compare by their length but if the comparison result is equal, i.e. two files are exactly equal in length then we  get zero as the comparison result. By adding string comparison result of their names, i.e. when two files have equal lengths, we check if the two files have same name as well ? Then we say they are truly equal - in principle this should not happen, because they are files in the same directory i.e. `SPARK_CONF_DIR`. So, each file will get some ordering value as a result of computing comparison equation, and no data will be lost.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-754541391


   **[Test build #133677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133677/testReport)** for PR 30472 at commit [`8a5e387`](https://github.com/apache/spark/commit/8a5e3871a3cd20c54a2ceafacc3114ad0cc3290c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739800837


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/36949/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-748628989


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/37705/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755080860


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133709/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732280480


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36157/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r535756787



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -51,6 +54,29 @@ private[spark] object KubernetesClientUtils extends Logging {
     propertiesWriter.toString
   }
 
+  object StringLengthOrdering extends Ordering[(String, String)] {
+    override def compare(x: (String, String), y: (String, String)): Int = {
+      // compare based on file length and break the tie with string comparison of keys.
+      (x._1.length + x._2.length).compare(y._1.length + y._2.length) * 10 +
+        x._1.compareTo(y._1)
+    }
+  }
+
+  def truncateToSize(seq: Seq[(String, String)], maxSize: Long): Map[String, String] = {
+    // First order the entries in order of their size.
+    // Skip entries if the resulting Map size exceeds maxSize.
+    val ordering: Ordering[(String, String)] = StringLengthOrdering
+    val sortedSet = SortedSet[(String, String)](seq : _*)(ordering)
+    var i: Int = 0
+    val map = mutable.HashMap[String, String]()
+    for (item <- sortedSet) {
+      i += item._1.length + item._2.length
+      if ( i < maxSize) {

Review comment:
       ```scala
   - if ( i < maxSize) {
   + if (i < maxSize) {
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-737854577






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r550364684



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,67 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)

Review comment:
       This seems to introduce a bug because we will consider file name later by `truncatedMapSize = truncatedMapSize + (fileName.length + fileContent.length)`. Please see [this comment](https://github.com/apache/spark/pull/30472#discussion_r550364758).
   ```scala
   scala> Seq("b" -> 1, "abcdef" -> 1).sortBy(f => f._1).sortBy(f => f._2)
   res14: Seq[(String, Int)] = List((abcdef,1), (b,1))
   
   scala> Seq("b" -> 1, "abcdef" -> 1).sortBy(f => f._1).sortBy(f => f._2).map(f => f._1.length + f._2)
   res15: Seq[Int] = List(7, 2)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755084792


   **[Test build #133717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133717/testReport)** for PR 30472 at commit [`a42f82d`](https://github.com/apache/spark/commit/a42f82d40337a3b171196e6a97b9f76c8977a08b).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755084792


   **[Test build #133717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133717/testReport)** for PR 30472 at commit [`a42f82d`](https://github.com/apache/spark/commit/a42f82d40337a3b171196e6a97b9f76c8977a08b).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-737774511


   **[Test build #132110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132110/testReport)** for PR 30472 at commit [`25e677f`](https://github.com/apache/spark/commit/25e677f1bcade7df997e578378bf3e7d0e9f8ec4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes edited a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes edited a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-737198852


   Hi @dongjoon-hyun, sure !  please see #30568 
   
   Thank you for taking a look !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739921858


   **[Test build #132369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132369/testReport)** for PR 30472 at commit [`6413f32`](https://github.com/apache/spark/commit/6413f32a084b9d7cd5119f48451f6802a9c7eef1).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-754518119


   Hi @dongjoon-hyun, I have done one more pass of self review, and found few possible improvements and updated accordingly.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752881913


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38148/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752867034


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133557/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r550402095



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -99,6 +99,14 @@ private[spark] object Config extends Logging {
       .toSequence
       .createWithDefault(Nil)
 
+  val CONFIG_MAP_MAXSIZE =
+    ConfigBuilder("spark.kubernetes.configMap.maxSize")
+      .doc("Max size limit for a config map. This is configurable as per" +
+        " https://etcd.io/docs/v3.4.0/dev-guide/limit/ on k8s server end.")
+      .version("3.1.0")
+      .longConf
+      .createWithDefault(1573000) // 1.5 MiB

Review comment:
       @dongjoon-hyun I think you are correct about this (though it is 1.5MiB not 1.5Mib Mebibyte v/s Mebibit). I was deceived by google:
   ![Untitled](https://user-images.githubusercontent.com/992952/103396327-91305080-4b58-11eb-8392-5276701451dd.png)
   Correct value is : 1.5MiB= 1572864 bytes




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r550406649



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,67 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)

Review comment:
       Values are actually sum total of the size of filename and it's content. Two items with same size (i.e. `"b" ->10 and "abcdef" ->10` ) would mean, that if first one did not fit, next one won't fit either. Anyway, I have addressed it, by -- update the `truncateMapSize` iff `truncateMap` is updated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755051833


   Retest this please.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732296301


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36157/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732506832


   **[Test build #131581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131581/testReport)** for PR 30472 at commit [`e8cfadd`](https://github.com/apache/spark/commit/e8cfaddfcdcfe15ed1a1dfd8e5fcf5b78621ee25).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733761479






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r550364684



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,67 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)

Review comment:
       This has a bug because we will consider file name later by `truncatedMapSize = truncatedMapSize + (fileName.length + fileContent.length)`. Please see [this comment](https://github.com/apache/spark/pull/30472#discussion_r550364758).
   ```scala
   scala> Seq("b" -> 1, "abcdef" -> 1).sortBy(f => f._1).sortBy(f => f._2)
   res14: Seq[(String, Int)] = List((abcdef,1), (b,1))
   
   scala> Seq("b" -> 1, "abcdef" -> 1).sortBy(f => f._1).sortBy(f => f._2).map(f => f._1.length + f._2)
   res15: Seq[Int] = List(7, 2)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r530893082



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -90,6 +90,14 @@ private[spark] object Config extends Logging {
       .toSequence
       .createWithDefault(Nil)
 
+  val CONFIG_MAP_MAXSIZE =
+    ConfigBuilder("spark.kubernetes.configMap.maxsize")
+    .doc("Max size limit for a config map. This may be configurable as per" +
+      " https://etcd.io/docs/v3.4.0/dev-guide/limit/ in the future.")
+    .version("3.1.0")

Review comment:
       "Max size limit for a config map. This is configurable as per https://etcd.io/docs/v3.4.0/dev-guide/limit/ on k8s server end."
   
   So may be we should keep it configurable too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732264932






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r535758501



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -94,25 +120,41 @@ private[spark] object KubernetesClientUtils extends Logging {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles = listConfFiles(confDir.get, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      val filesSeq: Seq[Option[(String, String)]] = confFiles.map { file =>
+        try {
+          val source = Source.fromFile(file)(Codec.UTF8)
+          val mapping = Some(file.getName -> source.mkString)
+          source.close()
+          mapping
+        } catch {
+          case e: MalformedInputException =>
+            logWarning(s"skipped a non UTF-8 encoded file ${file.getAbsolutePath}.", e)
+            None
+        }
+      }
+      val truncatedMap = truncateToSize(filesSeq.flatten, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      logInfo(s"Spark configuration files loaded from $confDir :" +
+        s" ${truncatedMap.keys.mkString(",")}")
+      truncatedMap
     } else {
       Map.empty[String, String]
     }
   }
 
-  private def listConfFiles(confDir: String): Seq[File] = {
-    // We exclude all the template files and user provided spark conf or properties.
-    // As spark properties are resolved in a different step.
+  private def listConfFiles(confDir: String, maxSize: Long): Seq[File] = {
+    // At the moment configmaps does not support storing binary content.
+    // configMaps do not allow for size greater than 1.5 MiB(configurable).
+    // https://etcd.io/docs/v3.4.0/dev-guide/limit/
+    // We exclude all the template files and user provided spark conf or properties,
+    // and binary files (e.g. jars and zip). Spark properties are resolved in a different step.
+    def test(f: File): Boolean = (f.length() + f.getName.length > maxSize) ||

Review comment:
       `test` looks a little too broad name.

##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -94,25 +120,41 @@ private[spark] object KubernetesClientUtils extends Logging {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles = listConfFiles(confDir.get, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      val filesSeq: Seq[Option[(String, String)]] = confFiles.map { file =>
+        try {
+          val source = Source.fromFile(file)(Codec.UTF8)
+          val mapping = Some(file.getName -> source.mkString)
+          source.close()
+          mapping
+        } catch {
+          case e: MalformedInputException =>
+            logWarning(s"skipped a non UTF-8 encoded file ${file.getAbsolutePath}.", e)
+            None
+        }
+      }
+      val truncatedMap = truncateToSize(filesSeq.flatten, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      logInfo(s"Spark configuration files loaded from $confDir :" +
+        s" ${truncatedMap.keys.mkString(",")}")
+      truncatedMap
     } else {
       Map.empty[String, String]
     }
   }
 
-  private def listConfFiles(confDir: String): Seq[File] = {
-    // We exclude all the template files and user provided spark conf or properties.
-    // As spark properties are resolved in a different step.
+  private def listConfFiles(confDir: String, maxSize: Long): Seq[File] = {
+    // At the moment configmaps does not support storing binary content.
+    // configMaps do not allow for size greater than 1.5 MiB(configurable).
+    // https://etcd.io/docs/v3.4.0/dev-guide/limit/
+    // We exclude all the template files and user provided spark conf or properties,
+    // and binary files (e.g. jars and zip). Spark properties are resolved in a different step.
+    def test(f: File): Boolean = (f.length() + f.getName.length > maxSize) ||

Review comment:
       `test` looks like a little too broad name.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733761479






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r551850022



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -156,11 +162,8 @@ private[spark] object KubernetesClientUtils extends Logging {
     }
     val confFiles: Seq[File] = {
       val dir = new File(confDir)
-      if (dir.isDirectory) {
-        dir.listFiles.filter(x => fileFilter(x)).toSeq
-      } else {
-        Nil
-      }
+      assert(dir.isDirectory, "Spark conf should be a directory.")

Review comment:
       In terms of execution, both have similar effect. Still if others also feel - that this should be moved, I can do so.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-734212394


   **[Test build #131841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131841/testReport)** for PR 30472 at commit [`3e49641`](https://github.com/apache/spark/commit/3e496419c086757a1d8b36ed56d824d487cfa16e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-734205929


   **[Test build #131841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131841/testReport)** for PR 30472 at commit [`3e49641`](https://github.com/apache/spark/commit/3e496419c086757a1d8b36ed56d824d487cfa16e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755090995


   **[Test build #133717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133717/testReport)** for PR 30472 at commit [`a42f82d`](https://github.com/apache/spark/commit/a42f82d40337a3b171196e6a97b9f76c8977a08b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-737774511


   **[Test build #132110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132110/testReport)** for PR 30472 at commit [`25e677f`](https://github.com/apache/spark/commit/25e677f1bcade7df997e578378bf3e7d0e9f8ec4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733567813






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-737814097


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36709/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r535756626



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -51,6 +54,29 @@ private[spark] object KubernetesClientUtils extends Logging {
     propertiesWriter.toString
   }
 
+  object StringLengthOrdering extends Ordering[(String, String)] {
+    override def compare(x: (String, String), y: (String, String)): Int = {
+      // compare based on file length and break the tie with string comparison of keys.
+      (x._1.length + x._2.length).compare(y._1.length + y._2.length) * 10 +
+        x._1.compareTo(y._1)
+    }

Review comment:
       Is this `*10 + ...` correctly?
   - https://www.scala-lang.org/api/2.12.10/scala/math/Ordering.html#compare(x:T,y:T):Int




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733577468






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732507016






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751644392


   **[Test build #133442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133442/testReport)** for PR 30472 at commit [`38288f1`](https://github.com/apache/spark/commit/38288f1e49ec290fb8a9cb1435e11f3bd881f0dd).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732250856


   **[Test build #131554 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131554/testReport)** for PR 30472 at commit [`c99b2bc`](https://github.com/apache/spark/commit/c99b2bca50b2386067bdf09ffed5aec181deadb5).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751656050


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133442/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752874894


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38147/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739928903


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132369/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r540084075



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -90,6 +90,14 @@ private[spark] object Config extends Logging {
       .toSequence
       .createWithDefault(Nil)
 
+  val CONFIG_MAP_MAXSIZE =
+    ConfigBuilder("spark.kubernetes.configMap.maxSize")
+      .doc("Max size limit for a config map. This is configurable as per" +
+        " https://etcd.io/docs/v3.4.0/dev-guide/limit/ on k8s server end.")
+      .version("3.2.0")

Review comment:
       It would be good, if this could make it into 3.1.0 ? @dongjoon-hyun and @srowen WDYT? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733550292


   **[Test build #131757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131757/testReport)** for PR 30472 at commit [`534f2ff`](https://github.com/apache/spark/commit/534f2ffc3aa6f019e9c3f85b5f7d35be92c0c379).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739992714


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/36969/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r540333116



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -90,6 +90,14 @@ private[spark] object Config extends Logging {
       .toSequence
       .createWithDefault(Nil)
 
+  val CONFIG_MAP_MAXSIZE =
+    ConfigBuilder("spark.kubernetes.configMap.maxSize")
+      .doc("Max size limit for a config map. This is configurable as per" +
+        " https://etcd.io/docs/v3.4.0/dev-guide/limit/ on k8s server end.")
+      .version("3.2.0")

Review comment:
       On the one hand we normally only back-port fixes, and this is more like a better-error-message improvement. That said, for minor low-risk changes I am not against it if one feels at all strongly about it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739800837


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/36949/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739774239


   **[Test build #132349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132349/testReport)** for PR 30472 at commit [`7b9436c`](https://github.com/apache/spark/commit/7b9436ce2bd9c0407351a3b189bc5a6327eb5eb7).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-734212562






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751644392


   **[Test build #133442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133442/testReport)** for PR 30472 at commit [`38288f1`](https://github.com/apache/spark/commit/38288f1e49ec290fb8a9cb1435e11f3bd881f0dd).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-748615298


   **[Test build #133105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133105/testReport)** for PR 30472 at commit [`cfdc71e`](https://github.com/apache/spark/commit/cfdc71ea847c4095a0070ae4a1db424bda7b8b06).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r533596698



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -90,6 +90,14 @@ private[spark] object Config extends Logging {
       .toSequence
       .createWithDefault(Nil)
 
+  val CONFIG_MAP_MAXSIZE =
+    ConfigBuilder("spark.kubernetes.configMap.maxsize")
+    .doc("Max size limit for a config map. This is configurable as per" +

Review comment:
       indentation?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-738476424


   Thank you for updates, @ScrapCodes !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732296342






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732501424


   **[Test build #131581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131581/testReport)** for PR 30472 at commit [`e8cfadd`](https://github.com/apache/spark/commit/e8cfaddfcdcfe15ed1a1dfd8e5fcf5b78621ee25).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-737782039






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739928681


   **[Test build #132369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132369/testReport)** for PR 30472 at commit [`6413f32`](https://github.com/apache/spark/commit/6413f32a084b9d7cd5119f48451f6802a9c7eef1).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732264932






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752861651


   **[Test build #133556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133556/testReport)** for PR 30472 at commit [`92ad72c`](https://github.com/apache/spark/commit/92ad72c73162d5cd45c2f107ca925489d4bf339d).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752867034


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133557/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [WIP][SPARK-32221] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732960228


   **[Test build #131660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131660/testReport)** for PR 30472 at commit [`8838aba`](https://github.com/apache/spark/commit/8838abafc9c659a67e912651279d2706a4a503fd).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739992714


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/36969/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-738634123


   > `truncateToSize` seems to ignore users' files without proper warnings.
   > Instead of config data loss, why don't we make Spark job fail in this case?
   
   Yeah, it makes sense to fail with a good explanation. 
   
   One of my considerations were, by failing nicely we are changing the existing behaviour (as in 3.0.x) i.e. some invalid or large configuration file in SPARK_CONF_DIR did not fail the job run. After this change, those who upgrade, will have to make changes to their config directory as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732254059


   **[Test build #131555 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131555/testReport)** for PR 30472 at commit [`e8cfadd`](https://github.com/apache/spark/commit/e8cfaddfcdcfe15ed1a1dfd8e5fcf5b78621ee25).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752872534


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133558/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751672658


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133447/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-737353126


   Thanks, @ScrapCodes . 
   https://github.com/apache/spark/pull/30568 is merged. Could you rebase this to the master?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r546383494



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,66 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)
+  }
+
+  // exposed for testing
+  private[submit] def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
+    val maxSize = conf.get(Config.CONFIG_MAP_MAXSIZE)
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles: Seq[File] = listConfFiles(confDir.get, maxSize)
+      val orderedConfFiles = orderFilesBySize(confFiles)
+      var i: Long = 0
+      val truncatedMap = mutable.HashMap[String, String]()
+      for (file <- orderedConfFiles) {
+        var source: Source = Source.fromString("") // init with empty source.
+        try {
+          source = Source.fromFile(file)(Codec.UTF8)
+          val (fileName, fileContent) = file.getName -> source.mkString
+          if (fileContent.length > 0) {

Review comment:
       reverted this change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-734205929


   **[Test build #131841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131841/testReport)** for PR 30472 at commit [`3e49641`](https://github.com/apache/spark/commit/3e496419c086757a1d8b36ed56d824d487cfa16e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r540704406



##########
File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala
##########
@@ -191,25 +191,32 @@ class ClientSuite extends SparkFunSuite with BeforeAndAfter {
     assert(configMap.getData.get(SPARK_CONF_FILE_NAME).contains("conf2key=conf2value"))
   }
 
-  test("All files from SPARK_CONF_DIR, except templates and spark config " +
+  test("All files from SPARK_CONF_DIR, " +
+    "except templates, spark config, binary files and are within size limit, " +
     "should be populated to pod's configMap.") {
     def testSetup: (SparkConf, Seq[String]) = {
       val tempDir = Utils.createTempDir()
-      val sparkConf = new SparkConf(loadDefaults = false).setSparkHome(tempDir.getAbsolutePath)
+      val sparkConf = new SparkConf(loadDefaults = false)
+        .setSparkHome(tempDir.getAbsolutePath)
 
       val tempConfDir = new File(s"${tempDir.getAbsolutePath}/conf")
       tempConfDir.mkdir()
       // File names - which should not get mounted on the resultant config map.
       val filteredConfFileNames =
-        Set("spark-env.sh.template", "spark.properties", "spark-defaults.conf")
-      val confFileNames = for (i <- 1 to 5) yield s"testConf.$i" ++
+        Set("spark-env.sh.template", "spark.properties", "spark-defaults.conf",
+          "test.gz", "test2.jar", "non_utf8.txt")

Review comment:
       That would disallow a user from supplying his library/framework specific configuration. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752864101


   **[Test build #133557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133557/testReport)** for PR 30472 at commit [`8d6a773`](https://github.com/apache/spark/commit/8d6a7736e3847a5db321bce3412526a7a0e2925b).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752881271


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38147/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-748615298


   **[Test build #133105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133105/testReport)** for PR 30472 at commit [`cfdc71e`](https://github.com/apache/spark/commit/cfdc71ea847c4095a0070ae4a1db424bda7b8b06).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732601885






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r550406649



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,67 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)

Review comment:
       Values are actually sum total of the size of filename and it's content. Two items with same size (i.e. `"b" ->10 and "abcdef" ->10` ) would mean, that if first one did not fit, next one won't fit either. Anyway, I have addressed it, by -- update the `truncateMapSize` iff it `truncateMap` updated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r537298366



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -94,25 +120,41 @@ private[spark] object KubernetesClientUtils extends Logging {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles = listConfFiles(confDir.get, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      val filesSeq: Seq[Option[(String, String)]] = confFiles.map { file =>
+        try {
+          val source = Source.fromFile(file)(Codec.UTF8)
+          val mapping = Some(file.getName -> source.mkString)
+          source.close()
+          mapping
+        } catch {
+          case e: MalformedInputException =>
+            logWarning(s"skipped a non UTF-8 encoded file ${file.getAbsolutePath}.", e)
+            None
+        }
+      }
+      val truncatedMap = truncateToSize(filesSeq.flatten, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      logInfo(s"Spark configuration files loaded from $confDir :" +

Review comment:
       Challenge with trying to truncate before reading is, we do not know which files will have `MalformedInputException` i.e. they are not encoded with UTF-8. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755127203


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38305/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r535759798



##########
File path: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/SparkConfPropagateSuite.scala
##########
@@ -59,4 +59,42 @@ private[spark] trait SparkConfPropagateSuite { k8sSuite: KubernetesSuite =>
       new File(logConfFilePath).delete()
     }
   }
+
+  test("Verify large files and unsupported binary/non-utf8 encoded files are skipped.",
+    k8sTestTag) {
+    val loggingConfigFileName = "log-config-test-log4j.properties"
+    val loggingConfURL: URL = this.getClass.getClassLoader.getResource(loggingConfigFileName)
+    assert(loggingConfURL != null, "Logging configuration file not available.")
+
+    val content = Source.createBufferedSource(loggingConfURL.openStream()).getLines().mkString("\n")
+    val sparkConfDirPath = s"${sparkHomeDir.toFile}/conf"
+    val filesToGenerateBinary =
+      Seq("test.jar", "non-utf8.txt", "test.zip", "some-random-binary-file")
+    val binaryContent: Array[Byte] = (1 to 10000).map(_.toByte).toArray
+    val someValidFiles = Seq("config.xml", "log4j.properties", "utf8.txt")
+    try {
+

Review comment:
       Shall we remove this empty line?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739972810


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36969/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739921858


   **[Test build #132369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132369/testReport)** for PR 30472 at commit [`6413f32`](https://github.com/apache/spark/commit/6413f32a084b9d7cd5119f48451f6802a9c7eef1).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes edited a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes edited a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-738475971


   Hi @dongjoon-hyun , thank you.
    I have rebased it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [WIP][SPARK-32221] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732966793


   **[Test build #131660 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131660/testReport)** for PR 30472 at commit [`8838aba`](https://github.com/apache/spark/commit/8838abafc9c659a67e912651279d2706a4a503fd).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r550361643



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -99,6 +99,14 @@ private[spark] object Config extends Logging {
       .toSequence
       .createWithDefault(Nil)
 
+  val CONFIG_MAP_MAXSIZE =
+    ConfigBuilder("spark.kubernetes.configMap.maxSize")
+      .doc("Max size limit for a config map. This is configurable as per" +
+        " https://etcd.io/docs/v3.4.0/dev-guide/limit/ on k8s server end.")
+      .version("3.1.0")
+      .longConf
+      .createWithDefault(1573000) // 1.5 MiB

Review comment:
       Is this correct? The spec means 1.5Mib = `1572864`. This value seems to be greater than that and will not prevent the failure.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755102361






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755065657


   **[Test build #133709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133709/testReport)** for PR 30472 at commit [`8a5e387`](https://github.com/apache/spark/commit/8a5e3871a3cd20c54a2ceafacc3114ad0cc3290c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752895903


   Hi @dongjoon-hyun thank you for taking a look, I have updated as per your feedback, please take a look again.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-754622826


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38266/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752875130


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38148/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751676568


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38036/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752866920


   **[Test build #133557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133557/testReport)** for PR 30472 at commit [`8d6a773`](https://github.com/apache/spark/commit/8d6a7736e3847a5db321bce3412526a7a0e2925b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-738475971


   Hi @dongjoon-hyun , I have rebased it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-737854577






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739774482


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132349/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751632890


   Hi @dongjoon-hyun, I am sorry I was not well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732261201






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752861651


   **[Test build #133556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133556/testReport)** for PR 30472 at commit [`92ad72c`](https://github.com/apache/spark/commit/92ad72c73162d5cd45c2f107ca925489d4bf339d).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751662807


   **[Test build #133447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133447/testReport)** for PR 30472 at commit [`0c4420a`](https://github.com/apache/spark/commit/0c4420ad07d2b988755ed482ba6d5c6d031aa525).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-748616567


   **[Test build #133105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133105/testReport)** for PR 30472 at commit [`cfdc71e`](https://github.com/apache/spark/commit/cfdc71ea847c4095a0070ae4a1db424bda7b8b06).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751658256


   **[Test build #133447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133447/testReport)** for PR 30472 at commit [`0c4420a`](https://github.com/apache/spark/commit/0c4420ad07d2b988755ed482ba6d5c6d031aa525).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-737198852


   Hi @dongjoon-hyun, sure !  please see #30568 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-748616608


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133105/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r537297478



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -51,6 +54,29 @@ private[spark] object KubernetesClientUtils extends Logging {
     propertiesWriter.toString
   }
 
+  object StringLengthOrdering extends Ordering[(String, String)] {
+    override def compare(x: (String, String), y: (String, String)): Int = {
+      // compare based on file length and break the tie with string comparison of keys.
+      (x._1.length + x._2.length).compare(y._1.length + y._2.length) * 10 +
+        x._1.compareTo(y._1)
+    }
+  }
+
+  def truncateToSize(seq: Seq[(String, String)], maxSize: Long): Map[String, String] = {
+    // First order the entries in order of their size.
+    // Skip entries if the resulting Map size exceeds maxSize.
+    val ordering: Ordering[(String, String)] = StringLengthOrdering
+    val sortedSet = SortedSet[(String, String)](seq : _*)(ordering)
+    var i: Int = 0
+    val map = mutable.HashMap[String, String]()
+    for (item <- sortedSet) {
+      i += item._1.length + item._2.length
+      if ( i < maxSize) {

Review comment:
       P.S. I have evaluated the approach of trying to extend SortedSet, and some how make it size limited i.e. items cannot be added beyond a size is reached. But that was more complex in terms of code, so here I have chosen a piece of code that is more simpler to maintain and does not need an overhauling every time we upgrade scala major version.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752862950


   **[Test build #133556 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133556/testReport)** for PR 30472 at commit [`92ad72c`](https://github.com/apache/spark/commit/92ad72c73162d5cd45c2f107ca925489d4bf339d).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [WIP][SPARK-32221] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733543655


   **[Test build #131757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131757/testReport)** for PR 30472 at commit [`534f2ff`](https://github.com/apache/spark/commit/534f2ffc3aa6f019e9c3f85b5f7d35be92c0c379).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733577468






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752864101


   **[Test build #133557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133557/testReport)** for PR 30472 at commit [`8d6a773`](https://github.com/apache/spark/commit/8d6a7736e3847a5db321bce3412526a7a0e2925b).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733543655


   **[Test build #131757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131757/testReport)** for PR 30472 at commit [`534f2ff`](https://github.com/apache/spark/commit/534f2ffc3aa6f019e9c3f85b5f7d35be92c0c379).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes edited a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes edited a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-734083543


   Hi @dongjoon-hyun, I have tried to improve test coverage and also covered some corner cases. Would you like to take a look here as well ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r537298366



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -94,25 +120,41 @@ private[spark] object KubernetesClientUtils extends Logging {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles = listConfFiles(confDir.get, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      val filesSeq: Seq[Option[(String, String)]] = confFiles.map { file =>
+        try {
+          val source = Source.fromFile(file)(Codec.UTF8)
+          val mapping = Some(file.getName -> source.mkString)
+          source.close()
+          mapping
+        } catch {
+          case e: MalformedInputException =>
+            logWarning(s"skipped a non UTF-8 encoded file ${file.getAbsolutePath}.", e)
+            None
+        }
+      }
+      val truncatedMap = truncateToSize(filesSeq.flatten, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      logInfo(s"Spark configuration files loaded from $confDir :" +

Review comment:
       Challenge with trying to truncate before reading is, we do not know which files will have `MalformedInputException` i.e. they are not encoded with UTF-8. 
   
   Yeah, there is one corner case, that if a user places a really large number of small files (<1.5MiB) in the SPARK_CONF_DIR then, we may run into SPARK OOM. The way to avoid is to not read everything in the memory.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r540704406



##########
File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala
##########
@@ -191,25 +191,32 @@ class ClientSuite extends SparkFunSuite with BeforeAndAfter {
     assert(configMap.getData.get(SPARK_CONF_FILE_NAME).contains("conf2key=conf2value"))
   }
 
-  test("All files from SPARK_CONF_DIR, except templates and spark config " +
+  test("All files from SPARK_CONF_DIR, " +
+    "except templates, spark config, binary files and are within size limit, " +
     "should be populated to pod's configMap.") {
     def testSetup: (SparkConf, Seq[String]) = {
       val tempDir = Utils.createTempDir()
-      val sparkConf = new SparkConf(loadDefaults = false).setSparkHome(tempDir.getAbsolutePath)
+      val sparkConf = new SparkConf(loadDefaults = false)
+        .setSparkHome(tempDir.getAbsolutePath)
 
       val tempConfDir = new File(s"${tempDir.getAbsolutePath}/conf")
       tempConfDir.mkdir()
       // File names - which should not get mounted on the resultant config map.
       val filteredConfFileNames =
-        Set("spark-env.sh.template", "spark.properties", "spark-defaults.conf")
-      val confFileNames = for (i <- 1 to 5) yield s"testConf.$i" ++
+        Set("spark-env.sh.template", "spark.properties", "spark-defaults.conf",
+          "test.gz", "test2.jar", "non_utf8.txt")

Review comment:
       Hi @HyukjinKwon, thanks for taking a look. That would disallow a user from supplying his library/framework specific configuration. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r540701621



##########
File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala
##########
@@ -191,25 +191,32 @@ class ClientSuite extends SparkFunSuite with BeforeAndAfter {
     assert(configMap.getData.get(SPARK_CONF_FILE_NAME).contains("conf2key=conf2value"))
   }
 
-  test("All files from SPARK_CONF_DIR, except templates and spark config " +
+  test("All files from SPARK_CONF_DIR, " +
+    "except templates, spark config, binary files and are within size limit, " +
     "should be populated to pod's configMap.") {
     def testSetup: (SparkConf, Seq[String]) = {
       val tempDir = Utils.createTempDir()
-      val sparkConf = new SparkConf(loadDefaults = false).setSparkHome(tempDir.getAbsolutePath)
+      val sparkConf = new SparkConf(loadDefaults = false)
+        .setSparkHome(tempDir.getAbsolutePath)
 
       val tempConfDir = new File(s"${tempDir.getAbsolutePath}/conf")
       tempConfDir.mkdir()
       // File names - which should not get mounted on the resultant config map.
       val filteredConfFileNames =
-        Set("spark-env.sh.template", "spark.properties", "spark-defaults.conf")
-      val confFileNames = for (i <- 1 to 5) yield s"testConf.$i" ++
+        Set("spark-env.sh.template", "spark.properties", "spark-defaults.conf",
+          "test.gz", "test2.jar", "non_utf8.txt")

Review comment:
       @ScrapCodes, quick question. Is it possible to have a list of allowed items only instead?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r550406649



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,67 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)

Review comment:
       Values are actually sum total of the size of filename and it's content. Two items with same size would mean, that if first one did not fit, next one won't fit either. Anyway, I have addressed it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [WIP][SPARK-32221] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733232808






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-754547851


   **[Test build #133677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133677/testReport)** for PR 30472 at commit [`8a5e387`](https://github.com/apache/spark/commit/8a5e3871a3cd20c54a2ceafacc3114ad0cc3290c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752881280


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38147/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755077443


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38297/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-754624222


   K8s integration is failing, I need to either fix the `- Run in client mode. *** FAILED **.*` suite or revert the change, `assert(dir.isDirectory, " ")`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755065657


   **[Test build #133709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133709/testReport)** for PR 30472 at commit [`8a5e387`](https://github.com/apache/spark/commit/8a5e3871a3cd20c54a2ceafacc3114ad0cc3290c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755089522


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38297/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751648508


   **[Test build #133442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133442/testReport)** for PR 30472 at commit [`38288f1`](https://github.com/apache/spark/commit/38288f1e49ec290fb8a9cb1435e11f3bd881f0dd).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [WIP][SPARK-32221] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733232808






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r550406649



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,67 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)

Review comment:
       Values are actually sum total of the size of filename and it's content. Two items with same size (i.e. `"b" ->10 and "abcdef" ->10` ) would mean, that if first one did not fit, next one won't fit either. Anyway, I have addressed it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732296342






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733568741


   **[Test build #131761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131761/testReport)** for PR 30472 at commit [`2d676cd`](https://github.com/apache/spark/commit/2d676cdeaed89ffe89f00de41350e51122b559d1).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r537298366



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -94,25 +120,41 @@ private[spark] object KubernetesClientUtils extends Logging {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles = listConfFiles(confDir.get, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      val filesSeq: Seq[Option[(String, String)]] = confFiles.map { file =>
+        try {
+          val source = Source.fromFile(file)(Codec.UTF8)
+          val mapping = Some(file.getName -> source.mkString)
+          source.close()
+          mapping
+        } catch {
+          case e: MalformedInputException =>
+            logWarning(s"skipped a non UTF-8 encoded file ${file.getAbsolutePath}.", e)
+            None
+        }
+      }
+      val truncatedMap = truncateToSize(filesSeq.flatten, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      logInfo(s"Spark configuration files loaded from $confDir :" +

Review comment:
       ~~Challenge with trying to truncate before reading is, we do not know which files will have `MalformedInputException` i.e. they are not encoded with UTF-8.~~
   
   Yeah, there is one corner case, that if a user places a really large number of small files (<1.5MiB) in the SPARK_CONF_DIR then, we may run into SPARK OOM. The way to avoid is to not read everything in the memory.
   
   So now, we read only as much required in the memory and not everything. Thanks @srowen for correcting me on this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732260988


   **[Test build #131554 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131554/testReport)** for PR 30472 at commit [`c99b2bc`](https://github.com/apache/spark/commit/c99b2bca50b2386067bdf09ffed5aec181deadb5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732264692


   **[Test build #131555 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131555/testReport)** for PR 30472 at commit [`e8cfadd`](https://github.com/apache/spark/commit/e8cfaddfcdcfe15ed1a1dfd8e5fcf5b78621ee25).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739774482


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132349/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r535900210



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -94,25 +120,41 @@ private[spark] object KubernetesClientUtils extends Logging {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles = listConfFiles(confDir.get, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      val filesSeq: Seq[Option[(String, String)]] = confFiles.map { file =>
+        try {
+          val source = Source.fromFile(file)(Codec.UTF8)
+          val mapping = Some(file.getName -> source.mkString)
+          source.close()
+          mapping
+        } catch {
+          case e: MalformedInputException =>
+            logWarning(s"skipped a non UTF-8 encoded file ${file.getAbsolutePath}.", e)
+            None
+        }
+      }
+      val truncatedMap = truncateToSize(filesSeq.flatten, conf.get(Config.CONFIG_MAP_MAXSIZE))
+      logInfo(s"Spark configuration files loaded from $confDir :" +
+        s" ${truncatedMap.keys.mkString(",")}")
+      truncatedMap
     } else {
       Map.empty[String, String]
     }
   }
 
-  private def listConfFiles(confDir: String): Seq[File] = {
-    // We exclude all the template files and user provided spark conf or properties.
-    // As spark properties are resolved in a different step.
+  private def listConfFiles(confDir: String, maxSize: Long): Seq[File] = {
+    // At the moment configmaps does not support storing binary content.
+    // configMaps do not allow for size greater than 1.5 MiB(configurable).
+    // https://etcd.io/docs/v3.4.0/dev-guide/limit/
+    // We exclude all the template files and user provided spark conf or properties,
+    // and binary files (e.g. jars and zip). Spark properties are resolved in a different step.
+    def test(f: File): Boolean = (f.length() + f.getName.length > maxSize) ||

Review comment:
       > BTW, why f.length() + f.getName.length > 0 is here?
   
   Ensures a single file which is larger than the total capacity for the config map is skipped. As we cannot accommodate it anyway.
   An alternative is to fail here with proper exception, what do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755102361






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-734212562






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752881865


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38148/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r546150719



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,66 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)
+  }
+
+  // exposed for testing
+  private[submit] def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
+    val maxSize = conf.get(Config.CONFIG_MAP_MAXSIZE)
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles: Seq[File] = listConfFiles(confDir.get, maxSize)
+      val orderedConfFiles = orderFilesBySize(confFiles)
+      var i: Long = 0
+      val truncatedMap = mutable.HashMap[String, String]()
+      for (file <- orderedConfFiles) {
+        var source: Source = Source.fromString("") // init with empty source.
+        try {
+          source = Source.fromFile(file)(Codec.UTF8)
+          val (fileName, fileContent) = file.getName -> source.mkString
+          if (fileContent.length > 0) {

Review comment:
       This seems like a behavior change in terms of empty conf file, doesn't it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752863133


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133556/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r540701621



##########
File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala
##########
@@ -191,25 +191,32 @@ class ClientSuite extends SparkFunSuite with BeforeAndAfter {
     assert(configMap.getData.get(SPARK_CONF_FILE_NAME).contains("conf2key=conf2value"))
   }
 
-  test("All files from SPARK_CONF_DIR, except templates and spark config " +
+  test("All files from SPARK_CONF_DIR, " +
+    "except templates, spark config, binary files and are within size limit, " +
     "should be populated to pod's configMap.") {
     def testSetup: (SparkConf, Seq[String]) = {
       val tempDir = Utils.createTempDir()
-      val sparkConf = new SparkConf(loadDefaults = false).setSparkHome(tempDir.getAbsolutePath)
+      val sparkConf = new SparkConf(loadDefaults = false)
+        .setSparkHome(tempDir.getAbsolutePath)
 
       val tempConfDir = new File(s"${tempDir.getAbsolutePath}/conf")
       tempConfDir.mkdir()
       // File names - which should not get mounted on the resultant config map.
       val filteredConfFileNames =
-        Set("spark-env.sh.template", "spark.properties", "spark-defaults.conf")
-      val confFileNames = for (i <- 1 to 5) yield s"testConf.$i" ++
+        Set("spark-env.sh.template", "spark.properties", "spark-defaults.conf",
+          "test.gz", "test2.jar", "non_utf8.txt")

Review comment:
       @ScrapCodes, quick question. Is it possible to have a list of allowed filenames only instead?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-737781819


   **[Test build #132110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132110/testReport)** for PR 30472 at commit [`25e677f`](https://github.com/apache/spark/commit/25e677f1bcade7df997e578378bf3e7d0e9f8ec4).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30472: [WIP][SPARK-32221] skip binary files and large files.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-732254059


   **[Test build #131555 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131555/testReport)** for PR 30472 at commit [`e8cfadd`](https://github.com/apache/spark/commit/e8cfaddfcdcfe15ed1a1dfd8e5fcf5b78621ee25).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jaceklaskowski commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
jaceklaskowski commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r551818647



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -156,11 +162,8 @@ private[spark] object KubernetesClientUtils extends Logging {
     }
     val confFiles: Seq[File] = {
       val dir = new File(confDir)
-      if (dir.isDirectory) {
-        dir.listFiles.filter(x => fileFilter(x)).toSeq
-      } else {
-        Nil
-      }
+      assert(dir.isDirectory, "Spark conf should be a directory.")

Review comment:
       nit: move this assertion up so it's caught early (at least around line 135). I also think it should be `IllegalArgumentException` instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751668701


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38036/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r550408715



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,67 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)
+  }
+
+  // exposed for testing
+  private[submit] def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
+    val maxSize = conf.get(Config.CONFIG_MAP_MAXSIZE)
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles: Seq[File] = listConfFiles(confDir.get, maxSize)
+      val orderedConfFiles = orderFilesBySize(confFiles)
+      var truncatedMapSize: Long = 0
+      val truncatedMap = mutable.HashMap[String, String]()
+      var source: Source = Source.fromString("") // init with empty source.
+      for (file <- orderedConfFiles) {
+        try {
+          source = Source.fromFile(file)(Codec.UTF8)
+          val (fileName, fileContent) = file.getName -> source.mkString
+          truncatedMapSize = truncatedMapSize + (fileName.length + fileContent.length)
+          if (truncatedMapSize < maxSize) {
+            truncatedMap.put(fileName, fileContent)
+          } else {

Review comment:
       Now, we increment the size, only when the map is updated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752869110


   **[Test build #133558 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133558/testReport)** for PR 30472 at commit [`a893fd8`](https://github.com/apache/spark/commit/a893fd8ae1476ff487ef635cace6707fd09210aa).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r550365393



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,67 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)
+  }
+
+  // exposed for testing
+  private[submit] def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
+    val maxSize = conf.get(Config.CONFIG_MAP_MAXSIZE)
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles: Seq[File] = listConfFiles(confDir.get, maxSize)
+      val orderedConfFiles = orderFilesBySize(confFiles)
+      var truncatedMapSize: Long = 0
+      val truncatedMap = mutable.HashMap[String, String]()
+      var source: Source = Source.fromString("") // init with empty source.
+      for (file <- orderedConfFiles) {
+        try {
+          source = Source.fromFile(file)(Codec.UTF8)
+          val (fileName, fileContent) = file.getName -> source.mkString
+          truncatedMapSize = truncatedMapSize + (fileName.length + fileContent.length)
+          if (truncatedMapSize < maxSize) {
+            truncatedMap.put(fileName, fileContent)
+          } else {
+            logWarning(s"Skipped a conf file $fileName, due to size constraint." +
+              s" Please see, config: `${Config.CONFIG_MAP_MAXSIZE.key}` for more details.")
+          }
+        } catch {
+          case e: MalformedInputException =>
+            truncatedMapSize = truncatedMapSize - (file.getName.length + file.length)
+            logWarning(
+              s"Unable to read a non UTF-8 encoded file ${file.getAbsolutePath}. Skipping...", e)
+            None
+        } finally {
+          source.close()
+        }
+      }
+      if (truncatedMap.nonEmpty) {
+        logInfo(s"Spark configuration files loaded from $confDir :" +
+          s" ${truncatedMap.keys.mkString(",")}")
+      }
+      truncatedMap.toMap
     } else {
       Map.empty[String, String]
     }
   }
 
-  private def listConfFiles(confDir: String): Seq[File] = {
-    // We exclude all the template files and user provided spark conf or properties.
-    // As spark properties are resolved in a different step.
+  private def listConfFiles(confDir: String, maxSize: Long): Seq[File] = {
+    // At the moment configmaps do not support storing binary content.
+    // configMaps do not allow for size greater than 1.5 MiB(configurable).
+    // https://etcd.io/docs/v3.4.0/dev-guide/limit/
+    // We exclude all the template files and user provided spark conf or properties,
+    // and binary files (e.g. jars and zip). Spark properties are resolved in a different step.
+    def testIfTooLargeOrBinary(f: File): Boolean = (f.length() + f.getName.length > maxSize) ||

Review comment:
       `testIfTooLargeOrBinary` looks misleading because it doesn't include `f.getName.matches(".*\\.template") || f.getName.matches("spark.*(conf|properties)")`.

##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,67 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)

Review comment:
       Ur, this has a bug because we will consider file name later by `truncatedMapSize = truncatedMapSize + (fileName.length + fileContent.length)`.
   ```scala
   scala> Seq("b" -> 1, "abcdef" -> 1).sortBy(f => f._1).sortBy(f => f._2)
   res14: Seq[(String, Int)] = List((abcdef,1), (b,1))
   
   scala> Seq("b" -> 1, "abcdef" -> 1).sortBy(f => f._1).sortBy(f => f._2).map(f => f._1.length + f._2)
   res15: Seq[Int] = List(7, 2)
   ```

##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -90,29 +92,67 @@ private[spark] object KubernetesClientUtils extends Logging {
       .build()
   }
 
-  private def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
+  private def orderFilesBySize(confFiles: Seq[File]): Seq[File] = {
+    val fileToFileSizePairs = confFiles.map(f => (f, f.getName.length + f.length()))
+    // sort first by name and then by length, so that during tests we have consistent results.
+    fileToFileSizePairs.sortBy(f => f._1).sortBy(f => f._2).map(_._1)
+  }
+
+  // exposed for testing
+  private[submit] def loadSparkConfDirFiles(conf: SparkConf): Map[String, String] = {
     val confDir = Option(conf.getenv(ENV_SPARK_CONF_DIR)).orElse(
       conf.getOption("spark.home").map(dir => s"$dir/conf"))
+    val maxSize = conf.get(Config.CONFIG_MAP_MAXSIZE)
     if (confDir.isDefined) {
-      val confFiles = listConfFiles(confDir.get)
-      logInfo(s"Spark configuration files loaded from $confDir : ${confFiles.mkString(",")}")
-      confFiles.map { file =>
-        val source = Source.fromFile(file)(Codec.UTF8)
-        val mapping = (file.getName -> source.mkString)
-        source.close()
-        mapping
-      }.toMap
+      val confFiles: Seq[File] = listConfFiles(confDir.get, maxSize)
+      val orderedConfFiles = orderFilesBySize(confFiles)
+      var truncatedMapSize: Long = 0
+      val truncatedMap = mutable.HashMap[String, String]()
+      var source: Source = Source.fromString("") // init with empty source.
+      for (file <- orderedConfFiles) {
+        try {
+          source = Source.fromFile(file)(Codec.UTF8)
+          val (fileName, fileContent) = file.getName -> source.mkString
+          truncatedMapSize = truncatedMapSize + (fileName.length + fileContent.length)
+          if (truncatedMapSize < maxSize) {
+            truncatedMap.put(fileName, fileContent)
+          } else {

Review comment:
       In `else`, we should reduce `truncatedMapSize` back because the next file can have a shorter name.
   
   In case of `MalformedInputException`, we already has the same logic.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-755070257


   **[Test build #133709 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133709/testReport)** for PR 30472 at commit [`8a5e387`](https://github.com/apache/spark/commit/8a5e3871a3cd20c54a2ceafacc3114ad0cc3290c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ScrapCodes commented on a change in pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r537297478



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -51,6 +54,29 @@ private[spark] object KubernetesClientUtils extends Logging {
     propertiesWriter.toString
   }
 
+  object StringLengthOrdering extends Ordering[(String, String)] {
+    override def compare(x: (String, String), y: (String, String)): Int = {
+      // compare based on file length and break the tie with string comparison of keys.
+      (x._1.length + x._2.length).compare(y._1.length + y._2.length) * 10 +
+        x._1.compareTo(y._1)
+    }
+  }
+
+  def truncateToSize(seq: Seq[(String, String)], maxSize: Long): Map[String, String] = {
+    // First order the entries in order of their size.
+    // Skip entries if the resulting Map size exceeds maxSize.
+    val ordering: Ordering[(String, String)] = StringLengthOrdering
+    val sortedSet = SortedSet[(String, String)](seq : _*)(ordering)
+    var i: Int = 0
+    val map = mutable.HashMap[String, String]()
+    for (item <- sortedSet) {
+      i += item._1.length + item._2.length
+      if ( i < maxSize) {

Review comment:
       
   
   P.S. I have evaluated the approach of trying to extend SortedSet, and some how make it size limited i.e. items cannot be added beyond a size is reached. This seemed to be simpler to maintain.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-754622826


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38266/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-739928903


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132369/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-754566862


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133677/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752872534


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133558/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752881913


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38148/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733567813






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-751658256


   **[Test build #133447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133447/testReport)** for PR 30472 at commit [`0c4420a`](https://github.com/apache/spark/commit/0c4420ad07d2b988755ed482ba6d5c6d031aa525).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-752872399


   **[Test build #133558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133558/testReport)** for PR 30472 at commit [`a893fd8`](https://github.com/apache/spark/commit/a893fd8ae1476ff487ef635cace6707fd09210aa).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30472: [SPARK-32221][k8s] Avoid possible errors due to incorrect file size or type supplied in spark conf.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30472:
URL: https://github.com/apache/spark/pull/30472#issuecomment-733726090






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org