You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/26 07:46:44 UTC

[GitHub] [spark] lw33 opened a new pull request #35979: [SPARK-38664][Core] EventLog compact will fail if there are illegal characters in the path

lw33 opened a new pull request #35979:
URL: https://github.com/apache/spark/pull/35979


   ### What changes were proposed in this pull request?
   Do not escape illegal characters.
   
   ### Why are the changes needed?
   Compact EventLog fail.
   
   ### Does this PR introduce any user-facing change?
   No.
   
   ### How was this patch tested?
   Unit Test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35979: [SPARK-38664][SS] Support compact EventLog when there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r841147264



##########
File path: core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala
##########
@@ -221,5 +221,5 @@ private class CompactedEventLogFileWriter(
     hadoopConf: Configuration)
   extends SingleEventLogFileWriter(appId, appAttemptId, logBaseDir, sparkConf, hadoopConf) {
 
-  override val logPath: String = originalFilePath.toUri.toString + EventLogFileWriter.COMPACTED

Review comment:
       Is this new code safe with S3 filesystem?
   ```scala
   override val logPath: String = originalFilePath.toString + EventLogFileWriter.COMPACTED
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] lw33 commented on a change in pull request #35979: [SPARK-38664][DEPLOY] EventLog compact will fail if there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
lw33 commented on a change in pull request #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r835858773



##########
File path: core/src/test/scala/org/apache/spark/deploy/history/EventLogFileCompactorSuite.scala
##########
@@ -119,7 +119,7 @@ class EventLogFileCompactorSuite extends SparkFunSuite {
     withTempDir { dir =>
       val fs = new Path(dir.getAbsolutePath).getFileSystem(hadoopConf)
 
-      val fileStatuses = writeEventsToRollingWriter(fs, "app", dir, sparkConf, hadoopConf,
+      val fileStatuses = writeEventsToRollingWriter(fs, "app a", dir, sparkConf, hadoopConf,

Review comment:
       My mistake,i fixed this. Plz check it again.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #35979: [SPARK-38664][DEPLOY] Support compact EventLog when there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r836021224



##########
File path: core/src/test/scala/org/apache/spark/deploy/history/EventLogFileCompactorSuite.scala
##########
@@ -284,6 +286,20 @@ class EventLogFileCompactorSuite extends SparkFunSuite {
     }
   }
 
+  test("SPARK-38664: Support compact EventLog when there are illegal characters in the path") {
+    withTempDir { dir =>

Review comment:
       If you still want to use `withTempDir`, you need to refactor `withTempDir` to accept `namePrefix`(but this will involve many file changes). 
   
   At present, two `tmpdirs` are created in this way... which is not necessary
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #35979: [SPARK-38664][DEPLOY] EventLog compact will fail if there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r835899221



##########
File path: core/src/test/scala/org/apache/spark/deploy/history/EventLogFileCompactorSuite.scala
##########
@@ -119,7 +119,7 @@ class EventLogFileCompactorSuite extends SparkFunSuite {
     withTempDir { dir =>
       val fs = new Path(dir.getAbsolutePath).getFileSystem(hadoopConf)
 
-      val fileStatuses = writeEventsToRollingWriter(fs, "app", dir, sparkConf, hadoopConf,
+      val fileStatuses = writeEventsToRollingWriter(fs, "app a", dir, sparkConf, hadoopConf,

Review comment:
       On second thought, I think in the production environment, illegal characters(such as %) may appear in `basedir`, maybe we can add a new case as follows:
   
   ```scala
   test("SPARK-38664: ${Test display name}") {
       // create a base dir with illegal characters
       val dir = Utils.createTempDir("spark-%")
       try {
         // do test 
       } finally {
         Utils.deleteRecursively(dir)
       }
     }
   
   ```
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35979: [SPARK-38664][SS] Support compact EventLog when there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r841147264



##########
File path: core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala
##########
@@ -221,5 +221,5 @@ private class CompactedEventLogFileWriter(
     hadoopConf: Configuration)
   extends SingleEventLogFileWriter(appId, appAttemptId, logBaseDir, sparkConf, hadoopConf) {
 
-  override val logPath: String = originalFilePath.toUri.toString + EventLogFileWriter.COMPACTED

Review comment:
       Is this new code at line 224 safe with S3 filesystem?
   ```scala
   override val logPath: String = originalFilePath.toString + EventLogFileWriter.COMPACTED
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #35979: [SPARK-38664][DEPLOY] EventLog compact will fail if there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r835854993



##########
File path: core/src/test/scala/org/apache/spark/deploy/history/EventLogFileCompactorSuite.scala
##########
@@ -119,7 +119,7 @@ class EventLogFileCompactorSuite extends SparkFunSuite {
     withTempDir { dir =>
       val fs = new Path(dir.getAbsolutePath).getFileSystem(hadoopConf)
 
-      val fileStatuses = writeEventsToRollingWriter(fs, "app", dir, sparkConf, hadoopConf,
+      val fileStatuses = writeEventsToRollingWriter(fs, "app a", dir, sparkConf, hadoopConf,

Review comment:
       I check this case as follows:
   
   ```
   gh pr checkout 35979
   manually revert change of EventLogFileCompactor.scala
   mvn clean install -DskipTests -pl core -am
   mvn test -pl core -Dtest=none -DwildcardSuites=org.apache.spark.deploy.history.EventLogFileCompactorSuite
   ```
   Test result as follows:
   ```
   Discovery starting.
   Discovery completed in 1 second, 647 milliseconds.
   Run starting. Expected test count is: 9
   EventLogFileCompactorSuite:
   - No event log files
   - No compact file, less origin files available than max files to retain
   - No compact file, more origin files available than max files to retain
   - compact file exists, less origin files available than max files to retain
   - compact file exists, number of origin files are same as max files to retain
   - compact file exists, more origin files available than max files to retain
   - events for finished job are dropped in new compact file
   - Don't compact file if score is lower than threshold
   - rewrite files with test filters
   Run completed in 2 seconds, 481 milliseconds.
   Total number of tests run: 9
   Suites: completed 2, aborted 0
   Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
   All tests passed.
   ```
   
   It seems that the new UT is successful without this pr. @lw33 
   
   Can you help double check? @HyukjinKwon 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35979: [SPARK-38664][SS] Support compact EventLog when there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r841147264



##########
File path: core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala
##########
@@ -221,5 +221,5 @@ private class CompactedEventLogFileWriter(
     hadoopConf: Configuration)
   extends SingleEventLogFileWriter(appId, appAttemptId, logBaseDir, sparkConf, hadoopConf) {
 
-  override val logPath: String = originalFilePath.toUri.toString + EventLogFileWriter.COMPACTED

Review comment:
       Is this new code safe with S3 filesystem?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] lw33 commented on pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
lw33 commented on pull request #35979:
URL: https://github.com/apache/spark/pull/35979#issuecomment-1086831394


   > Sorry guys. Supporting `illegal char` by removing `toURI` doesn't look like a safe improvement to me.
   > 
   > Given the trade-off between benefit and risk, we had better recommend to avoid this kind of **illegal char** usage instead, @lw33 .
   
   Why we need `toURI` here, i see the parent class `SingleEventLogFileWriter` `logpath` is `Path.toString`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r841157417



##########
File path: core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala
##########
@@ -221,5 +221,5 @@ private class CompactedEventLogFileWriter(
     hadoopConf: Configuration)
   extends SingleEventLogFileWriter(appId, appAttemptId, logBaseDir, sparkConf, hadoopConf) {
 
-  override val logPath: String = originalFilePath.toUri.toString + EventLogFileWriter.COMPACTED

Review comment:
       Ya, it looks like that to me. Especially with Apache Hadoop 3.3.2.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #35979: [SPARK-38664][DEPLOY] Support compact EventLog when there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r836021224



##########
File path: core/src/test/scala/org/apache/spark/deploy/history/EventLogFileCompactorSuite.scala
##########
@@ -284,6 +286,20 @@ class EventLogFileCompactorSuite extends SparkFunSuite {
     }
   }
 
+  test("SPARK-38664: Support compact EventLog when there are illegal characters in the path") {
+    withTempDir { dir =>

Review comment:
       ~~If you still want to use `withTempDir`, you need to refactor `withTempDir` to accept `namePrefix`(but this will involve many file changes). 
   
   At present, two `tmpdirs` are created in this way... which is not necessary ~~~
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] lw33 commented on pull request #35979: [SPARK-38664][DEPLOY] Support compact EventLog when there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
lw33 commented on pull request #35979:
URL: https://github.com/apache/spark/pull/35979#issuecomment-1082654819


   @HeartSaVioR cc


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #35979: [SPARK-38664][SS] Support compact EventLog when there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r841148392



##########
File path: core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala
##########
@@ -221,5 +221,5 @@ private class CompactedEventLogFileWriter(
     hadoopConf: Configuration)
   extends SingleEventLogFileWriter(appId, appAttemptId, logBaseDir, sparkConf, hadoopConf) {
 
-  override val logPath: String = originalFilePath.toUri.toString + EventLogFileWriter.COMPACTED

Review comment:
       This looks like to have unexpected regression, if we are not sure the change is compatible with other file systems.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #35979: [SPARK-38664][DEPLOY] Support compact EventLog when there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #35979:
URL: https://github.com/apache/spark/pull/35979#issuecomment-1080128652


   LGTM + 1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #35979: [SPARK-38664][Deploy] EventLog compact will fail if there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35979:
URL: https://github.com/apache/spark/pull/35979#issuecomment-1079702024


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #35979: [SPARK-38664][DEPLOY] Support compact EventLog when there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r836022524



##########
File path: core/src/test/scala/org/apache/spark/deploy/history/EventLogFileCompactorSuite.scala
##########
@@ -284,6 +286,20 @@ class EventLogFileCompactorSuite extends SparkFunSuite {
     }
   }
 
+  test("SPARK-38664: Support compact EventLog when there are illegal characters in the path") {
+    withTempDir { dir =>

Review comment:
       Ignore previous comments
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #35979: [SPARK-38664][DEPLOY] EventLog compact will fail if there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35979:
URL: https://github.com/apache/spark/pull/35979#issuecomment-1079806283


   cc @HeartSaVioR FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35979: [SPARK-38664][SS] Support compact EventLog when there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r841147264



##########
File path: core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala
##########
@@ -221,5 +221,5 @@ private class CompactedEventLogFileWriter(
     hadoopConf: Configuration)
   extends SingleEventLogFileWriter(appId, appAttemptId, logBaseDir, sparkConf, hadoopConf) {
 
-  override val logPath: String = originalFilePath.toUri.toString + EventLogFileWriter.COMPACTED

Review comment:
       Is this safe with S3 filesystem?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35979:
URL: https://github.com/apache/spark/pull/35979#issuecomment-1086919965


   Back to the original proposal, why do we need to support `illegal char`, @lw33 ? It's illegal, isn't it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org