You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/02 08:13:44 UTC

[GitHub] [spark] HeartSaVioR opened a new pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

HeartSaVioR opened a new pull request #31435:
URL: https://github.com/apache/spark/pull/31435


   ### What changes were proposed in this pull request?
   
   This PR proposes to fix the UTs being added in SPARK-31793, so that all things contributing the length limit are properly accounted.
   
   ### Why are the changes needed?
   
   The test `DataSourceScanExecRedactionSuite.SPARK-31793: FileSourceScanExec metadata should contain limited file paths` is failing conditionally, depending on the length of the temp directory.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Modified UTs explain the missing points, which also do the test.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771550064






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771651352


   **[Test build #134782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134782/testReport)** for PR 31435 at commit [`36943c6`](https://github.com/apache/spark/commit/36943c6563278a86131e0d41b3334a538366d597).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #31435:
URL: https://github.com/apache/spark/pull/31435#discussion_r568565607



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
+      // location should include more than one paths. Otherwise location should include only one
+      // path.
+      // (Note we apply subtraction with 1 to count start bracket '['.)
+      if (paths.head.length < 99) {
+        assert(pathsInLocation.size >= 2)

Review comment:
       I'm not sure I understand. Could you please elaborate?
   
   We don't truncate the path itself, right? I think it's also something to be fixed (I'd rather want to see path being truncated with ellipses (...) instead of not adding and leaving it as it is.) but it's more likely bigger fix which may worth another fix instead of test fix.
   
   If you meant counting the number of paths or something for edge-case, dealing with another UT would be easier, like we could simply add edge-cases there in this PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771754529


   **[Test build #134782 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134782/testReport)** for PR 31435 at commit [`36943c6`](https://github.com/apache/spark/commit/36943c6563278a86131e0d41b3334a538366d597).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771693400


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39369/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-772057238


   Merging to master/3.1.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771599722


   Nice catch!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771651352


   **[Test build #134782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134782/testReport)** for PR 31435 at commit [`36943c6`](https://github.com/apache/spark/commit/36943c6563278a86131e0d41b3334a538366d597).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #31435:
URL: https://github.com/apache/spark/pull/31435#discussion_r568565607



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
+      // location should include more than one paths. Otherwise location should include only one
+      // path.
+      // (Note we apply subtraction with 1 to count start bracket '['.)
+      if (paths.head.length < 99) {
+        assert(pathsInLocation.size >= 2)

Review comment:
       I'm not sure I understand. Could you please elaborate?
   
   We don't truncate the path itself, right? I think it's also something to be fixed (I'd rather want to see path being truncated with ellipses (...) instead of not adding and leaving it as it is.) but it's more likely bigger fix which may worth another fix instead of test fix.
   
   If you meant counting the number of paths or something for edge-case, dealing with another UT would be easier, like we could simply add edge-cases there in this PR. (And that UT already tests basic functionality.)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR closed pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR closed pull request #31435:
URL: https://github.com/apache/spark/pull/31435


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-772060608


   Thanks all for the quick reviews!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR closed pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR closed pull request #31435:
URL: https://github.com/apache/spark/pull/31435


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-772195293


   Hm, the tests started to fail:
   
   ```
   [info] - SPARK-31793: FileSourceScanExec metadata should contain limited file paths *** FAILED *** (265 milliseconds)
   [info]   1 was not greater than or equal to 2 (DataSourceScanExecRedactionSuite.scala:154)
   [info]   org.scalatest.exceptions.TestFailedException:
   [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
   [info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
   [info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
   [info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
   [info]   at org.apache.spark.sql.execution.DataSourceScanExecRedactionSuite.$anonfun$new$9(DataSourceScanExecRedactionSuite.scala:154)
   [info]   at org.apache.spark.sql.execution.DataSourceScanExecRedactionSuite.$anonfun$new$9$adapted(DataSourceScanExecRedactionSuite.scala:123)
   [info]   at org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath(SQLHelper.scala:69)
   [info]   at org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath$(SQLHelper.scala:66)
   [info]   at org.apache.spark.sql.QueryTest.withTempPath(QueryTest.scala:34)
   [info]   at org.apache.spark.sql.execution.DataSourceScanExecRedactionSuite.$anonfun$new$8(DataSourceScanExecRedactionSuite.scala:123)
   [info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   ```
   
   https://github.com/apache/spark/runs/1818352990
   https://github.com/apache/spark/runs/1818908589
   
   @HeartSaVioR, please let me revert this for now since it targeted to fix the test but it started to fail in any event.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771784529


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134782/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771784529


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134782/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on a change in pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #31435:
URL: https://github.com/apache/spark/pull/31435#discussion_r568551066



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
+      // location should include more than one paths. Otherwise location should include only one
+      // path.
+      // (Note we apply subtraction with 1 to count start bracket '['.)
+      if (paths.head.length < 99) {
+        assert(pathsInLocation.size >= 2)

Review comment:
       We still need to check that the paths are truncated here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771641694


   retest this, please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771599722


   Nice catch!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771494956






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #31435:
URL: https://github.com/apache/spark/pull/31435#discussion_r568609672



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
+      // location should include more than one paths. Otherwise location should include only one
+      // path.
+      // (Note we apply subtraction with 1 to count start bracket '['.)
+      if (paths.head.length < 99) {
+        assert(pathsInLocation.size >= 2)

Review comment:
       Yeah makes sense, but let's leave it as it is, as I think I'd propose the another form which would make clear how many elements they are instead of only showing available paths and don't say there're more.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #31435:
URL: https://github.com/apache/spark/pull/31435#discussion_r568399650



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,

Review comment:
       I intentionally excluded the thing on closing bracket (']'), but the closing bracket will be added in any way, regardless of the current location string length. This was also not accounted as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771453884






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771550064






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on a change in pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #31435:
URL: https://github.com/apache/spark/pull/31435#discussion_r568550648



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)

Review comment:
       You are right, this can be flaky.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
+      // location should include more than one paths. Otherwise location should include only one
+      // path.
+      // (Note we apply subtraction with 1 to count start bracket '['.)
+      if (paths.head.length < 99) {
+        assert(pathsInLocation.size >= 2)

Review comment:
       We still need to check that the paths are truncated here.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
+      // location should include more than one paths. Otherwise location should include only one
+      // path.
+      // (Note we apply subtraction with 1 to count start bracket '['.)
+      if (paths.head.length < 99) {
+        assert(pathsInLocation.size >= 2)

Review comment:
       I mean simply checking 
   ```
   assert(pathsInLocation.size < paths.size)
   ```
   But this is a minor comment. You can ignore it since there is UT for it already.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771550071






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771494956


   **[Test build #134770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134770/testReport)** for PR 31435 at commit [`36943c6`](https://github.com/apache/spark/commit/36943c6563278a86131e0d41b3334a538366d597).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771700545


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39369/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #31435:
URL: https://github.com/apache/spark/pull/31435#discussion_r568565607



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
+      // location should include more than one paths. Otherwise location should include only one
+      // path.
+      // (Note we apply subtraction with 1 to count start bracket '['.)
+      if (paths.head.length < 99) {
+        assert(pathsInLocation.size >= 2)

Review comment:
       I'm not sure I understand. Could you please elaborate?
   
   We don't truncate the path, right? I think it's also something to be fixed (I'd rather want to see path being truncated with ellipses (...) instead of not adding and leaving it as it is.) but it's more likely bigger fix which may worth another fix instead of test fix.
   
   If you meant counting the number of paths or something for edge-case, dealing with another UT would be easier, like we could simply add edge-cases there in this PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771453884


   cc.ing @gengliangwang @cloud-fan @HyukjinKwon @maropu who are author/reviewers of #28610


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on a change in pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #31435:
URL: https://github.com/apache/spark/pull/31435#discussion_r568550648



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)

Review comment:
       You are right, this can be flaky.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771734985


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39369/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-772197077


   Ah thanks for the check. Looks odd... I'll check it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771550071






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #31435:
URL: https://github.com/apache/spark/pull/31435#discussion_r568399650



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,

Review comment:
       I intentionally excluded the thing on closing bracket (']'), but the closing bracket is added in any way (regardless of the current location string length). This was also not accounted as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-772233682






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771643162


   Just retriggered Github Action as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771734985


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39369/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on a change in pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #31435:
URL: https://github.com/apache/spark/pull/31435#discussion_r568567871



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
+      // location should include more than one paths. Otherwise location should include only one
+      // path.
+      // (Note we apply subtraction with 1 to count start bracket '['.)
+      if (paths.head.length < 99) {
+        assert(pathsInLocation.size >= 2)

Review comment:
       I mean simply checking 
   ```
   assert(pathsInLocation.size < paths.size)
   ```
   But this is a minor comment. You can ignore it since there is UT for it already.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #31435:
URL: https://github.com/apache/spark/pull/31435#discussion_r568399650



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,

Review comment:
       I intentionally excluded the thing on closing bracket (']'), but the closing bracket is added in any way (regardless of the current location string length). This was also not accounted as well.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,

Review comment:
       I intentionally excluded the thing on closing bracket (']'), but the closing bracket will be added in any way, regardless of the current location string length. This was also not accounted as well.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
+      // location should include more than one paths. Otherwise location should include only one
+      // path.
+      // (Note we apply subtraction with 1 to count start bracket '['.)
+      if (paths.head.length < 99) {
+        assert(pathsInLocation.size >= 2)

Review comment:
       I'm not sure I understand. Could you please elaborate?
   
   We don't truncate the path, right? I think it's also something to be fixed (I'd rather want to see path being truncated with ellipses (...) instead of not adding and leaving it as it is.) but it's more likely bigger fix which may worth another fix instead of test fix.
   
   If you meant counting the number of paths or something for edge-case, dealing with another UT would be easier, like we could simply add edge-cases there in this PR.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
+      // location should include more than one paths. Otherwise location should include only one
+      // path.
+      // (Note we apply subtraction with 1 to count start bracket '['.)
+      if (paths.head.length < 99) {
+        assert(pathsInLocation.size >= 2)

Review comment:
       I'm not sure I understand. Could you please elaborate?
   
   We don't truncate the path itself, right? I think it's also something to be fixed (I'd rather want to see path being truncated with ellipses (...) instead of not adding and leaving it as it is.) but it's more likely bigger fix which may worth another fix instead of test fix.
   
   If you meant counting the number of paths or something for edge-case, dealing with another UT would be easier, like we could simply add edge-cases there in this PR.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
+      // location should include more than one paths. Otherwise location should include only one
+      // path.
+      // (Note we apply subtraction with 1 to count start bracket '['.)
+      if (paths.head.length < 99) {
+        assert(pathsInLocation.size >= 2)

Review comment:
       I'm not sure I understand. Could you please elaborate?
   
   We don't truncate the path itself, right? I think it's also something to be fixed (I'd rather want to see path being truncated with ellipses (...) instead of not adding and leaving it as it is.) but it's more likely bigger fix which may worth another fix instead of test fix.
   
   If you meant counting the number of paths or something for edge-case, dealing with another UT would be easier, like we could simply add edge-cases there in this PR. (And that UT already tests basic functionality.)

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
##########
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
+      // location should include more than one paths. Otherwise location should include only one
+      // path.
+      // (Note we apply subtraction with 1 to count start bracket '['.)
+      if (paths.head.length < 99) {
+        assert(pathsInLocation.size >= 2)

Review comment:
       Yeah makes sense, but let's leave it as it is, as I think I'd propose the another form which would make clear how many elements they are instead of only showing available paths and don't say there're more.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31435: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31435:
URL: https://github.com/apache/spark/pull/31435#issuecomment-771651352


   **[Test build #134782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134782/testReport)** for PR 31435 at commit [`36943c6`](https://github.com/apache/spark/commit/36943c6563278a86131e0d41b3334a538366d597).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org