You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/09 05:35:32 UTC

[GitHub] [spark] itholic opened a new pull request, #38575: [WIP][SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

itholic opened a new pull request, #38575:
URL: https://github.com/apache/spark/pull/38575

   
   ### What changes were proposed in this pull request?
   
   The original PR to introduce the error class `PATH_NOT_FOUND` was reverted since it breaks the tests in different test env.
   
   This PR proposes to restore it back.
   
   
   ### Why are the changes needed?
   
   Restoring the reverted changes with proper fix.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   
   ### How was this patch tested?
   
   The existing CI should pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #38575: [WIP][SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38575:
URL: https://github.com/apache/spark/pull/38575#issuecomment-1310407470

   Looks like we need to refactor this case
   
   https://github.com/apache/spark/blob/c5d27603f29437f1686cac70727594c19410a273/R/pkg/tests/fulltests/test_sparkSQL.R#L3986-L3998


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #38575: [WIP][SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38575:
URL: https://github.com/apache/spark/pull/38575#discussion_r1017484528


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:
##########
@@ -2330,7 +2330,7 @@ class DataFrameSuite extends QueryTest
           new File(uuid, "file3").getAbsolutePath,
           uuid).rdd
       }
-      assert(e.getMessage.startsWith("Path does not exist"))
+      assert(e.getErrorClass == "PATH_NOT_FOUND")

Review Comment:
   nit: Can the empty `finally` block in this case be cleaned together in this pr?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on code in PR #38575:
URL: https://github.com/apache/spark/pull/38575#discussion_r1029760391


##########
R/pkg/tests/fulltests/test_sparkSQL.R:
##########
@@ -3990,12 +3990,16 @@ test_that("Call DataFrameWriter.load() API in Java without path and check argume
   expect_error(read.df(source = "json"),
                paste("Error in load : analysis error - Unable to infer schema for JSON.",
                      "It must be specified manually"))
-  expect_error(read.df("arbitrary_path"), "Error in load : analysis error - Path does not exist")
-  expect_error(read.json("arbitrary_path"), "Error in json : analysis error - Path does not exist")
-  expect_error(read.text("arbitrary_path"), "Error in text : analysis error - Path does not exist")
-  expect_error(read.orc("arbitrary_path"), "Error in orc : analysis error - Path does not exist")
+  expect_error(read.df("arbitrary_path"),
+               "Error in load : analysis error - \\[PATH_NOT_FOUND\\].*")

Review Comment:
   @HyukjinKwon @srielau @cloud-fan Are you ok with such changes?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on PR #38575:
URL: https://github.com/apache/spark/pull/38575#issuecomment-1324575522

   +1, LGTM. Merging to master.
   Thank you, @itholic and @HyukjinKwon @cloud-fan @LuciferYang for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38575:
URL: https://github.com/apache/spark/pull/38575#discussion_r1029997623


##########
R/pkg/tests/fulltests/test_sparkSQL.R:
##########
@@ -3990,12 +3990,16 @@ test_that("Call DataFrameWriter.load() API in Java without path and check argume
   expect_error(read.df(source = "json"),
                paste("Error in load : analysis error - Unable to infer schema for JSON.",
                      "It must be specified manually"))
-  expect_error(read.df("arbitrary_path"), "Error in load : analysis error - Path does not exist")
-  expect_error(read.json("arbitrary_path"), "Error in json : analysis error - Path does not exist")
-  expect_error(read.text("arbitrary_path"), "Error in text : analysis error - Path does not exist")
-  expect_error(read.orc("arbitrary_path"), "Error in orc : analysis error - Path does not exist")
+  expect_error(read.df("arbitrary_path"),
+               "Error in load : analysis error - \\[PATH_NOT_FOUND\\].*")

Review Comment:
   LGTM



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #38575: [WIP][SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38575:
URL: https://github.com/apache/spark/pull/38575#issuecomment-1310402593

   Is the sparkr UTs failure is related to this one?
   
   https://github.com/itholic/spark/actions/runs/3425639144/jobs/5708796073
   
   ```
   ══ Failed ══════════════════════════════════════════════════════════════════════
   ── 1. Failure ('test_sparkSQL.R:3993'): Call DataFrameWriter.load() API in Java 
   `read.df("arbitrary_path")` threw an error with unexpected message.
   Expected match: "Error in load : analysis error - Path does not exist"
   Actual message: "Error in load : analysis error - [PATH_NOT_FOUND] Path does not exist: file:/__w/spark/spark/arbitrary_path."
   Backtrace:
     1. testthat::expect_error(read.df("arbitrary_path"), "Error in load : analysis error - Path does not exist")
          at test_sparkSQL.R:3993:2
     6. SparkR::read.df("arbitrary_path")
     7. SparkR:::handledCallJMethod(read, "load")
     8. base::tryCatch(...)
     9. base (local) tryCatchList(expr, classes, parentenv, handlers)
    10. base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
    11. value[[3L]](cond)
    12. SparkR:::captureJVMException(e, method)
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #38575: [WIP][SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #38575:
URL: https://github.com/apache/spark/pull/38575#discussion_r1017450594


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:
##########
@@ -2330,7 +2330,7 @@ class DataFrameSuite extends QueryTest
           new File(uuid, "file3").getAbsolutePath,
           uuid).rdd
       }
-      assert(e.getMessage.startsWith("Path does not exist"))
+      assert(e.getErrorClass == "PATH_NOT_FOUND")

Review Comment:
   The current `checkError` seems not allow pattern matching for `parameter` checking.
   
   Let me just check the errorClass for now.
   
   Otherwise, it fails on different test envs: https://github.com/apache/spark/pull/38422#discussion_r1016116865



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38575:
URL: https://github.com/apache/spark/pull/38575#discussion_r1030018762


##########
R/pkg/tests/fulltests/test_sparkSQL.R:
##########
@@ -3990,12 +3990,16 @@ test_that("Call DataFrameWriter.load() API in Java without path and check argume
   expect_error(read.df(source = "json"),
                paste("Error in load : analysis error - Unable to infer schema for JSON.",
                      "It must be specified manually"))
-  expect_error(read.df("arbitrary_path"), "Error in load : analysis error - Path does not exist")
-  expect_error(read.json("arbitrary_path"), "Error in json : analysis error - Path does not exist")
-  expect_error(read.text("arbitrary_path"), "Error in text : analysis error - Path does not exist")
-  expect_error(read.orc("arbitrary_path"), "Error in orc : analysis error - Path does not exist")
+  expect_error(read.df("arbitrary_path"),
+               "Error in load : analysis error - \\[PATH_NOT_FOUND\\].*")

Review Comment:
   thanks man



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk closed pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND
URL: https://github.com/apache/spark/pull/38575


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #38575: [WIP][SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on code in PR #38575:
URL: https://github.com/apache/spark/pull/38575#discussion_r1018719912


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:
##########
@@ -2330,7 +2330,7 @@ class DataFrameSuite extends QueryTest
           new File(uuid, "file3").getAbsolutePath,
           uuid).rdd
       }
-      assert(e.getMessage.startsWith("Path does not exist"))
+      assert(e.getErrorClass == "PATH_NOT_FOUND")

Review Comment:
   > The current checkError seems not allow pattern matching for parameter checking.
   
   It does, see https://github.com/apache/spark/blob/74668e2bf14760dbc60509f7736f410c09084697/core/src/test/scala/org/apache/spark/SparkFunSuite.scala#L310
   
   Please, use `checkError` here. Just set `matchPVals` to `true`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #38575: [WIP][SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #38575:
URL: https://github.com/apache/spark/pull/38575#discussion_r1017450594


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:
##########
@@ -2330,7 +2330,7 @@ class DataFrameSuite extends QueryTest
           new File(uuid, "file3").getAbsolutePath,
           uuid).rdd
       }
-      assert(e.getMessage.startsWith("Path does not exist"))
+      assert(e.getErrorClass == "PATH_NOT_FOUND")

Review Comment:
   The current `checkError` seems not allow pattern matching for `parameter` checking.
   
   Let me just check the errorClass for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #38575:
URL: https://github.com/apache/spark/pull/38575#discussion_r1029988770


##########
R/pkg/tests/fulltests/test_sparkSQL.R:
##########
@@ -3990,12 +3990,16 @@ test_that("Call DataFrameWriter.load() API in Java without path and check argume
   expect_error(read.df(source = "json"),
                paste("Error in load : analysis error - Unable to infer schema for JSON.",
                      "It must be specified manually"))
-  expect_error(read.df("arbitrary_path"), "Error in load : analysis error - Path does not exist")
-  expect_error(read.json("arbitrary_path"), "Error in json : analysis error - Path does not exist")
-  expect_error(read.text("arbitrary_path"), "Error in text : analysis error - Path does not exist")
-  expect_error(read.orc("arbitrary_path"), "Error in orc : analysis error - Path does not exist")
+  expect_error(read.df("arbitrary_path"),
+               "Error in load : analysis error - \\[PATH_NOT_FOUND\\].*")

Review Comment:
   FYI: the `arbitrary_path` here is different per test environments, so I use regexp for the path string.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org