You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2018/11/01 20:33:36 UTC
[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...
GitHub user dongjoon-hyun opened a pull request:
https://github.com/apache/spark/pull/22927
[SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle a relative path
## What changes were proposed in this pull request?
Unfortunately, it seems that we missed this in 2.4.0. In Spark 2.4, `LOAD DATA LOCAL INPATH` only works in case of absolute paths. This PR aims to fix it to support relative paths. This is a regression in 2.4.0.
```scala
$ ls kv1.txt
kv1.txt
scala> spark.sql("LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE t")
org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: kv1.txt;
```
## How was this patch tested?
Pass the Jenkins
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dongjoon-hyun/spark SPARK-LOAD
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22927.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22927
----
commit efb99da8fb505aaeeb0d95fff99c245bd3c0a0b8
Author: Dongjoon Hyun <do...@...>
Date: 2018-11-01T20:26:47Z
[SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle a relative path
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22927
**[Test build #98377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98377/testReport)** for PR 22927 at commit [`85a5864`](https://github.com/apache/spark/commit/85a5864a5b6a910f3cc702d0407a5e015de2efcc).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22927
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22927
I'll list in as a known issue in 2.4.0, thanks for fixing it!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22927
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98368/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22927
For SparkR failure, https://issues.apache.org/jira/browse/SPARK-25923 is filed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22927#discussion_r230224899
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -1987,6 +1987,13 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
}
}
+ test("SPARK-25918: LOAD DATA LOCAL INPATH should handle a relative path") {
--- End diff --
I'll move this.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22927#discussion_r230216125
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -1987,6 +1987,13 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
}
}
+ test("SPARK-25918: LOAD DATA LOCAL INPATH should handle a relative path") {
--- End diff --
This does not belong to this test suite. `HiveCommandSuite.scala` is the best place, although this is not for hive module.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22927#discussion_r230225838
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -393,7 +393,7 @@ object LoadDataCommand {
throw new IllegalArgumentException(e)
}
} else {
- path
+ new Path(pathUri)
--- End diff --
Good point. Instead of this, `new Path(workingDir, path)` will work. In that case, we may refactor this as a variable for line 379 and reuse it.
```
new Path(workingDir, path)
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22927
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:
https://github.com/apache/spark/pull/22927
good catch, I should have checked this case too.
lgtm
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22927
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22927#discussion_r230224836
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -1987,6 +1987,13 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
}
}
+ test("SPARK-25918: LOAD DATA LOCAL INPATH should handle a relative path") {
--- End diff --
Ya. I know. I put this here before the previous test case are [here](https://github.com/apache/spark/pull/22927/files/efb99da8fb505aaeeb0d95fff99c245bd3c0a0b8#diff-1ea02a6fab84e938582f7f87cc4d9ea1R1997).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22927
**[Test build #98368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98368/testReport)** for PR 22927 at commit [`efb99da`](https://github.com/apache/spark/commit/efb99da8fb505aaeeb0d95fff99c245bd3c0a0b8).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22927
Thank you for review, @squito !
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22927
Thank you, @squito and @gatorsmile . I address the review comments.
The SparkR failure looks irrelevant to this. I also observed that in another unrelated PR (https://github.com/apache/spark/pull/22924), too
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22927
**[Test build #98377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98377/testReport)** for PR 22927 at commit [`85a5864`](https://github.com/apache/spark/commit/85a5864a5b6a910f3cc702d0407a5e015de2efcc).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22927
Thank you for review, @squito , @gatorsmile , @HyukjinKwon .
According to the log, all Java/Scala/Python/R tests passed.
The failure mark is only due to SPARK-25923. I'll merge this.
```
* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) :
dims [product 26] do not match the length of object [0]
Execution halted
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22927#discussion_r230217147
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -393,7 +393,7 @@ object LoadDataCommand {
throw new IllegalArgumentException(e)
}
} else {
- path
+ new Path(pathUri)
--- End diff --
Here, it is converted PATH to URI and then converted back to Path. What is your goal rather than directly building a path?
```
if (path.isAbsolute()) path else new Path(workingDir, path)
````
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22927
Merged to master/branch-2.4.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22927#discussion_r230188800
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -393,7 +393,7 @@ object LoadDataCommand {
throw new IllegalArgumentException(e)
}
} else {
- path
+ new Path(pathUri)
--- End diff --
`path` doesn't contain `workingDir` information.
Could you review this, @cloud-fan , @gatorsmile , @HyukjinKwon , @vanzin and @squito ?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22927
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4712/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22927
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22927
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98377/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22927
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22927
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4705/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22927
**[Test build #98368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98368/testReport)** for PR 22927 at commit [`efb99da`](https://github.com/apache/spark/commit/efb99da8fb505aaeeb0d95fff99c245bd3c0a0b8).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22927
Thank you, @cloud-fan !
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org