Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2018/11/01 20:33:36 UTC

[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/22927

    [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle a relative path

    ## What changes were proposed in this pull request?
    
    In Spark 2.4.0, `LOAD DATA LOCAL INPATH` works only with absolute paths. This is a regression that we unfortunately missed in 2.4.0. This PR fixes it so that relative paths are supported.
    
    ```scala
    $ ls kv1.txt
    kv1.txt
    
    scala> spark.sql("LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE t")
    org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: kv1.txt;
    ```
    
    ## How was this patch tested?
    
    Pass the Jenkins
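
A minimal sketch of the behavior the description expects, assuming a relative local path should be interpreted against the working directory of the JVM that runs the query. It uses `java.nio.file` from the JDK rather than Spark or Hadoop classes, and `RelativePathSketch` and `expectedLocation` are hypothetical names used only for illustration:

```scala
import java.nio.file.{Path, Paths}

object RelativePathSketch {
  // Hypothetical helper (not part of Spark): the absolute location that a
  // relative local path such as "kv1.txt" is assumed to refer to.
  def expectedLocation(relative: String): Path = {
    // "user.dir" holds the working directory the JVM was started in.
    val workingDir = Paths.get(System.getProperty("user.dir"))
    workingDir.resolve(relative).normalize()
  }
}
```

Under that assumption, `expectedLocation("kv1.txt")` names the same file that `ls kv1.txt` lists in the shell session above, which is the file the failing statement could not find.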

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-LOAD

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22927.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22927
    
----
commit efb99da8fb505aaeeb0d95fff99c245bd3c0a0b8
Author: Dongjoon Hyun <do...@...>
Date:   2018-11-01T20:26:47Z

    [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle a relative path

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    **[Test build #98377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98377/testReport)** for PR 22927 at commit [`85a5864`](https://github.com/apache/spark/commit/85a5864a5b6a910f3cc702d0407a5e015de2efcc).


---



[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22927


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    I'll list it as a known issue in 2.4.0. Thanks for fixing it!


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98368/
    Test FAILed.


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    For the SparkR failure, https://issues.apache.org/jira/browse/SPARK-25923 has been filed.


---



[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22927#discussion_r230224899
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -1987,6 +1987,13 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
         }
       }
     
    +  test("SPARK-25918: LOAD DATA LOCAL INPATH should handle a relative path") {
    --- End diff --
    
    I'll move this.


---



[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22927#discussion_r230216125
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -1987,6 +1987,13 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
         }
       }
     
    +  test("SPARK-25918: LOAD DATA LOCAL INPATH should handle a relative path") {
    --- End diff --
    
    This test does not belong in this suite. `HiveCommandSuite.scala` is the best place for it, although the change itself is not in the hive module.


---



[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22927#discussion_r230225838
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -393,7 +393,7 @@ object LoadDataCommand {
               throw new IllegalArgumentException(e)
           }
         } else {
    -      path
    +      new Path(pathUri)
    --- End diff --
    
    Good point. Instead of this, `new Path(workingDir, path)` will work. In that case, we may refactor this as a variable for line 379 and reuse it.
    ```
    new Path(workingDir, path)
    ```


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    good catch, I should have checked this case too.
    
    lgtm


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22927#discussion_r230224836
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -1987,6 +1987,13 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
         }
       }
     
    +  test("SPARK-25918: LOAD DATA LOCAL INPATH should handle a relative path") {
    --- End diff --
    
    Yes, I know. I put this here because the previous test cases are [here](https://github.com/apache/spark/pull/22927/files/efb99da8fb505aaeeb0d95fff99c245bd3c0a0b8#diff-1ea02a6fab84e938582f7f87cc4d9ea1R1997).


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    **[Test build #98368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98368/testReport)** for PR 22927 at commit [`efb99da`](https://github.com/apache/spark/commit/efb99da8fb505aaeeb0d95fff99c245bd3c0a0b8).


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    Thank you for the review, @squito!


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    Thank you, @squito and @gatorsmile. I addressed the review comments.
    The SparkR failure looks unrelated to this PR. I also observed it in another unrelated PR (https://github.com/apache/spark/pull/22924).


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    **[Test build #98377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98377/testReport)** for PR 22927 at commit [`85a5864`](https://github.com/apache/spark/commit/85a5864a5b6a910f3cc702d0407a5e015de2efcc).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    Thank you for the review, @squito, @gatorsmile, and @HyukjinKwon.
    According to the log, all Java/Scala/Python/R tests passed.
    
    The failure mark is only due to SPARK-25923. I'll merge this.
    ```
    * checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : 
      dims [product 26] do not match the length of object [0]
    Execution halted
    ```


---



[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22927#discussion_r230217147
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -393,7 +393,7 @@ object LoadDataCommand {
               throw new IllegalArgumentException(e)
           }
         } else {
    -      path
    +      new Path(pathUri)
    --- End diff --
    
    Here, the `Path` is converted to a URI and then converted back to a `Path`. What is the goal of doing that rather than building the path directly?
    ```
    if (path.isAbsolute()) path else new Path(workingDir, path)
    ```
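
The conditional suggested in this comment can be sketched on its own. This is only an illustration: it uses `java.nio.file.Path` from the JDK as a stand-in for Hadoop's `Path`, `ResolveSketch` and `resolve` are hypothetical names, and the sample paths in the note below assume a POSIX-style file system:

```scala
import java.nio.file.{Path, Paths}

object ResolveSketch {
  // Hypothetical helper: an absolute path is returned unchanged, while a
  // relative path is resolved against the supplied working directory.
  def resolve(workingDir: Path, path: Path): Path =
    if (path.isAbsolute) path else workingDir.resolve(path).normalize()
}
```

With a working directory of `/tmp/work`, `ResolveSketch.resolve` maps `kv1.txt` to `/tmp/work/kv1.txt` and leaves `/data/kv1.txt` as it is. The point of the suggestion is that the relative case picks up the working directory without a round trip through a URI.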


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    Merged to master/branch-2.4.


---



[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22927#discussion_r230188800
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -393,7 +393,7 @@ object LoadDataCommand {
               throw new IllegalArgumentException(e)
           }
         } else {
    -      path
    +      new Path(pathUri)
    --- End diff --
    
    `path` doesn't contain `workingDir` information.
    
    Could you review this, @cloud-fan , @gatorsmile , @HyukjinKwon , @vanzin and @squito ?


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4712/
    Test PASSed.


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98377/
    Test FAILed.


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4705/
    Test PASSed.


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    **[Test build #98368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98368/testReport)** for PR 22927 at commit [`efb99da`](https://github.com/apache/spark/commit/efb99da8fb505aaeeb0d95fff99c245bd3c0a0b8).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22927
  
    Thank you, @cloud-fan !


---
