Posted to reviews@spark.apache.org by watermen <gi...@git.apache.org> on 2015/02/11 11:59:54 UTC

[GitHub] spark pull request: [SPARK-5741][SQL] Support comma in path in Hiv...

GitHub user watermen opened a pull request:

    https://github.com/apache/spark/pull/4532

    [SPARK-5741][SQL] Support comma in path in HiveContext

    Running `select * from nzhang_part where hr = 'file,';` throws `java.lang.IllegalArgumentException: Can not create a Path from an empty string`, because the HDFS path name contains a comma. Steps to reproduce:
    ```
    set hive.merge.mapfiles=true; 
    set hive.merge.mapredfiles=true;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    create table nzhang_part like srcpart;
    insert overwrite table nzhang_part partition (ds='2010-08-15', hr) select key, value, hr from srcpart where ds='2008-04-08';
    insert overwrite table nzhang_part partition (ds='2010-08-15', hr=11) select key, value from srcpart where ds='2008-04-08';
    insert overwrite table nzhang_part partition (ds='2010-08-15', hr) 
    select * from (
    select key, value, hr from srcpart where ds='2008-04-08'
    union all
    select '1' as key, '1' as value, 'file,' as hr from src limit 1) s;
    select * from nzhang_part where hr = 'file,';
    ```
    ###############################
    Error log
    ###############################
    15/02/10 14:33:16 ERROR SparkSQLDriver: Failed in [select * from nzhang_part where hr = 'file,']
    java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
    at org.apache.hadoop.fs.Path.<init>(Path.java:135)
    at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:241)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:400)
    at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:251)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$11.apply(TableReader.scala:229)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$11.apply(TableReader.scala:229)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:172)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:172)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:172)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:196)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/watermen/spark SPARK-5741

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4532.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4532
    
----
commit 2eedacc06b36dd582155da1709985e6d262a1e57
Author: q00251598 <qi...@huawei.com>
Date:   2015-02-11T09:16:06Z

    change setInputPaths to set

commit dc83c8893dc43f4efa6631f6d0aef925ab84c4dc
Author: q00251598 <qi...@huawei.com>
Date:   2015-02-11T09:41:18Z

    change setInputPaths to set

commit ae41e55c93745c8eaa3f0a5bd50131917271481d
Author: q00251598 <qi...@huawei.com>
Date:   2015-02-11T09:52:37Z

    change setInputPaths to set

commit 358ba4d8614536b68b938a71f67011911d158c8d
Author: q00251598 <qi...@huawei.com>
Date:   2015-02-11T10:17:32Z

    change FileInputFormat.setInputPaths to jobConf.set

commit 0ab9fac1ba9410b1a632c4d63357c3e7437b031b
Author: q00251598 <qi...@huawei.com>
Date:   2015-02-11T10:48:52Z

    change FileInputFormat.setInputPaths to jobConf.set

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-76329616
  
    LGTM




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by watermen <gi...@git.apache.org>.
Github user watermen commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-76328119
  
    @yhuai Can you review the code for me?




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-76325234
  
      [Test build #28034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28034/consoleFull) for   PR 4532 at commit [`9758ab1`](https://github.com/apache/spark/commit/9758ab1a28a5bd2a7d58dc8dfa178988f12daaa0).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by watermen <gi...@git.apache.org>.
Github user watermen commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-76648238
  
    @marmbrus @Cheng Lian Can it be merged?




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4532#discussion_r25406876
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
    @@ -248,7 +249,7 @@ private[hive] object HadoopTableReader extends HiveInspectors {
        * instantiate a HadoopRDD.
        */
       def initializeLocalJobConfFunc(path: String, tableDesc: TableDesc)(jobConf: JobConf) {
    -    FileInputFormat.setInputPaths(jobConf, path)
    +    jobConf.set("mapred.input.dir", StringUtils.escapeString(path.toString()))
    --- End diff --
    
    Oh, I see. `getPathStrings` does not care whether a comma is escaped. Can we use `public static void setInputPaths(Job job, Path... inputPaths)` instead? I think it is better to avoid calling `set` directly with a string key; going through a method seems more robust.
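
To illustrate the point about escaping, here is a minimal sketch in plain Java. It is a simplified model, not Hadoop's code: `escapeCommas` and `splitUnescaped` are hypothetical helpers written for this example, and the path is shortened. It shows that a splitter which ignores escapes still cuts an escaped path at the comma, while an escape-aware splitter returns the original path.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeRoundTripSketch {
    // Hypothetical helper: put a backslash before every comma and backslash.
    static String escapeCommas(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == ',' || c == '\\') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    // Hypothetical helper: split only on commas that are not escaped,
    // and drop the escape characters from the result.
    static List<String> splitUnescaped(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                cur.append(s.charAt(++i)); // keep the escaped character literally
            } else if (c == ',') {
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    public static void main(String[] args) {
        String path = "/warehouse/nzhang_part/ds=2010-08-15/hr=file,";
        String escaped = escapeCommas(path);

        // A splitter that ignores escapes still sees a separator here.
        String[] naive = escaped.split(",", -1);
        System.out.println("escape-unaware parts: " + naive.length);

        // An escape-aware splitter recovers the single original path.
        List<String> aware = splitUnescaped(escaped);
        System.out.println("escape-aware parts:   " + aware.size());
        System.out.println("round trip intact:    " + aware.get(0).equals(path));
    }
}
```

Under this model, escaping only helps if the code that later reads the value also honours the escape, which is why passing paths as separate objects avoids the problem.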




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4532#discussion_r24512732
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
    @@ -248,7 +249,7 @@ private[hive] object HadoopTableReader extends HiveInspectors {
        * instantiate a HadoopRDD.
        */
       def initializeLocalJobConfFunc(path: String, tableDesc: TableDesc)(jobConf: JobConf) {
    -    FileInputFormat.setInputPaths(jobConf, path)
    +    jobConf.set("mapred.input.dir", StringUtils.escapeString(path.toString()))
    --- End diff --
    
    Instead of setting the conf by key, can we still use `FileInputFormat.setInputPaths`? For example:
    ```
    FileInputFormat.setInputPaths(jobConf, StringUtils.escapeString(path))
    ```





[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-76325241
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28034/




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by watermen <gi...@git.apache.org>.
Github user watermen commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-75016913
  
    @yhuai Can you review it?




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-74173731
  
    lgtm





[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-73919122
  
    ok to test.




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4532




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-73938036
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27292/




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-73921513
  
    ok to test




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by watermen <gi...@git.apache.org>.
Github user watermen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4532#discussion_r24551019
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
    @@ -248,7 +249,7 @@ private[hive] object HadoopTableReader extends HiveInspectors {
        * instantiate a HadoopRDD.
        */
       def initializeLocalJobConfFunc(path: String, tableDesc: TableDesc)(jobConf: JobConf) {
    -    FileInputFormat.setInputPaths(jobConf, path)
    +    jobConf.set("mapred.input.dir", StringUtils.escapeString(path.toString()))
    --- End diff --
    
    We can't. For example, "hdfs://x.x.x.x:9000/user/hive/warehouse/nzhang_part/ds=2010-08-15/hr=file," would be split by FileInputFormat.getPathStrings into "hdfs://x.x.x.x:9000/user/hive/warehouse/nzhang_part/ds=2010-08-15/hr=file" and "". The empty string "" is then checked by Path.checkPathArg, which throws:
    ```
    if (path.length() == 0) {
      throw new IllegalArgumentException(
          "Can not create a Path from an empty string");
    }
    ```
    See the call chain
    ```
    FileInputFormat.setInputPaths -> FileInputFormat.getPathStrings -> Path.checkPathArg
    ```
    in Hadoop for details.
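
The failure described in this comment can be reproduced without Hadoop. The following is a minimal sketch in plain Java, assuming only what the comment states: every comma ends a path, and an empty path is rejected. `splitOnCommas` is a hypothetical stand-in for the splitting step and `checkPathArg` copies the quoted check; neither is Hadoop's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class CommaPathSketch {
    // Simplified stand-in for the splitting step: every comma ends a path.
    static List<String> splitOnCommas(String commaSeparatedPaths) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < commaSeparatedPaths.length(); i++) {
            if (commaSeparatedPaths.charAt(i) == ',') {
                parts.add(commaSeparatedPaths.substring(start, i));
                start = i + 1;
            }
        }
        // Whatever follows the last comma is a path too, even if it is empty.
        parts.add(commaSeparatedPaths.substring(start));
        return parts;
    }

    // Same empty-string check as the snippet quoted in the comment.
    static void checkPathArg(String path) {
        if (path.length() == 0) {
            throw new IllegalArgumentException(
                "Can not create a Path from an empty string");
        }
    }

    public static void main(String[] args) {
        String dir =
            "hdfs://x.x.x.x:9000/user/hive/warehouse/nzhang_part/ds=2010-08-15/hr=file,";
        for (String part : splitOnCommas(dir)) {
            try {
                checkPathArg(part);
                System.out.println("ok: " + part);
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}
```

The trailing comma in the partition directory name leaves an empty second element, and that element is what triggers the exception seen in the error log.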




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-76766940
  
    Thanks!  Merging to master and 1.3.




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-76318535
  
      [Test build #28034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28034/consoleFull) for   PR 4532 at commit [`9758ab1`](https://github.com/apache/spark/commit/9758ab1a28a5bd2a7d58dc8dfa178988f12daaa0).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by watermen <gi...@git.apache.org>.
Github user watermen commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-76648469
  
    @marmbrus @rxin  Can it be merged?




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-73922024
  
      [Test build #27292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27292/consoleFull) for   PR 4532 at commit [`b788a72`](https://github.com/apache/spark/commit/b788a724880e2f478a3bafda7ca040a0390985c2).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5741][SQL] Support comma in path in Hiv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-73864321
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4532#issuecomment-73938027
  
      [Test build #27292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27292/consoleFull) for   PR 4532 at commit [`b788a72`](https://github.com/apache/spark/commit/b788a724880e2f478a3bafda7ca040a0390985c2).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.

