Posted to reviews@spark.apache.org by zenglinxi0615 <gi...@git.apache.org> on 2016/07/07 07:31:36 UTC

[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...

GitHub user zenglinxi0615 opened a pull request:

    https://github.com/apache/spark/pull/14085

    [SPARK-16408][SQL] SparkSQL Added file get Exception: is a directory …

    ## What changes were proposed in this pull request?
    This PR adds a parameter (spark.input.dir.recursive) that controls the value of recursive passed to SparkContext#addFile, so that the "add file hdfs://dir/path" command in Spark SQL can add a directory.
    
    ## How was this patch tested?
    Manual test: run spark-sql with --conf spark.input.dir.recursive=true and -e "add file hdfs://dir/path".

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zenglinxi0615/spark SPARK-16408

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14085.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14085
    
----
commit d2e05c155e4e52dfda177a21615de7743a2c5917
Author: 曾林西 <ze...@meituan.com>
Date:   2016-07-07T06:20:19Z

    [SPARK-16408][SQL] SparkSQL Added file get Exception: is a directory and recursive is not turned on

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...

Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14085#discussion_r69865365
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
    @@ -113,8 +113,9 @@ case class AddFile(path: String) extends RunnableCommand {
     
       override def run(sqlContext: SQLContext): Seq[Row] = {
         val hiveContext = sqlContext.asInstanceOf[HiveContext]
    +    val recursive = sqlContext.sparkContext.getConf.getBoolean("spark.input.dir.recursive", false)
    --- End diff --
    
    By the way, I also tried:
    val recursive = hiveContext.getConf("spark.input.dir.recursive", "false")
    but this only works if set spark.input.dir.recursive=true is executed in Spark SQL before add file; the value cannot be supplied with --conf spark.input.dir.recursive=true. That makes it difficult for us to move some Hive SQL directly to Spark SQL.
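
A minimal sketch of the lookup in the diff line above, assuming spark-core is on the classpath. SparkConf.getBoolean is an existing Spark API; the object and method names are hypothetical, and spark.input.dir.recursive is the property proposed in this PR, not a setting that Spark defines. A SparkConf built by hand stands in for the one that --conf populates at launch.

```scala
import org.apache.spark.SparkConf

object RecursiveFlagSketch {
  // Property name proposed in this PR; Spark itself does not define it.
  val Key = "spark.input.dir.recursive"

  // Mirrors the added line: read the flag from SparkConf, defaulting to false.
  def recursiveEnabled(conf: SparkConf): Boolean = conf.getBoolean(Key, false)

  def main(args: Array[String]): Unit = {
    // loadDefaults = false keeps JVM system properties out of the example.
    val unset = new SparkConf(false)
    // Stands in for launching with --conf spark.input.dir.recursive=true.
    val launched = new SparkConf(false).set(Key, "true")

    println(recursiveEnabled(unset))    // property absent, so the default applies
    println(recursiveEnabled(launched)) // property present, so its value is parsed
  }
}
```

With a default of false, a job that never sets the property keeps the existing behaviour, and only an explicit opt-in would pass recursive = true to SparkContext#addFile.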


[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14085#discussion_r69863303
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
    @@ -113,8 +113,9 @@ case class AddFile(path: String) extends RunnableCommand {
     
       override def run(sqlContext: SQLContext): Seq[Row] = {
         val hiveContext = sqlContext.asInstanceOf[HiveContext]
    +    val recursive = sqlContext.sparkContext.getConf.getBoolean("spark.input.dir.recursive", false)
    --- End diff --
    
    I'm not sure these are semantics that are supported by the SQL dialect in Spark SQL. In any event the name of this property is too generic, and I don't think it is something that is set globally.


[GitHub] spark issue #14085: [SPARK-16408][SQL] SparkSQL Added file get Exception: is...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/14085
  
    @zenglinxi0615 Could you answer the question above if you are still active?


[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14085


[GitHub] spark issue #14085: [SPARK-16408][SQL] SparkSQL Added file get Exception: is...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/14085
  
    @zenglinxi0615
    This PR is about adding all files in a directory recursively, so that there is no need to enumerate every filename? I think this can be pretty useful, especially in production environments.
    
    Just one quick question: could we give `spark.input.dir.recursive` a default value and also allow it to be set via `set spark.input.dir.recursive=true`?
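
One way to read the question above is as a layered lookup: a value set in the SQL session wins, then the launch-time SparkConf, then a default of false. The sketch below only illustrates that order and is not code from this PR. It assumes spark-core is on the classpath; sessionSettings is a hypothetical stand-in for values set with `set key=value`, and the object and method names are made up for the example.

```scala
import org.apache.spark.SparkConf

object RecursiveFlagLookup {
  // Property name proposed in this PR; Spark itself does not define it.
  val Key = "spark.input.dir.recursive"

  // sessionSettings: hypothetical map of values set inside the SQL session.
  // launchConf: the SparkConf populated at launch, for example through --conf.
  def resolve(sessionSettings: Map[String, String], launchConf: SparkConf): Boolean =
    sessionSettings.get(Key) match {
      // A session value takes precedence; toBoolean throws on anything but true/false.
      case Some(value) => value.trim.toBoolean
      // Otherwise fall back to the launch-time value, then to the default of false.
      case None => launchConf.getBoolean(Key, false)
    }

  def main(args: Array[String]): Unit = {
    val launchConf = new SparkConf(false).set(Key, "true")
    println(resolve(Map.empty, launchConf))              // launch-time value is used
    println(resolve(Map(Key -> "false"), launchConf))    // session value overrides it
    println(resolve(Map.empty, new SparkConf(false)))    // nothing set, default applies
  }
}
```

Other comments in this thread question whether a globally or session-scoped setting is the right design, so this is a picture of the lookup order being asked about rather than an agreed approach.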


[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...

Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14085#discussion_r122620464
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
    @@ -113,8 +113,9 @@ case class AddFile(path: String) extends RunnableCommand {
     
       override def run(sqlContext: SQLContext): Seq[Row] = {
         val hiveContext = sqlContext.asInstanceOf[HiveContext]
    +    val recursive = sqlContext.sparkContext.getConf.getBoolean("spark.input.dir.recursive", false)
    --- End diff --
    
    I was wondering if we could call:
    sparkSession.sparkContext.addFile(path, true)
    in AddFileCommand, since adding a directory is a common need in ETL.


[GitHub] spark issue #14085: [SPARK-16408][SQL] SparkSQL Added file get Exception: is...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14085
  
    Can one of the admins verify this patch?


[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...

Posted by zenglinxi0615 <gi...@git.apache.org>.
Github user zenglinxi0615 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14085#discussion_r69864435
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
    @@ -113,8 +113,9 @@ case class AddFile(path: String) extends RunnableCommand {
     
       override def run(sqlContext: SQLContext): Seq[Row] = {
         val hiveContext = sqlContext.asInstanceOf[HiveContext]
    +    val recursive = sqlContext.sparkContext.getConf.getBoolean("spark.input.dir.recursive", false)
    --- End diff --
    
    I'm pretty sure that it's supported by the SQL dialect in Spark SQL.
    As for "the name of this property is too generic, and I don't think it is something that is set globally": do you think we should use another name? And should the default value be true?


[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14085#discussion_r121723863
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
    @@ -113,8 +113,9 @@ case class AddFile(path: String) extends RunnableCommand {
     
       override def run(sqlContext: SQLContext): Seq[Row] = {
         val hiveContext = sqlContext.asInstanceOf[HiveContext]
    +    val recursive = sqlContext.sparkContext.getConf.getBoolean("spark.input.dir.recursive", false)
    --- End diff --
    
    Adding this session-scoped configuration is risky. If needed, we can improve the SQL syntax to support it.

