Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2017/03/09 07:35:07 UTC

[GitHub] spark pull request #17223: [SPARK-19881][SQL] Support Dynamic Partition Inse...

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/17223

    [SPARK-19881][SQL] Support Dynamic Partition Inserts params with SET command

    ## What changes were proposed in this pull request?
    
    Since Spark 2.0.0, `SET` commands do not pass their values to `HiveClient`. In most cases, Spark handles this well. However, for dynamic partition inserts, users run into the following misleading situation.
    
    ```scala
    scala> spark.range(1001).selectExpr("id as key", "id as value").registerTempTable("t1001")
    
    scala> sql("create table p (value int) partitioned by (key int)").show
    
    scala> sql("insert into table p partition(key) select key, value from t1001")
    org.apache.spark.SparkException:
    Dynamic partition strict mode requires at least one static partition column.
    To turn this off set hive.exec.dynamic.partition.mode=nonstrict
    
    scala> sql("set hive.exec.dynamic.partition.mode=nonstrict")
    
    scala> sql("insert into table p partition(key) select key, value from t1001")
    org.apache.hadoop.hive.ql.metadata.HiveException:
    Number of dynamic partitions created is 1001, which is more than 1000.
    To solve this try to set hive.exec.max.dynamic.partitions to at least 1001.
    
    scala> sql("set hive.exec.max.dynamic.partitions=1001")
    
    scala> sql("insert into table p partition(key) select key, value from t1001")
    org.apache.hadoop.hive.ql.metadata.HiveException:
    Number of dynamic partitions created is 1001, which is more than 1000.
    To solve this try to set hive.exec.max.dynamic.partitions to at least 1001.
    ```
    
    The root cause is that `hive` parameters are only passed to `HiveClient` at creation time. A workaround is to pass them with `--hiveconf`, but we had better handle this case directly instead of trapping users in a loop of misleading error messages.
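    
    As a concrete illustration of the workaround (flag values here are only examples): since `SET` issued inside a session is not propagated to `HiveClient`, the parameters can instead be supplied at launch time so they are present when the client is created.
    
    ```
    # Supply dynamic-partition settings at startup so they reach HiveClient
    # when it is created; SET commands issued later are not propagated to it.
    spark-sql \
      --hiveconf hive.exec.dynamic.partition.mode=nonstrict \
      --hiveconf hive.exec.max.dynamic.partitions=2000
    ```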
    
    ## How was this patch tested?
    
    Manual.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-19881

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17223.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17223
    
----
commit a8744551608cc8dd2fd6ada63c28a93d65e865b4
Author: Dongjoon Hyun <do...@apache.org>
Date:   2017-03-09T06:35:17Z

    [SPARK-19881][SQL] Support Dynamic Partition Inserts params with SET command

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17223
  
    **[Test build #74247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74247/testReport)** for PR 17223 at commit [`a874455`](https://github.com/apache/spark/commit/a8744551608cc8dd2fd6ada63c28a93d65e865b4).




[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17223
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74247/
    Test FAILed.




[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17223
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17223: [SPARK-19881][SQL] Support Dynamic Partition Inse...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun closed the pull request at:

    https://github.com/apache/spark/pull/17223




[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17223
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74255/
    Test PASSed.




[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/17223
  
    Could you review this when you have some time, @gatorsmile?




[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/17223
  
    I see. That's the reason not to support it. Thank you, @cloud-fan.




[GitHub] spark pull request #17223: [SPARK-19881][SQL] Support Dynamic Partition Inse...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17223#discussion_r105797391
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -793,6 +794,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
           orderedPartitionSpec.put(colName.toLowerCase, partition(colName))
         }
     
    +    // Access SQLConf to get 'Dynamic Partition Inserts' parameter specified dynamically
    +    // after HiveClient is created
    +    val sqlConf = SparkSession.getActiveSession.get.sessionState.conf
    +    Seq(
    +      "hive.exec.max.dynamic.partitions",
    +      "hive.exec.max.dynamic.partitions.pernode",
    +      "hive.exec.max.created.files",
    +      "hive.error.on.empty.partition"
    +    ).foreach { param =>
    +      if (sqlConf.contains(param)) {
    +        client.runSqlHive(s"set $param=${sqlConf.getConfString(param)}")
    --- End diff --
    
    That would be the best approach, since this is a general issue for all unhandled hive param options. The reason I did it here is that `SetCommand` lives in `sql/core` and does not interact with this code. Is there a way to invoke `runSqlHive` there?




[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/17223
  
    Hi, @cloud-fan.
    Could you review this when you have some time?




[GitHub] spark pull request #17223: [SPARK-19881][SQL] Support Dynamic Partition Inse...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17223#discussion_r105102457
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -793,6 +794,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
           orderedPartitionSpec.put(colName.toLowerCase, partition(colName))
         }
     
    +    // Access SQLConf to get 'Dynamic Partition Inserts' parameter specified dynamically
    +    // after HiveClient is created
    +    val sqlConf = SparkSession.getActiveSession.get.sessionState.conf
    +    Seq(
    +      "hive.exec.max.dynamic.partitions",
    +      "hive.exec.max.dynamic.partitions.pernode",
    +      "hive.exec.max.created.files",
    +      "hive.error.on.empty.partition"
    --- End diff --
    
    We need only 4 among 6 parameters in https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-DynamicPartitionInserts .




[GitHub] spark pull request #17223: [SPARK-19881][SQL] Support Dynamic Partition Inse...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17223#discussion_r105777063
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -793,6 +794,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
           orderedPartitionSpec.put(colName.toLowerCase, partition(colName))
         }
     
    +    // Access SQLConf to get 'Dynamic Partition Inserts' parameter specified dynamically
    +    // after HiveClient is created
    +    val sqlConf = SparkSession.getActiveSession.get.sessionState.conf
    +    Seq(
    +      "hive.exec.max.dynamic.partitions",
    +      "hive.exec.max.dynamic.partitions.pernode",
    +      "hive.exec.max.created.files",
    +      "hive.error.on.empty.partition"
    +    ).foreach { param =>
    +      if (sqlConf.contains(param)) {
    +        client.runSqlHive(s"set $param=${sqlConf.getConfString(param)}")
    --- End diff --
    
    Should we do it when users issue the `SET` command? Is this a general issue?




[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17223
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/17223
  
    I'll close this PR and JIRA issue.




[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/17223
  
    Retest this please




[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17223
  
    **[Test build #74255 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74255/testReport)** for PR 17223 at commit [`a874455`](https://github.com/apache/spark/commit/a8744551608cc8dd2fd6ada63c28a93d65e865b4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17223
  
    **[Test build #74255 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74255/testReport)** for PR 17223 at commit [`a874455`](https://github.com/apache/spark/commit/a8744551608cc8dd2fd6ada63c28a93d65e865b4).




[GitHub] spark issue #17223: [SPARK-19881][SQL] Support Dynamic Partition Inserts par...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17223
  
    Since the hive client is shared among all sessions, we can't set hive confs dynamically without breaking session isolation. I think we should treat hive confs as static SQL confs and throw an exception when users try to change them.
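    
    A minimal, self-contained sketch of that proposal (an illustrative model only, not Spark's actual `SQLConf` implementation; the conf names and class names below are for demonstration):
    
    ```scala
    // Model: hive confs are registered as "static", and any attempt to change
    // them after session creation is rejected. This preserves isolation of the
    // shared hive client across sessions, at the cost of rejecting SET.
    object StaticConfModel {
      private val staticConfs = Set(
        "hive.exec.dynamic.partition.mode",
        "hive.exec.max.dynamic.partitions")
    
      final class SessionConf {
        private val settings = scala.collection.mutable.Map.empty[String, String]
    
        def set(key: String, value: String): Unit = {
          if (staticConfs.contains(key)) {
            throw new UnsupportedOperationException(
              s"Cannot modify the value of a static config: $key")
          }
          settings(key) = value
        }
    
        def get(key: String): Option[String] = settings.get(key)
      }
    }
    ```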

