Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2017/03/09 07:55:38 UTC

[jira] [Updated] (SPARK-19881) Support Dynamic Partition Inserts params with SET command

     [ https://issues.apache.org/jira/browse/SPARK-19881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-19881:
----------------------------------
    Description: 
## What changes were proposed in this pull request?

Since Spark 2.0.0, `SET` commands do not pass their values to `HiveClient`. In most cases Spark handles this well. However, for dynamic partition inserts, users run into the following misleading situation.

{code}
scala> spark.range(1001).selectExpr("id as key", "id as value").registerTempTable("t1001")

scala> sql("create table p (value int) partitioned by (key int)").show

scala> sql("insert into table p partition(key) select key, value from t1001")
org.apache.spark.SparkException:
Dynamic partition strict mode requires at least one static partition column.
To turn this off set hive.exec.dynamic.partition.mode=nonstrict

scala> sql("set hive.exec.dynamic.partition.mode=nonstrict")

scala> sql("insert into table p partition(key) select key, value from t1001")
org.apache.hadoop.hive.ql.metadata.HiveException:
Number of dynamic partitions created is 1001, which is more than 1000.
To solve this try to set hive.exec.max.dynamic.partitions to at least 1001.

scala> sql("set hive.exec.max.dynamic.partitions=1001")

scala> sql("set hive.exec.max.dynamic.partitions").show(false)
+--------------------------------+-----+
|key                             |value|
+--------------------------------+-----+
|hive.exec.max.dynamic.partitions|1001 |
+--------------------------------+-----+

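scala> // The same insert still fails: the SET above never reached the HiveClient.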
scala> sql("insert into table p partition(key) select key, value from t1001")
org.apache.hadoop.hive.ql.metadata.HiveException:
Number of dynamic partitions created is 1001, which is more than 1000.
To solve this try to set hive.exec.max.dynamic.partitions to at least 1001.
{code}

The last error is the same as the previous one: `HiveClient` does not know the new value 1001. There is no way to change the default value of `hive.exec.max.dynamic.partitions` in `HiveClient` with a `SET` command.
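
For illustration, here is a minimal sketch of the mismatch, continuing the session above (the `res` numbering is illustrative). `SET` writes into Spark's own session conf, which is why Spark reports 1001 while the `HiveClient` keeps its creation-time default.

{code}
scala> // SET updated Spark's session conf, so Spark itself reports the new value...
scala> spark.conf.get("hive.exec.max.dynamic.partitions")
res6: String = 1001

scala> // ...but the HiveClient received its configuration when it was created,
scala> // so the insert above keeps failing against the old limit of 1000.
{code}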

The root cause is that `hive` parameters are passed to `HiveClient` only when it is created. So, the workaround is to use `--hiveconf` when starting `spark-shell`; however, the value still cannot be changed later inside `spark-shell`. We had better handle this case without misleading error messages that send users into an endless loop.
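
A hedged sketch of that workaround (the `--hiveconf` option comes from the sentence above; whether a given launcher accepts it is an assumption to verify):

{code}
$ bin/spark-shell --hiveconf hive.exec.max.dynamic.partitions=1001

scala> // The HiveClient is now created with the raised limit, so the
scala> // 1001-partition insert is expected to succeed. A later SET still
scala> // cannot change the limit within this session.
scala> sql("insert into table p partition(key) select key, value from t1001")
{code}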

  was:
Currently, the `SET` command does not pass its values to Hive. In most cases Spark handles this well. However, for dynamic partition inserts, users run into the following situation.

{code}
scala> spark.range(1001).selectExpr("id as key", "id as value").registerTempTable("t1001")

scala> sql("create table p (value int) partitioned by (key int)").show

scala> sql("insert into table p partition(key) select key, value from t1001")
org.apache.spark.SparkException: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict

scala> sql("set hive.exec.dynamic.partition.mode=nonstrict")

scala> sql("insert into table p partition(key) select key, value from t1001")
org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1001, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1001.

scala> sql("set hive.exec.dynamic.partition.mode=1001")

scala> sql("insert into table p partition(key) select key, value from t1001")
org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1001, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1001.

<== The same error message repeats.
{code}

The root cause is that `hive` parameters are passed to `HiveClient` only when it is created. So, the workaround is to use `--hiveconf`.

We had better handle this case without misleading error messages.


> Support Dynamic Partition Inserts params with SET command
> ---------------------------------------------------------
>
>                 Key: SPARK-19881
>                 URL: https://issues.apache.org/jira/browse/SPARK-19881
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Dongjoon Hyun
>            Priority: Minor
>



