Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/12/10 11:59:11 UTC

[jira] [Assigned] (SPARK-12257) Non partitioned insert into a partitioned Hive table doesn't fail

     [ https://issues.apache.org/jira/browse/SPARK-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-12257:
------------------------------------

    Assignee: Apache Spark

> Non partitioned insert into a partitioned Hive table doesn't fail
> -----------------------------------------------------------------
>
>                 Key: SPARK-12257
>                 URL: https://issues.apache.org/jira/browse/SPARK-12257
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>            Reporter: Mark Grover
>            Assignee: Apache Spark
>            Priority: Minor
>
> I am using Spark 1.5.1 but I anticipate this to be a problem with master as well (will check later).
> I have a dataframe, and a partitioned Hive table that I want to insert the contents of the data frame into.
> Let's say mytable is a non-partitioned Hive table and mytable_partitioned is a partitioned Hive table. In Hive, if you try to insert from the non-partitioned mytable table into mytable_partitioned without specifying the partition, the query fails, as expected:
> {code}
> INSERT INTO mytable_partitioned SELECT * FROM mytable;
> {code}
> {quote}
> Error: Error while compiling statement: FAILED: SemanticException 1:12 Need to specify partition columns because the destination table is partitioned. Error encountered near token 'mytable_partitioned' (state=42000,code=40000)
> {quote}
> However, if I do the same in Spark SQL:
> {code}
> myDf.registerTempTable("my_df_temp_table") // registerTempTable returns Unit, so there is nothing to assign
> sqlContext.sql("INSERT INTO mytable_partitioned SELECT * FROM my_df_temp_table")
> {code}
> This appears to succeed but performs no insertion. It should instead fail with an error stating that data is being inserted into a partitioned table without a partition being specified.
> Of course, if the name of the partition is explicitly specified, both Hive and Spark SQL do the right thing and function correctly.
> In Hive:
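> One hedged way to confirm the silent no-op, assuming the same {{sqlContext}} and {{myDf}} from above: count the rows in the target table before and after the insert. If the insert actually worked, the counts would differ.
> {code}
> // Sketch only: requires a running Spark 1.5.x HiveContext; table and
> // identifier names are taken from this report, not verified elsewhere.
> myDf.registerTempTable("my_df_temp_table")
> val before = sqlContext.sql("SELECT COUNT(*) FROM mytable_partitioned").collect()(0).getLong(0)
> sqlContext.sql("INSERT INTO mytable_partitioned SELECT * FROM my_df_temp_table")
> val after = sqlContext.sql("SELECT COUNT(*) FROM mytable_partitioned").collect()(0).getLong(0)
> // With this bug, before == after, even though the INSERT raised no error.
> {code}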
> {code}
> INSERT INTO mytable_partitioned PARTITION (y='abc') SELECT * FROM mytable;
> {code}
> In Spark SQL:
> {code}
> myDf.registerTempTable("my_df_temp_table") // registerTempTable returns Unit, so there is nothing to assign
> sqlContext.sql("INSERT INTO mytable_partitioned PARTITION (y='abc') SELECT * FROM my_df_temp_table")
> {code}
> And here are the definitions of my tables, for reference:
> {code}
> CREATE TABLE mytable(x INT);
> CREATE TABLE mytable_partitioned (x INT) PARTITIONED BY (y INT);
> {code}
> You will also need to insert some dummy data into mytable, so you can verify that the insertion actually does nothing:
> {code}
> #!/bin/bash
> rm -f data.txt
> for i in {0..9}; do
>   echo $i >> data.txt
> done
> sudo -u hdfs hadoop fs -put data.txt /user/hive/warehouse/mytable
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org