Posted to issues@spark.apache.org by "Mark Grover (JIRA)" <ji...@apache.org> on 2015/12/10 07:47:10 UTC

[jira] [Created] (SPARK-12257) Non partitioned insert into a partitioned Hive table doesn't fail

Mark Grover created SPARK-12257:
-----------------------------------

             Summary: Non partitioned insert into a partitioned Hive table doesn't fail
                 Key: SPARK-12257
                 URL: https://issues.apache.org/jira/browse/SPARK-12257
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.1
            Reporter: Mark Grover
            Priority: Minor


I am using Spark 1.5.1 but I anticipate this to be a problem with master as well (will check later).

I have a DataFrame and a partitioned Hive table into which I want to insert the DataFrame's contents.

Let's say mytable is a non-partitioned Hive table and mytable_partitioned is a partitioned Hive table. In Hive, if you try to insert from the non-partitioned mytable table into mytable_partitioned without specifying the partition, the query fails, as expected:
{code}
INSERT INTO mytable_partitioned SELECT * FROM mytable;
{code}
{quote}
Error: Error while compiling statement: FAILED: SemanticException 1:12 Need to specify partition columns because the destination table is partitioned. Error encountered near token 'mytable_partitioned' (state=42000,code=40000)
{quote}

However, if I do the same in Spark SQL:
{code}
myDf.registerTempTable("my_df_temp_table")
sqlContext.sql("INSERT INTO mytable_partitioned SELECT * FROM my_df_temp_table")
{code}
This appears to succeed but inserts no data. It should instead fail with an error stating that data is being inserted into a partitioned table without a partition being specified.
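
As a sanity check (my own addition, not part of the original repro), counting the target table and listing its partitions right after the insert makes the silent no-op visible:
{code}
// Assumed verification step: both queries show that nothing was written,
// even though the INSERT above reported success.
sqlContext.sql("SELECT COUNT(*) FROM mytable_partitioned").show()   // expect 0
sqlContext.sql("SHOW PARTITIONS mytable_partitioned").show()        // expect no partitions
{code}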

Of course, if the name of the partition is explicitly specified, both Hive and Spark SQL do the right thing and function correctly.
In Hive:
{code}
INSERT INTO mytable_partitioned PARTITION (y='abc') SELECT * FROM mytable;
{code}
In Spark SQL:
{code}
myDf.registerTempTable("my_df_temp_table")
sqlContext.sql("INSERT INTO mytable_partitioned PARTITION (y='abc') SELECT * FROM my_df_temp_table")
{code}

And here are the definitions of my tables, for reference:
{code}
CREATE TABLE mytable(x INT);
CREATE TABLE mytable_partitioned (x INT) PARTITIONED BY (y INT);
{code}
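
The report does not show how myDf is constructed; any DataFrame whose schema matches mytable (a single INT column x) should reproduce the behavior. One way to get such a DataFrame, assuming the tables above already exist in the Hive metastore:
{code}
// Assumption: myDf is simply the non-partitioned table read back as a DataFrame.
val myDf = sqlContext.table("mytable")
{code}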

You will also need to load some dummy data into mytable so you can verify that the insertion is in fact not happening:
{code}
#!/bin/bash
rm -f data.txt
for i in {0..9}; do
  echo $i >> data.txt
done
sudo -u hdfs hadoop fs -put data.txt /user/hive/warehouse/mytable
{code}
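
After loading the file, a quick check (again, my own addition) confirms that mytable holds the ten dummy rows, so an empty mytable_partitioned after the insert really does mean the insert was a no-op:
{code}
// Assumed sanity check that the dummy data is visible to Spark SQL.
sqlContext.sql("SELECT COUNT(*) FROM mytable").show()   // expect 10
{code}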



