You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "dongxu (JIRA)" <ji...@apache.org> on 2015/04/01 07:16:52 UTC
[jira] [Updated] (SPARK-6644) [SPARK-SQL]when the partition schema
does not match table schema(ADD COLUMN), new column value is NULL
[ https://issues.apache.org/jira/browse/SPARK-6644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dongxu updated SPARK-6644:
--------------------------
Summary: [SPARK-SQL]when the partition schema does not match table schema(ADD COLUMN), new column value is NULL (was: [SPARK-SQL]when the partition schema does not match table schema(ADD COLUMN), new column is NULL)
> [SPARK-SQL]when the partition schema does not match table schema(ADD COLUMN), new column value is NULL
> ------------------------------------------------------------------------------------------------------
>
> Key: SPARK-6644
> URL: https://issues.apache.org/jira/browse/SPARK-6644
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.0
> Reporter: dongxu
>
> In hive,the schema of partition may be difference from the table schema. For example, we add new column. When we use spark-sql to query the data of partition which schema is difference from the table schema.
> some problems is solved(https://github.com/apache/spark/pull/4289),
> but if you add a new column,put new data into the old partition,new column value is NULL
> [According to the following steps]:
> case class TestData(key: Int, value: String)
> val testData = TestHive.sparkContext.parallelize(
> (1 to 10).map(i => TestData(i, i.toString))).toDF()
> testData.registerTempTable("testData")
> sql("DROP TABLE IF EXISTS table_with_partition ")
> sql(s"CREATE TABLE IF NOT EXISTS table_with_partition(key int,value string) PARTITIONED by (ds string) location '${tmpDir.toURI.toString}' ")
> sql("INSERT OVERWRITE TABLE table_with_partition partition (ds='1') SELECT key,value FROM testData")
> // add column to table
> sql("ALTER TABLE table_with_partition ADD COLUMNS(key1 string)")
> sql("ALTER TABLE table_with_partition ADD COLUMNS(destlng double)")
> sql("INSERT OVERWRITE TABLE table_with_partition partition (ds='1') SELECT key,value,'test',1.11 FROM testData")
> sql("select * from table_with_partition where ds='1' ").collect().foreach(println)
>
> result :
> [1,1,null,null,1]
> [2,2,null,null,1]
>
> result we expect:
> [1,1,test,1.11,1]
> [2,2,test,1.11,1]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org