Posted to issues@spark.apache.org by "dongxu (JIRA)" <ji...@apache.org> on 2015/04/01 07:16:52 UTC

[jira] [Updated] (SPARK-6644) [SPARK-SQL]when the partition schema does not match table schema(ADD COLUMN), new column value is NULL

     [ https://issues.apache.org/jira/browse/SPARK-6644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dongxu updated SPARK-6644:
--------------------------
    Summary: [SPARK-SQL]when the partition schema does not match table schema(ADD COLUMN), new column value is NULL  (was: [SPARK-SQL]when the partition schema does not match table schema(ADD COLUMN), new column is NULL)

> [SPARK-SQL]when the partition schema does not match table schema(ADD COLUMN), new column value is NULL
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-6644
>                 URL: https://issues.apache.org/jira/browse/SPARK-6644
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: dongxu
>
> In Hive, the schema of a partition may differ from the table schema, for example after a column is added to the table. Problems arise when spark-sql queries a partition whose schema differs from the table schema.
> Some of these problems have been solved (https://github.com/apache/spark/pull/4289),
> but if you add a new column and then write new data into the old partition, the new column's value is read back as NULL.
> Steps to reproduce:
> case class TestData(key: Int, value: String)
> val testData = TestHive.sparkContext.parallelize(
>   (1 to 10).map(i => TestData(i, i.toString))).toDF()
> testData.registerTempTable("testData")
> sql("DROP TABLE IF EXISTS table_with_partition")
> sql(s"CREATE TABLE IF NOT EXISTS table_with_partition(key int, value string) PARTITIONED BY (ds string) LOCATION '${tmpDir.toURI.toString}'")
> sql("INSERT OVERWRITE TABLE table_with_partition PARTITION (ds='1') SELECT key, value FROM testData")
> // Add two columns to the table
> sql("ALTER TABLE table_with_partition ADD COLUMNS(key1 string)")
> sql("ALTER TABLE table_with_partition ADD COLUMNS(destlng double)")
> // Overwrite the existing partition with rows that include the new columns
> sql("INSERT OVERWRITE TABLE table_with_partition PARTITION (ds='1') SELECT key, value, 'test', 1.11 FROM testData")
> sql("SELECT * FROM table_with_partition WHERE ds='1'").collect().foreach(println)
>  
> Actual result:
> [1,1,null,null,1]
> [2,2,null,null,1]
>  
> Expected result:
> [1,1,test,1.11,1]
> [2,2,test,1.11,1]
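
One plausible reading of the output above, offered as an assumption rather than something the report confirms, is that the metadata of partition ds='1' still lists only the original columns (key, value) after ADD COLUMNS, so rows are decoded with that stale schema and the added columns come back as NULL. The Spark-free sketch below only models that idea. Every name in it (PartitionSchemaSketch, readRow, stalePartitionSchema) is hypothetical and is not part of Spark or Hive, and the partition column ds is left out for brevity.

```scala
// Hypothetical, Spark-free model of the suspected mechanism.
// None of these names come from Spark or Hive.
object PartitionSchemaSketch {
  // A schema is just an ordered list of column names; types are omitted.
  type Schema = Vector[String]
  type Row = Vector[Any]

  // Decode one stored row with `readSchema`, then project it onto `tableSchema`.
  // A table column that the read schema does not know about becomes null.
  def readRow(stored: Row, readSchema: Schema, tableSchema: Schema): Row =
    tableSchema.map { col =>
      val idx = readSchema.indexOf(col)
      if (idx >= 0 && idx < stored.length) stored(idx) else null
    }

  // Table schema after the two ALTER TABLE ... ADD COLUMNS statements.
  val tableSchema: Schema = Vector("key", "value", "key1", "destlng")

  // Assumption: the partition metadata still lists only the original columns.
  val stalePartitionSchema: Schema = Vector("key", "value")

  // A row as written by the second INSERT OVERWRITE (all four columns present).
  val stored: Row = Vector(1, "1", "test", 1.11)

  def main(args: Array[String]): Unit = {
    // Decoding with the stale partition schema loses the added columns.
    println(readRow(stored, stalePartitionSchema, tableSchema))
    // Decoding with the table schema keeps them.
    println(readRow(stored, tableSchema, tableSchema))
  }
}
```

In this model the first call mirrors the reported rows (nulls in the added columns) and the second mirrors the expected rows. Whether Spark 1.3.0 actually takes this path is not established by the report.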
