Posted to issues@spark.apache.org by "CHC (Jira)" <ji...@apache.org> on 2020/09/12 07:48:00 UTC

[jira] [Comment Edited] (SPARK-32838) Cannot overwrite different partition with same table

    [ https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194644#comment-17194644 ] 

CHC edited comment on SPARK-32838 at 9/12/20, 7:47 AM:
-------------------------------------------------------

After spending a long time exploring this,

I found that [HiveStrategies.scala|https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L209-L215] converts the HiveTableRelation into a LogicalRelation,

and the plan then matches the [DataSourceAnalysis|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L166-L228] case that raises this error.

(On Spark 2.4.3, a HiveTableRelation is not converted to a LogicalRelation when the table is partitioned,

so on 2.4.3 an INSERT OVERWRITE of a non-partitioned table into itself would also get this error.)
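
As far as I can tell, the check that fires compares the insert's output path against every path the query reads. A minimal standalone sketch of that sanity check (paraphrased from DataSourceAnalysis, not the exact Spark source; the paths below are hypothetical):
{code:scala}
import org.apache.hadoop.fs.Path

// Sketch of the sanity check in DataSourceAnalysis: once the Hive relation is
// converted to a LogicalRelation over a HadoopFsRelation, the rule collects the
// query's input paths and rejects the overwrite if the output path is among them.
// (Spark itself throws an AnalysisException with this message.)
def assertNotSelfOverwrite(outputPath: Path, inputPaths: Seq[Path]): Unit = {
  if (inputPaths.contains(outputPath)) {
    throw new IllegalStateException(
      "Cannot overwrite a path that is also being read from.")
  }
}

// With the conversion enabled, a partitioned Hive table resolves to its table
// root on both sides, so the check fires even though the partitions differ:
assertNotSelfOverwrite(
  new Path("/warehouse/tmp.db/spark3_snap"),
  Seq(new Path("/warehouse/tmp.db/spark3_snap")))  // throws
{code}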

The overwrite works when the conversion is disabled:
{code:sql}
set spark.sql.hive.convertInsertingPartitionedTable=false;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
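The same flag can also be set programmatically; a sketch assuming an active SparkSession named {{spark}}:
{code:scala}
// Disable the Hive-to-datasource conversion for inserts into partitioned
// tables, so the INSERT OVERWRITE goes through the Hive path as in Spark 2.4.
spark.conf.set("spark.sql.hive.convertInsertingPartitionedTable", "false")
{code}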
I think this is a bug, because this scenario is a normal, legitimate use case.


> Cannot overwrite different partition with same table
> ----------------------------------------------------
>
>                 Key: SPARK-32838
>                 URL: https://issues.apache.org/jira/browse/SPARK-32838
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: hadoop 2.7 + spark 3.0.0
>            Reporter: CHC
>            Priority: Major
>
> When:
> {code:java}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read from"
> Related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3 but fails on Spark 3.0.0.
>   


