Posted to issues@spark.apache.org by "xinzhang (JIRA)" <ji...@apache.org> on 2017/11/01 02:31:00 UTC
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

    [ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233578#comment-16233578 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 2:30 AM:
-----------------------------------------------------------

[~mgaido]

My setup:
1. Hive 1.2.1: downloaded a fresh tarball and changed only hive-site.xml, so that the Hive metastore uses MySQL. The metastore runs locally on port 9083.
2. Spark SQL: copied the same hive-site.xml over.
3. Started the Spark Thrift Server.
4. Connected to the Thrift Server with beeline.

The metastore has been changed from Derby to MySQL. My suggestion is that you try this in a new environment rather than your existing one.
As you said, the problem might be related to the metastore. I tested the case on CDH 5.7 (Hadoop 2.6) and on Hadoop 2.8 (a new environment), and the error always appears, no matter what I do. I hope you can help. Thanks.


> spark thriftserver insert overwrite table partition select 
> -----------------------------------------------------------
>
>                 Key: SPARK-21725
>                 URL: https://issues.apache.org/jira/browse/SPARK-21725
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>         Environment: centos 6.7 spark 2.1  jdk8
>            Reporter: xinzhang
>            Priority: Major
>              Labels: spark-sql
>
> Use the Thrift Server to create partitioned tables.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4 (run the same statement again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -------------------------------------------------------------------------------------
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ......
> ......
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053512282-2/-ext-10000/part-00000 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-00000
>         at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
>         at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
>         at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
>         at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
>         ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> ....
> -------------------------------------------------------------------------------------
> The documentation on Parquet tables is here: http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Under "Hive metastore Parquet table conversion" it says:
> When reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and is turned on by default.
> I am confused: the problem appears with the partitioned table, but a table without partitions works fine. Does that mean Spark is not using its own Parquet support here?
> Can anyone suggest how I could avoid this issue?
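
One way to probe the question above, offered as a sketch and not a confirmed fix: check spark.sql.hive.convertMetastoreParquet in the beeline session, switch it off for that session only, and re-run the failing statement. The stack trace goes through Hive.loadPartition, which suggests that the Hive load path handles this insert. Whether changing the setting has any effect on the error is an assumption to be tested; nothing in this report establishes it.

```sql
-- Show the current value of the setting quoted from the programming guide.
SET spark.sql.hive.convertMetastoreParquet;

-- Assumption: disable the conversion for this session only, purely to
-- compare behaviour. This is a diagnostic step, not a known workaround.
SET spark.sql.hive.convertMetastoreParquet=false;

-- Re-run the statement from session 4 of the report.
insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
```

If the error is the same with either value, the setting is probably not the deciding factor for the partitioned insert.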


