You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/29 10:06:47 UTC
[GitHub] [spark] wangyum opened a new pull request #24486:
[SPARK-27592][SQL] Write the data of table write information to metadata
wangyum opened a new pull request #24486: [SPARK-27592][SQL] Write the data of table write information to metadata
URL: https://github.com/apache/spark/pull/24486
## What changes were proposed in this pull request?
We hint Hive using incorrect **InputFormat**(`org.apache.hadoop.mapred.SequenceFileInputFormat`) to read Spark's **Parquet** datasource bucket table:
```sql
spark-sql> CREATE TABLE t (c1 INT, c2 INT) USING parquet CLUSTERED BY (c1) SORTED BY (c1) INTO 2 BUCKETS;
2019-04-29 17:52:05 WARN HiveExternalCatalog:66 - Persisting bucketed data source table `default`.`t` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
spark-sql> DESC EXTENDED t;
c1 int NULL
c2 int NULL
# Detailed Table Information
Database default
Table t
Owner yumwang
Created Time Mon Apr 29 17:52:05 CST 2019
Last Access Thu Jan 01 08:00:00 CST 1970
Created By Spark 2.4.0
Type MANAGED
Provider parquet
Num Buckets 2
Bucket Columns [`c1`]
Sort Columns [`c1`]
Table Properties [transient_lastDdlTime=1556531525]
Location file:/user/hive/warehouse/t
Serde Library org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Storage Properties [serialization.format=1]
```
We can see incompatible information when creating the table:
```
WARN HiveExternalCatalog:66 - Persisting bucketed data source table `default`.`t` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
```
But downstream don’t know the compatibility. I'd like to write the write information of this table to metadata so that each engine decides compatibility itself.
## How was this patch tested?
unit tests
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org