Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/29 10:06:47 UTC

[GitHub] [spark] wangyum opened a new pull request #24486: [SPARK-27592][SQL] Write the data of table write information to metadata

URL: https://github.com/apache/spark/pull/24486
 
 
   ## What changes were proposed in this pull request?
   
   We currently tell Hive to use an incorrect **InputFormat** (`org.apache.hadoop.mapred.SequenceFileInputFormat`) when reading Spark's bucketed **Parquet** data source table:
   ```sql
   spark-sql> CREATE TABLE t (c1 INT, c2 INT) USING parquet CLUSTERED BY (c1) SORTED BY (c1) INTO 2 BUCKETS;
   2019-04-29 17:52:05 WARN  HiveExternalCatalog:66 - Persisting bucketed data source table `default`.`t` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
   spark-sql> DESC EXTENDED t;
   c1	int	NULL
   c2	int	NULL
   
   # Detailed Table Information
   Database	default
   Table	t
   Owner	yumwang
   Created Time	Mon Apr 29 17:52:05 CST 2019
   Last Access	Thu Jan 01 08:00:00 CST 1970
   Created By	Spark 2.4.0
   Type	MANAGED
   Provider	parquet
   Num Buckets	2
   Bucket Columns	[`c1`]
   Sort Columns	[`c1`]
   Table Properties	[transient_lastDdlTime=1556531525]
   Location	file:/user/hive/warehouse/t
   Serde Library	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
   InputFormat	org.apache.hadoop.mapred.SequenceFileInputFormat
   OutputFormat	org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
   Storage Properties	[serialization.format=1]
   ```
   Spark does log an incompatibility warning when the table is created:
   ```
   WARN  HiveExternalCatalog:66 - Persisting bucketed data source table `default`.`t` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
   ```
   But downstream engines have no way of knowing about this incompatibility. I'd like to write this table's write information to its metadata so that each engine can decide compatibility for itself.
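
To make the mismatch concrete, here is a minimal sketch of the kind of check a downstream engine could run against the `DESC EXTENDED` output shown above. It is not part of this patch: the helper names (`parse_detailed_info`, `input_format_matches_provider`) and the provider-to-InputFormat mapping are assumptions made for illustration, and the patch itself may record the write information differently.

```python
# Illustrative sketch only; names and mapping below are assumptions, not Spark or Hive APIs.

# Assumed mapping from a Spark provider to the Hive InputFormat a reader would expect.
EXPECTED_INPUT_FORMATS = {
    "parquet": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
}


def parse_detailed_info(desc_output):
    """Collect the tab-separated key/value rows that follow
    '# Detailed Table Information' in DESC EXTENDED output."""
    info = {}
    in_details = False
    for raw_line in desc_output.splitlines():
        line = raw_line.strip()
        if line.startswith("# Detailed Table Information"):
            in_details = True
            continue
        if in_details and "\t" in line:
            key, value = line.split("\t", 1)
            info[key] = value
    return info


def input_format_matches_provider(info):
    """Return True/False when the provider is known, or None when it is not."""
    expected = EXPECTED_INPUT_FORMATS.get(info.get("Provider", "").lower())
    if expected is None:
        return None  # unknown provider: this sketch cannot decide
    return info.get("InputFormat") == expected


# A shortened copy of the output from the example above.
SAMPLE = (
    "c1\tint\tNULL\n"
    "\n"
    "# Detailed Table Information\n"
    "Provider\tparquet\n"
    "InputFormat\torg.apache.hadoop.mapred.SequenceFileInputFormat\n"
)

if __name__ == "__main__":
    # The recorded InputFormat does not match the Parquet provider.
    print(input_format_matches_provider(parse_detailed_info(SAMPLE)))
```

A check like this only works if the engine can see both the provider and the recorded format, which is why exposing the write information in metadata matters.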
   
   ## How was this patch tested?
   
   Unit tests.
   
