Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/14 02:48:09 UTC

[GitHub] [spark] wangyum opened a new pull request #24596: [SPARK-27694][SQL] CTAS created data source table should update statistics if spark.sql.statistics.size.autoUpdate.enabled is enabled

URL: https://github.com/apache/spark/pull/24596
 
 
   ## What changes were proposed in this pull request?
   
   How to reproduce:
   ```
   bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true -S
   
   spark-sql> CREATE TABLE spark_27694 USING parquet AS SELECT 'a', 'b';
   spark-sql> desc formatted spark_27694;
   a	string	NULL
   b	string	NULL
   
   # Detailed Table Information
   Database	default
   Table	spark_27694
   Owner	yumwang
   Created Time	Tue May 14 10:38:25 CST 2019
   Last Access	Thu Jan 01 08:00:00 CST 1970
   Created By	Spark 2.4.0
   Type	MANAGED
   Provider	parquet
   Table Properties	[transient_lastDdlTime=1557801505]
   Location	file:/user/hive/warehouse/spark_27694
   Serde Library	org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
   InputFormat	org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
   OutputFormat	org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
   Storage Properties	[serialization.format=1]
   ```
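   With `spark.sql.statistics.size.autoUpdate.enabled=true`, the `DESC FORMATTED` output above has no `Statistics` row for a data source table created by CTAS. This PR fixes that, so the table statistics are updated when the table is created.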
   
   ## How was this patch tested?
   
   Unit tests and manual tests:
   ```
   bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true -S
   
   spark-sql> CREATE TABLE spark_27694 USING parquet AS SELECT 'a', 'b';
   spark-sql> DESC FORMATTED spark_27694;
   a	string	NULL
   b	string	NULL
   
   # Detailed Table Information
   Database	default
   Table	spark_27694
   Owner	root
   Created Time	Mon May 13 19:45:33 GMT-07:00 2019
   Last Access	Wed Dec 31 17:00:00 GMT-07:00 1969
   Created By	Spark 3.0.0-SNAPSHOT
   Type	MANAGED
   Provider	parquet
   Statistics	561 bytes
   Location	file:/user/hive/warehouse/spark_27694
   Serde Library	org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
   InputFormat	org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
   OutputFormat	org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
   ```
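
   The manual check above could also be scripted. The following is only a sketch and not part of this patch: `has_statistics` is a hypothetical helper name, and the usage line assumes `bin/spark-sql` is available and that the table `spark_27694` already exists.

   ```shell
   # Hypothetical helper (not part of this patch): reads DESC FORMATTED output
   # on stdin and succeeds only if it contains a "Statistics" row.
   has_statistics() {
     grep -q '^[[:space:]]*Statistics[[:space:]]'
   }

   # Example usage (assumes bin/spark-sql is on hand and the table exists):
   #   bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true -S \
   #     -e 'DESC FORMATTED spark_27694' | has_statistics \
   #     && echo "statistics present" || echo "statistics missing"
   ```

   Matching on the row name rather than on the byte count keeps the check independent of the exact file size, which may differ between environments.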
