You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/14 02:48:09 UTC
[GitHub] [spark] wangyum opened a new pull request #24596:
[SPARK-27694][SQL] CTAS created data source table should update statistics
if spark.sql.statistics.size.autoUpdate.enabled is enabled
wangyum opened a new pull request #24596: [SPARK-27694][SQL] CTAS created data source table should update statistics if spark.sql.statistics.size.autoUpdate.enabled is enabled
URL: https://github.com/apache/spark/pull/24596
## What changes were proposed in this pull request?
How to reproduce:
```sql
bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true -S
spark-sql> CREATE TABLE spark_27694 USING parquet AS SELECT 'a', 'b';
spark-sql> desc formatted spark_27694;
a string NULL
b string NULL
# Detailed Table Information
Database default
Table spark_27694
Owner yumwang
Created Time Tue May 14 10:38:25 CST 2019
Last Access Thu Jan 01 08:00:00 CST 1970
Created By Spark 2.4.0
Type MANAGED
Provider parquet
Table Properties [transient_lastDdlTime=1557801505]
Location file:/user/hive/warehouse/spark_27694
Serde Library org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Storage Properties [serialization.format=1]
```
This pr fix this issue.
## How was this patch tested?
unit tests and manual tests:
```
bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true -S
spark-sql> CREATE TABLE spark_27694 USING parquet AS SELECT 'a', 'b';
spark-sql> DESC FORMATTED spark_27694;
a string NULL
b string NULL
# Detailed Table Information
Database default
Table spark_27694
Owner root
Created Time Mon May 13 19:45:33 GMT-07:00 2019
Last Access Wed Dec 31 17:00:00 GMT-07:00 1969
Created By Spark 3.0.0-SNAPSHOT
Type MANAGED
Provider parquet
Statistics 561 bytes
Location file:/user/hive/warehouse/spark_27694
Serde Library org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org