Posted to issues@spark.apache.org by "Yuming Wang (JIRA)" <ji...@apache.org> on 2019/05/27 03:21:00 UTC
[jira] [Resolved] (SPARK-24766) CreateHiveTableAsSelect and InsertIntoHiveDir won't generate decimal column stats in parquet
[ https://issues.apache.org/jira/browse/SPARK-24766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang resolved SPARK-24766.
---------------------------------
Resolution: Fixed
Fix Version/s: 3.0.0
Fixed by https://github.com/apache/spark/pull/24346
> CreateHiveTableAsSelect and InsertIntoHiveDir won't generate decimal column stats in parquet
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-24766
> URL: https://issues.apache.org/jira/browse/SPARK-24766
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Yuming Wang
> Priority: Major
> Labels: Parquet
> Fix For: 3.0.0
>
>
> How to reproduce:
> {code:sql}
> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/spark/parquet/dir' STORED AS parquet SELECT CAST(1 AS decimal) AS decimal1;
> {code}
> {code:sql}
> CREATE TABLE test_parquet STORED AS parquet AS SELECT CAST(1 AS decimal) AS decimal1;
> {code}
> {noformat}
> $ java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar meta file:/tmp/spark/parquet/dir/part-00000-cb96a617-4759-4b21-a222-2153ca0e8951-c000
> file: file:/tmp/spark/parquet/dir/part-00000-cb96a617-4759-4b21-a222-2153ca0e8951-c000
> creator: parquet-mr version 1.6.0 (build 6aa21f8776625b5fa6b18059cfebe7549f2e00cb)
> file schema: hive_schema
> --------------------------------------------------------------------------------
> decimal1: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> row group 1: RC:1 TS:46 OFFSET:4
> --------------------------------------------------------------------------------
> decimal1: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:48/46/0.96 VC:1 ENC:BIT_PACKED,PLAIN,RLE ST:[no stats for this column]
> {noformat}
> The decimal column has no stats because Spark still uses com.twitter.parquet-hadoop-bundle.1.6.0.
> Maybe we should refactor {{CreateHiveTableAsSelectCommand}} and {{InsertIntoHiveDirCommand}}, or [upgrade the built-in Hive|https://issues.apache.org/jira/browse/SPARK-23710].
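The check in the report is done by eye on the {{parquet-tools meta}} output. A minimal sketch of automating it follows, assuming the output keeps the layout shown above, where each column-chunk line ends in an {{ST:[...]}} field. The function name {{columns_without_stats}} is hypothetical and not part of Spark or parquet-tools.

```python
import re

# Matches a column-chunk line of `parquet-tools meta` output, for example:
# "decimal1: FIXED_LEN_BYTE_ARRAY SNAPPY ... ST:[no stats for this column]"
# Assumption: the layout matches the output quoted in this issue.
CHUNK_LINE = re.compile(r"^\s*(?P<name>[^:\s]+):\s.*\bST:\[(?P<stats>.*)\]\s*$")

# Marker text printed when a column chunk carries no statistics.
NO_STATS = "no stats for this column"


def columns_without_stats(meta_text):
    """Return names of column chunks whose ST:[...] field reports no stats."""
    missing = []
    for line in meta_text.splitlines():
        match = CHUNK_LINE.match(line)
        # Schema lines and row-group headers have no ST:[...] field and are skipped.
        if match and match.group("stats").strip() == NO_STATS:
            missing.append(match.group("name"))
    return missing
```

Passing the captured {{meta}} output of a file written by either statement above to {{columns_without_stats}} should return the affected column names on an unfixed build, and an empty list once stats are written.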
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)