Posted to issues@spark.apache.org by "Cheng Lian (JIRA)" <ji...@apache.org> on 2016/06/01 22:19:59 UTC
[jira] [Created] (SPARK-15719) Disable writing Parquet summary files by default
Cheng Lian created SPARK-15719:
----------------------------------
Summary: Disable writing Parquet summary files by default
Key: SPARK-15719
URL: https://issues.apache.org/jira/browse/SPARK-15719
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Parquet summary files are not particularly useful nowadays since:
# when schema merging is disabled, we assume the schemas of all Parquet part-files are identical, so we can read the footer from any single part-file.
# when schema merging is enabled, we need to read the footers of all part-files anyway to do the merge.
On the other hand, writing summary files can be expensive because the footers of all part-files must be read and merged. This is particularly costly when appending a small dataset to a large existing Parquet dataset.
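To make the trade-off concrete, here is a toy cost model of the argument above. It is only a sketch, not Spark or Parquet code: the function names are hypothetical, and it counts footer reads only, ignoring caching, parallelism and file sizes.

```python
def footers_read_on_load(num_part_files: int, merge_schema: bool) -> int:
    """Footers a reader opens to determine the dataset schema (toy model)."""
    if num_part_files == 0:
        return 0
    # Schema merging disabled: all part-files are assumed to share one schema,
    # so any single footer is enough.
    # Schema merging enabled: every footer has to be read and merged.
    return num_part_files if merge_schema else 1


def footers_read_on_append(existing: int, appended: int, write_summary: bool) -> int:
    """Footers a writer reads after an append to rebuild summary files (toy model)."""
    # The summary covers the whole dataset, so the footers of the existing
    # part-files are read again together with the newly written ones.
    return existing + appended if write_summary else 0


if __name__ == "__main__":
    # Appending 2 part-files to a dataset that already has 10,000 of them.
    print(footers_read_on_load(10_000, merge_schema=False))
    print(footers_read_on_load(10_000, merge_schema=True))
    print(footers_read_on_append(10_000, 2, write_summary=True))
    print(footers_read_on_append(10_000, 2, write_summary=False))
```

Under these assumptions, the read side never needs a summary file, while the write side pays a cost that grows with the size of the existing dataset rather than with the size of the append.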
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)