Posted to issues@spark.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/07/14 15:48:00 UTC

[jira] [Commented] (SPARK-20937) Describe spark.sql.parquet.writeLegacyFormat property in Spark SQL, DataFrames and Datasets Guide

    [ https://issues.apache.org/jira/browse/SPARK-20937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087506#comment-16087506 ] 

Tim Armstrong commented on SPARK-20937:
---------------------------------------

+1 too. The documentation should also make clear that the "legacy" format for decimals *is* valid Parquet and is better supported by other systems. It's unfortunate that the decimal change and the array representation change were put under one flag.
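
For readers who want to see the decimal difference concretely, here is a rough sketch in plain Python of how I understand Spark 2.x picks the Parquet physical type for a decimal column. The function names are made up for illustration and are not Spark APIs, and the thresholds (9 and 18 digits) reflect my reading of the Spark writer, so treat this as an approximation rather than a specification.

```python
def min_bytes_for_precision(precision: int) -> int:
    """Smallest byte width whose signed range can hold 10**precision."""
    num_bytes = 1
    while 2 ** (8 * num_bytes - 1) < 10 ** precision:
        num_bytes += 1
    return num_bytes


def decimal_physical_type(precision: int, write_legacy_format: bool) -> str:
    """Hypothetical helper: Parquet physical type used for DECIMAL(precision, _).

    Assumption: with the legacy flag on, every decimal is written as a
    fixed-length byte array; with it off, small decimals are int-backed.
    """
    if write_legacy_format:
        return f"FIXED_LEN_BYTE_ARRAY({min_bytes_for_precision(precision)})"
    if precision <= 9:    # fits in a 32-bit integer
        return "INT32"
    if precision <= 18:   # fits in a 64-bit integer
        return "INT64"
    return f"FIXED_LEN_BYTE_ARRAY({min_bytes_for_precision(precision)})"


if __name__ == "__main__":
    for p in (9, 18, 38):
        print(p, decimal_physical_type(p, False), decimal_physical_type(p, True))
```

Both columns of that output are valid Parquet encodings of DECIMAL; the fixed-length byte array form is simply the one that older readers tend to handle.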

> Describe spark.sql.parquet.writeLegacyFormat property in Spark SQL, DataFrames and Datasets Guide
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20937
>                 URL: https://issues.apache.org/jira/browse/SPARK-20937
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, SQL
>    Affects Versions: 2.3.0
>            Reporter: Jacek Laskowski
>            Priority: Trivial
>
> As a follow-up to SPARK-20297 (and SPARK-10400), in which the {{spark.sql.parquet.writeLegacyFormat}} property was recommended for Impala and Hive, the Spark SQL docs for [Parquet Files|https://spark.apache.org/docs/latest/sql-programming-guide.html#configuration] should document it.
> p.s. It was asked about in [Why can't Impala read parquet files after Spark SQL's write?|https://stackoverflow.com/q/44279870/1305344] on StackOverflow today.
> p.p.s. It's also covered in [~holden.karau@gmail.com]'s "High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark" book (in Table 3-10. Parquet data source options), which gives the option some wider publicity.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
