
Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/03/09 10:31:40 UTC

[jira] [Commented] (SPARK-13766) Inconsistent file extensions and omitted file extensions written by CSV, TEXT and JSON data sources

    [ https://issues.apache.org/jira/browse/SPARK-13766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186835#comment-15186835 ] 

Sean Owen commented on SPARK-13766:
-----------------------------------

It seems like you're referring to the "part-*" files. These files are effectively an internal representation, and I would not expect them to have such an extension. For example, you're not really guaranteed that the way the data breaks up leaves each file a valid JSON doc.

> Inconsistent file extensions and omitted file extensions written by CSV, TEXT and JSON data sources
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13766
>                 URL: https://issues.apache.org/jira/browse/SPARK-13766
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> Currently, the output files (part-files) from the CSV, TEXT and JSON data sources do not have file extensions such as .csv, .txt and .json (except for compression extensions such as .gz, .deflate and .bz2).
> In addition, it looks like Parquet part-files have extensions such as .gz.parquet or .snappy.parquet, depending on the compression codec, whereas ORC part-files have no codec extension and just end in .orc.
> So, in short, the extensions are currently set as follows:
> {code}
> TEXT, CSV and JSON - [.COMPRESSION_CODEC_NAME]
> Parquet -  [.COMPRESSION_CODEC_NAME].parquet
> ORC - .orc
> {code}
> It would be great to have consistent naming across them.
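To make the reported inconsistency concrete, here is a minimal Python sketch that encodes the suffix table from the issue as a function. `reported_suffix` and `CODEC_EXTENSIONS` are hypothetical names, not Spark APIs; the codec-to-extension mapping covers only the codecs mentioned in this thread (an assumption), and the function mirrors only the behavior described in the report, not any released Spark version.

```python
from typing import Optional

# Hypothetical mapping from codec name to the extension fragment it adds.
# Only the codecs mentioned in SPARK-13766 are listed; this is an assumption,
# not Spark's or Hadoop's actual codec table.
CODEC_EXTENSIONS = {
    "gzip": ".gz",
    "deflate": ".deflate",
    "bzip2": ".bz2",
    "snappy": ".snappy",
}


def reported_suffix(fmt: str, codec: Optional[str] = None) -> str:
    """Return the part-file suffix pattern described in the issue (hypothetical helper)."""
    fmt = fmt.lower()
    if fmt == "orc":
        # ORC: always ".orc"; the codec argument is ignored here.
        return ".orc"
    codec_ext = CODEC_EXTENSIONS[codec] if codec else ""
    if fmt in ("text", "csv", "json"):
        # TEXT, CSV and JSON: only the codec extension, no format extension.
        return codec_ext
    if fmt == "parquet":
        # Parquet: codec extension followed by ".parquet".
        return codec_ext + ".parquet"
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__":
    for fmt in ("text", "csv", "json", "parquet", "orc"):
        # Prints '.gz' for text, csv and json, '.gz.parquet' for parquet, '.orc' for orc.
        print(f"{fmt}: {reported_suffix(fmt, 'gzip')!r}")
```

The three text-based formats come out with a codec-only (or empty) suffix, which is the gap the reporter describes; whether part-files should carry a format extension at all is the point Sean questions in his comment.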



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)