Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/10/20 06:20:58 UTC

[jira] [Assigned] (SPARK-18021) Refactor file name specification for data sources

     [ https://issues.apache.org/jira/browse/SPARK-18021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-18021:
------------------------------------

    Assignee: Reynold Xin  (was: Apache Spark)

> Refactor file name specification for data sources
> -------------------------------------------------
>
>                 Key: SPARK-18021
>                 URL: https://issues.apache.org/jira/browse/SPARK-18021
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
> Currently, each data source's OutputWriter is responsible for specifying the entire file name of each output file. This is problematic because Spark SQL relies on the file name for certain behaviors, e.g. the bucket id, so the current approach allows an individual data source to break the implementation of bucketing.
> We also don't want to move file naming entirely out of the data sources, because different data sources do want to specify different extensions.
> A good compromise is for the OutputWriter to take in the prefix for a file name and append its own suffix.
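
The compromise described above could look roughly like the following. This is only an illustrative sketch in Python, not Spark's actual API: the names file_prefix, bucket_id_of, ParquetLikeWriter and JsonLikeWriter, and the "part-<task>-<job>_<bucket>" layout, are all assumptions made for the example. The point it tries to show is that the framework owns the prefix (and so can always recover the bucket id), while each writer only contributes an extension.

```python
import re
from abc import ABC, abstractmethod
from typing import Optional


def file_prefix(task_id: int, job_id: str, bucket_id: Optional[int] = None) -> str:
    """Framework-owned part of the name (hypothetical layout, not Spark's)."""
    bucket = "" if bucket_id is None else f"_{bucket_id:05d}"
    return f"part-{task_id:05d}-{job_id}{bucket}"


def bucket_id_of(file_name: str) -> Optional[int]:
    """Recover the bucket id from a name, whatever suffix the writer added."""
    match = re.search(r"_(\d+)(?:\..*)?$", file_name)
    return int(match.group(1)) if match else None


class OutputWriter(ABC):
    """Hypothetical writer: receives the prefix and appends only its suffix."""

    def __init__(self, prefix: str) -> None:
        self.path = prefix + self.file_suffix()

    @abstractmethod
    def file_suffix(self) -> str:
        """Extension chosen by the data source, e.g. '.json'."""


class ParquetLikeWriter(OutputWriter):
    def file_suffix(self) -> str:
        return ".snappy.parquet"


class JsonLikeWriter(OutputWriter):
    def file_suffix(self) -> str:
        return ".json"


bucketed = ParquetLikeWriter(file_prefix(3, "abc", bucket_id=7))
unbucketed = JsonLikeWriter(file_prefix(4, "abc"))
```

Because the writers never see or rebuild the prefix in this sketch, a data source cannot accidentally drop the bucket id; it can only choose what follows it.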



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org