Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2019/09/26 22:49:00 UTC

[jira] [Assigned] (SPARK-29259) Filesystem.exists is called even when not necessary for append save mode

     [ https://issues.apache.org/jira/browse/SPARK-29259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-29259:
-------------------------------------

    Assignee: Rahij Ramsharan

> Filesystem.exists is called even when not necessary for append save mode
> ------------------------------------------------------------------------
>
>                 Key: SPARK-29259
>                 URL: https://issues.apache.org/jira/browse/SPARK-29259
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Rahij Ramsharan
>            Assignee: Rahij Ramsharan
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> When saving a DataFrame to a Hadoop file system ([https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L93]), Spark checks whether the path exists before inspecting the SaveMode to determine whether it should actually insert data. However, the pathExists variable is not used in the case of SaveMode.Append. In some file systems, the exists call can be expensive, so this PR makes that call only when necessary.
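
One way to picture the change described above is the following minimal Scala sketch. It is not the code from the PR or from InsertIntoHadoopFsRelationCommand: SaveMode, CountingFs, and shouldInsert are hypothetical stand-ins defined here so the example is self-contained, and the per-mode behaviour is simplified. It only illustrates how deferring the exists call (here with a lazy val, which is an assumption about one possible approach) keeps the append path from touching the file system.

```scala
// Hypothetical stand-ins for illustration; these are NOT Spark's or Hadoop's real classes.
sealed trait SaveMode
object SaveMode {
  case object Append extends SaveMode
  case object Overwrite extends SaveMode
  case object ErrorIfExists extends SaveMode
  case object Ignore extends SaveMode
}

// A fake file system that counts exists() calls so the saving is observable.
final class CountingFs(existing: Set[String]) {
  var existsCalls: Int = 0
  def exists(path: String): Boolean = {
    existsCalls += 1
    existing.contains(path)
  }
}

object LazyExistsSketch {
  /** Returns true when data should be written (simplified per-mode logic). */
  def shouldInsert(fs: CountingFs, path: String, mode: SaveMode): Boolean = {
    // lazy val defers the exists() call until a branch actually reads it.
    lazy val pathExists = fs.exists(path)
    mode match {
      case SaveMode.Append =>
        true // never reads pathExists, so exists() is not called
      case SaveMode.Overwrite =>
        if (pathExists) {
          // a real implementation would clear the existing data here
        }
        true
      case SaveMode.ErrorIfExists =>
        if (pathExists) throw new IllegalStateException(s"path $path already exists")
        true
      case SaveMode.Ignore =>
        !pathExists
    }
  }
}
```

With this sketch, calling shouldInsert with SaveMode.Append leaves existsCalls unchanged, while the other modes each trigger exactly one exists call per invocation.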



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
