Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2019/10/08 05:42:21 UTC

[jira] [Resolved] (SPARK-24240) Add a config to control whether InMemoryFileIndex should update cache when refresh.

     [ https://issues.apache.org/jira/browse/SPARK-24240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-24240.
----------------------------------
    Resolution: Incomplete

> Add a config to control whether InMemoryFileIndex should update cache when refresh.
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-24240
>                 URL: https://issues.apache.org/jira/browse/SPARK-24240
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: jin xing
>            Priority: Major
>              Labels: bulk-closed
>
> In the current code (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L172), after data is inserted, Spark always refreshes the file index and updates the cache. If the target table has a very large number of files, the job suffers from long run times and OOM errors. Could we add a config to control whether InMemoryFileIndex updates its cache on refresh?
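
The request above can be illustrated with a minimal sketch. This is a toy model, not Spark's InMemoryFileIndex: the class ToyFileIndex, the flag update_cache_on_refresh, and the fake_lister helper are all hypothetical names introduced only for illustration. It assumes the requested config would let refresh drop the stale listing without re-listing right away, deferring the expensive listing until the files are next requested.

```python
from typing import Callable, List, Optional


class ToyFileIndex:
    """Toy stand-in for a cached file listing; not Spark's InMemoryFileIndex."""

    def __init__(self, list_files: Callable[[str], List[str]], root: str,
                 update_cache_on_refresh: bool = True) -> None:
        self._list_files = list_files
        self._root = root
        # Hypothetical flag playing the role of the requested config.
        self._update_cache_on_refresh = update_cache_on_refresh
        self._cache: Optional[List[str]] = None

    def refresh(self) -> None:
        # Always drop the stale listing.
        self._cache = None
        if self._update_cache_on_refresh:
            # Eager path: pay the listing cost immediately.
            self._cache = self._list_files(self._root)

    def all_files(self) -> List[str]:
        # Lazy path: list only when the files are actually requested.
        if self._cache is None:
            self._cache = self._list_files(self._root)
        return self._cache


# Record every listing call so the two behaviors can be compared.
calls: List[str] = []


def fake_lister(root: str) -> List[str]:
    calls.append(root)
    return [f"{root}/part-0000{i}" for i in range(3)]


eager = ToyFileIndex(fake_lister, "/warehouse/t", update_cache_on_refresh=True)
eager.refresh()
eager_calls = len(calls)  # listing ran during refresh

lazy = ToyFileIndex(fake_lister, "/warehouse/t", update_cache_on_refresh=False)
lazy.refresh()
lazy_calls_after_refresh = len(calls) - eager_calls  # no listing yet
files = lazy.all_files()  # listing runs here, on first access
```

With the flag off, an insert into a table with many files would skip the listing cost unless a later read needs it. Whether that trade-off fits Spark's actual caching design is not settled by this issue.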



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
