
Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/01/04 13:43:39 UTC

[jira] [Assigned] (SPARK-12619) Combine small files in a hadoop directory into single split

     [ https://issues.apache.org/jira/browse/SPARK-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-12619:
------------------------------------

    Assignee: Apache Spark

> Combine small files in a hadoop directory into single split 
> ------------------------------------------------------------
>
>                 Key: SPARK-12619
>                 URL: https://issues.apache.org/jira/browse/SPARK-12619
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Navis
>            Assignee: Apache Spark
>            Priority: Trivial
>
> When a directory contains too many small files, the whole Spark cluster can be exhausted just scheduling the one task created for each file. A custom input format can handle that, but if you're using the Hive metastore, it is hardly an option.
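As a rough illustration of what "combining small files into a single split" means, here is a minimal Python sketch of greedy split packing. It is not Spark's or Hadoop's implementation. The function name `combine_small_files` and the parameters `max_split_bytes` and `open_cost_bytes` are hypothetical, loosely modelled on the idea behind Hadoop's `CombineFileInputFormat`: group many files into one split up to a size limit, so one task reads many files instead of one task per file.

```python
from typing import List, Tuple


def combine_small_files(
    files: List[Tuple[str, int]],
    max_split_bytes: int,
    open_cost_bytes: int = 0,
) -> List[List[str]]:
    """Pack (path, size_in_bytes) pairs into splits of roughly max_split_bytes.

    Hypothetical sketch: each file is charged its size plus a fixed
    open_cost_bytes, so a split cannot collect an unbounded number of
    near-empty files. A file larger than max_split_bytes is never divided;
    it simply ends up in a split of its own.
    """
    splits: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    # Largest files first, then next-fit packing: when adding a file would
    # overflow the current split, close that split and start a new one.
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        cost = size + open_cost_bytes
        if current and current_bytes + cost > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += cost
    if current:
        splits.append(current)
    return splits


if __name__ == "__main__":
    # 1000 files of 1 KiB each, packed into splits of at most 64 KiB.
    small_files = [(f"part-{i:05d}", 1024) for i in range(1000)]
    splits = combine_small_files(small_files, max_split_bytes=64 * 1024)
    print(len(splits))  # 16 (15 splits of 64 files plus one of 40)
```

Scheduling 16 tasks instead of 1000 is the kind of reduction the issue asks for. The per-file open cost reflects that opening a file has fixed overhead regardless of its size, so treating tiny files as free would over-fill splits.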



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
