Posted to issues@spark.apache.org by "xianlongZhang (JIRA)" <ji...@apache.org> on 2017/08/17 01:33:00 UTC

[jira] [Commented] (SPARK-16188) Spark sql create a lot of small files

    [ https://issues.apache.org/jira/browse/SPARK-16188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129742#comment-16129742 ] 

xianlongZhang commented on SPARK-16188:
---------------------------------------

But when we use Spark SQL, we cannot call the 'coalesce' method. What should we do in this case? This happens often in our production environment, and we have not found a good solution so far.

> Spark sql create a lot of small files
> -------------------------------------
>
>                 Key: SPARK-16188
>                 URL: https://issues.apache.org/jira/browse/SPARK-16188
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.0
>         Environment: spark 1.6.1
>            Reporter: cen yuhai
>
> I find that Spark SQL creates as many output files as there are partitions. When the result is small, this produces many small files, and most of them are empty.
> Hive has a feature that checks the average output file size. If the average file size is smaller than "hive.merge.smallfiles.avgsize", Hive adds a job to merge the files.
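
For illustration, the average-size check described in the quoted issue can be sketched in plain Python. This is only a sketch of the decision rule, not Hive's or Spark's actual code: the function names should_merge and target_file_count, the threshold values passed in, and the sample file sizes are all assumptions made up for the example.

```python
import math


def should_merge(file_sizes, avg_size_threshold):
    """Return True when the average output file size is below the threshold.

    file_sizes is a list of sizes in bytes; avg_size_threshold plays the role
    of "hive.merge.smallfiles.avgsize" (the value used is an assumption).
    """
    if not file_sizes:
        return False  # nothing was written, so there is nothing to merge
    return sum(file_sizes) / len(file_sizes) < avg_size_threshold


def target_file_count(file_sizes, target_file_size):
    """Return how many files the merged output would need (at least one)."""
    total = sum(file_sizes)
    return max(1, math.ceil(total / target_file_size))


# Hypothetical job output: 200 files, 190 of them empty, 10 of 1 MiB each.
sizes = [0] * 190 + [1024 * 1024] * 10

if should_merge(sizes, avg_size_threshold=16_000_000):
    print("merge into", target_file_count(sizes, target_file_size=256_000_000), "file(s)")
```

With a rule like this, a follow-up step would rewrite the output using the computed number of files; how that rewrite is triggered from Spark SQL is exactly the open question in this issue.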



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
