Posted to issues@tajo.apache.org by "Hyoungjun Kim (JIRA)" <ji...@apache.org> on 2014/08/27 11:26:57 UTC

[jira] [Resolved] (TAJO-992) Reduce number of hash shuffle output file.

     [ https://issues.apache.org/jira/browse/TAJO-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyoungjun Kim resolved TAJO-992.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9.0

Committed.

> Reduce number of hash shuffle output file.
> ------------------------------------------
>
>                 Key: TAJO-992
>                 URL: https://issues.apache.org/jira/browse/TAJO-992
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: data shuffle
>            Reporter: Hyoungjun Kim
>            Assignee: Hyoungjun Kim
>             Fix For: 0.9.0
>
>
> Currently, Tajo creates too many intermediate files when it uses hash shuffle. An execution block (SubQuery) on a TajoWorker creates intermediate files according to the following rule:
>   # intermediate files in a worker = # tasks / # workers * # partitions
> This may cause a 'too many open files' error and makes it difficult to scale out. To solve this problem, we should reduce the number of hash shuffle output files.
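
As a rough illustration of the rule quoted above, the arithmetic can be sketched in a few lines of Python. This is not Tajo code: the function name, the ceiling division (used on the assumption that tasks are spread evenly across workers), and the cluster sizes in the example are all hypothetical and chosen only to show how quickly the count grows.

```python
def intermediate_files_per_worker(num_tasks: int, num_workers: int, num_partitions: int) -> int:
    """Estimate the hash shuffle intermediate files created on one worker.

    Follows the rule from the issue description:
        (# tasks / # workers) * # partitions
    Assumption (not from the issue): tasks are spread evenly over workers,
    and the per-worker task count is rounded up.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    tasks_per_worker = -(-num_tasks // num_workers)  # ceiling division
    return tasks_per_worker * num_partitions


if __name__ == "__main__":
    # Hypothetical sizes: 1000 tasks, 10 workers, 32 hash partitions.
    print(intermediate_files_per_worker(num_tasks=1000, num_workers=10, num_partitions=32))
```

With these made-up numbers each worker would hold 3200 intermediate files, since every task writes one file per partition. Depending on how a system is configured, a count like that can run into its per-process limit on open files.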



--
This message was sent by Atlassian JIRA
(v6.2#6252)