Posted to dev@crunch.apache.org by "Josh Wills (JIRA)" <ji...@apache.org> on 2017/03/20 21:11:42 UTC

[jira] [Resolved] (CRUNCH-636) Make replication factor for temporary files configurable

     [ https://issues.apache.org/jira/browse/CRUNCH-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills resolved CRUNCH-636.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.0.0

Pushed to master-- thanks Attila!

> Make replication factor for temporary files configurable
> --------------------------------------------------------
>
>                 Key: CRUNCH-636
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-636
>             Project: Crunch
>          Issue Type: New Feature
>            Reporter: Attila Sasvari
>            Assignee: Attila Sasvari
>             Fix For: 1.0.0
>
>         Attachments: CRUNCH-636.01.patch, CRUNCH-636.02.patch, CRUNCH-636.03.patch, CRUNCH-636.04.patch, test.WordCount_2017-03-08_16.31.55.737_jobplan.dot.png, test.WordCount_2017-03-08_16.31.55.737.log
>
>
> As of now, Crunch does not allow temporary files and non-temporary files (e.g. the final output data of leaf nodes) to have different replication factors at the same time. A user processing a large amount of data (say, hundreds of gigabytes) might want a lower replication factor for the large temporary files written between Crunch jobs.
> We could make this configurable via a new setting (e.g. {{crunch.tmp.dir.replication}}).
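
A minimal sketch of the behaviour proposed above, not the code from the attached patches. It uses plain java.util.Properties as a stand-in for a Hadoop Configuration so that it runs without Hadoop on the classpath. The key crunch.tmp.dir.replication is the name suggested in the description and may differ from what was committed; the class and method names are hypothetical. dfs.replication is the standard HDFS default replication setting, and falling back to it when the new key is unset is an assumption.

```java
import java.util.Properties;

class TmpReplicationSketch {
    // Key suggested in the issue description; the committed name may differ.
    static final String TMP_REPLICATION_KEY = "crunch.tmp.dir.replication";
    // Standard HDFS setting for the default block replication.
    static final String DFS_REPLICATION_KEY = "dfs.replication";

    // Hypothetical helper: picks the replication factor for a file,
    // depending on whether it is temporary (written between jobs) or final output.
    static int replicationFor(Properties conf, boolean temporary) {
        int defaultReplication =
            Integer.parseInt(conf.getProperty(DFS_REPLICATION_KEY, "3"));
        if (!temporary) {
            return defaultReplication;
        }
        // Assumption: an unset temporary-file key falls back to the default.
        String tmp = conf.getProperty(TMP_REPLICATION_KEY);
        return tmp == null ? defaultReplication : Integer.parseInt(tmp);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(DFS_REPLICATION_KEY, "3");
        conf.setProperty(TMP_REPLICATION_KEY, "1");
        System.out.println("temporary files: " + replicationFor(conf, true));
        System.out.println("final output:    " + replicationFor(conf, false));
    }
}
```

With the values above, temporary files would be written with a single replica while final output keeps the cluster default, which is the trade-off the description asks for: less disk and network use for intermediate data that can be regenerated.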



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)