Posted to commits@cassandra.apache.org by "Erik Forsberg (Updated) (JIRA)" <ji...@apache.org> on 2012/02/02 16:35:54 UTC

[jira] [Updated] (CASSANDRA-3840) Use java.io.tmpdir as default output location for BulkRecordWriter

     [ https://issues.apache.org/jira/browse/CASSANDRA-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Forsberg updated CASSANDRA-3840:
-------------------------------------

    Attachment: java.io.tmpdir.patch
    
> Use java.io.tmpdir as default output location for BulkRecordWriter
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-3840
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3840
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>    Affects Versions: 1.1
>            Reporter: Erik Forsberg
>              Labels: bulkloader
>         Attachments: java.io.tmpdir.patch
>
>
> BulkRecordWriter uses the value of the property mapreduce.output.bulkoutputformat.localdir if it is set, and otherwise falls back to the value of mapred.local.dir.
> However, on a typical production system, mapred.local.dir is set to a comma-separated list of directories. Because the whole list is used as if it were a single directory, BulkOutputFormat ends up writing to nonsensical paths such as
> /dir1/,dir2,/dir3,KeySpaceName/CFName
> This has two effects:
> 1) The directory is not removed when the job finishes, leading to disk space management issues.
> 2) If a new job is run against the same keyspace and column family, it tries to load the old data along with the new data.
> It would be better to use System.getProperty("java.io.tmpdir"), since that is set to an attempt-specific temporary directory which is cleaned up after the job finishes. See http://hadoop.apache.org/common/docs/current/mapred_tutorial.html, under "Directory Structure".
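
The proposed lookup order can be illustrated with a minimal sketch. This is not the attached patch: the class name, the method name, and the use of a plain Map as a stand-in for the Hadoop job configuration are all assumptions made for illustration. Only the property name mapreduce.output.bulkoutputformat.localdir and the java.io.tmpdir fallback come from the description above.

```java
import java.io.File;
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch, not the code in java.io.tmpdir.patch.
final class BulkOutputDirSketch {
    // Property name taken from the issue description.
    static final String LOCAL_DIR_KEY = "mapreduce.output.bulkoutputformat.localdir";

    // 'conf' is a stand-in for the Hadoop job configuration (assumption).
    static File resolveOutputDir(Map<String, String> conf, String keyspace, String columnFamily) {
        String base = conf.get(LOCAL_DIR_KEY);
        if (base == null || base.isEmpty()) {
            // Fall back to the JVM temp directory instead of mapred.local.dir,
            // which may hold a comma-separated list rather than a single path.
            base = System.getProperty("java.io.tmpdir");
        }
        // Layout <base>/<keyspace>/<columnFamily>, as in the path shown in the description.
        return new File(new File(base, keyspace), columnFamily);
    }

    public static void main(String[] args) {
        // Explicit setting wins when present.
        Map<String, String> explicit = Collections.singletonMap(LOCAL_DIR_KEY, "/data/bulk");
        System.out.println(resolveOutputDir(explicit, "KeySpaceName", "CFName"));

        // With nothing set, the result is rooted at java.io.tmpdir.
        Map<String, String> unset = Collections.emptyMap();
        System.out.println(resolveOutputDir(unset, "KeySpaceName", "CFName"));
    }
}
```

With this ordering, a job that sets the explicit property keeps its current behaviour, and only the default changes.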

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira