Posted to common-issues@hadoop.apache.org by "Suraj Nayak (JIRA)" <ji...@apache.org> on 2016/05/07 09:14:13 UTC
[jira] [Created] (HADOOP-13114) DistCp should have option to compress data on write
Suraj Nayak created HADOOP-13114:
------------------------------------
Summary: DistCp should have option to compress data on write
Key: HADOOP-13114
URL: https://issues.apache.org/jira/browse/HADOOP-13114
Project: Hadoop Common
Issue Type: Improvement
Reporter: Suraj Nayak
Assignee: Suraj Nayak
Priority: Minor
Fix For: 3.0.0
The DistCp utility should be able to store data in a user-specified compressed format. This avoids the extra hop of compressing the data after transfer. Backup strategies that copy to a different cluster benefit by saving one IO operation, along with time and effort.
* Create an option -compressOutput that defaults to {{org.apache.avro.file.BZip2Codec}}.
* Users will be able to change the codec with {{-D mapreduce.output.fileoutputformat.compress.codec=org.apache.avro.file.SnappyCodec}}.
* If DistCp compression is enabled, suffix the filenames with the codec's default extension to indicate that the file is compressed. This lets users see which codec was used to compress the data.
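
The behaviour proposed in the list above (compress while writing to the target, and suffix the target filename with the codec extension) could be sketched roughly as follows. This is only an illustration and not the DistCp patch: it uses the JDK's built-in GZIP stream as a stand-in for the configured codec so that it runs without Hadoop on the classpath, and the class name {{CompressOnWriteSketch}}, its method names and the {{.gz}} constant are all hypothetical. In Hadoop itself the extension would presumably come from the codec, for example via {{CompressionCodec.getDefaultExtension()}}.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: GZIP from the JDK stands in for the configured codec.
public class CompressOnWriteSketch {

    // Assumed extension for the stand-in codec; a real codec would supply its own.
    public static final String GZIP_EXTENSION = ".gz";

    // Suffix the target filename with the codec extension so the codec is visible.
    public static String targetName(String sourceName, String codecExtension) {
        return sourceName + codecExtension;
    }

    // Copy the source bytes to the target, compressing them as they are written.
    public static void copyCompressed(InputStream source, OutputStream target)
            throws IOException {
        try (GZIPOutputStream gz = new GZIPOutputStream(target)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = source.read(buf)) != -1) {
                gz.write(buf, 0, n);
            }
        }
    }

    // Helper used only to check that the written data can be read back.
    public static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build some repetitive sample data in place of a real source file.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("sample record\n");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copyCompressed(new ByteArrayInputStream(original), sink);

        System.out.println("target name: " + targetName("part-00000", GZIP_EXTENSION));
        System.out.println("original bytes: " + original.length
                + ", written bytes: " + sink.size());
    }
}
```

The point of the sketch is that compression happens inside the single write to the target, so no second pass over the copied data is needed, and the suffix makes the codec visible from the filename alone.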
This JIRA is similar to [HADOOP-8065|https://issues.apache.org/jira/browse/HADOOP-8065]. [HADOOP-8065|https://issues.apache.org/jira/browse/HADOOP-8065] aims to compress data *during transit*, which is a huge effort. This JIRA is simplified to let the user compress data when it lands on the target filesystem.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org