Posted to common-dev@hadoop.apache.org by "Robert Chansler (JIRA)" <ji...@apache.org> on 2008/04/05 02:30:27 UTC
[jira] Created: (HADOOP-3188) Archive/compaction utility for directories
Archive/compaction utility for directories
------------------------------------------
Key: HADOOP-3188
URL: https://issues.apache.org/jira/browse/HADOOP-3188
Project: Hadoop Core
Issue Type: New Feature
Components: dfs, mapred
Reporter: Robert Chansler
Utility will collapse the contents of a directory into a small number of files plus an index file.
A new map-reduce input format will be able to read the collapsed structure.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-3188) compaction utility for directories
Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mahadev konar updated HADOOP-3188:
----------------------------------
Description:
Utility will collapse the contents of a directory into a small number of files.
was:
Utility will collapse the contents of a directory into a small number of files plus an index file.
A new map-reduce input format will be able to read the collapsed structure.
Assignee: Robert Chansler (was: Mahadev konar)
Summary: compaction utility for directories (was: Archive/compaction utility for directories)
Editing this issue to be just a compaction utility.
> compaction utility for directories
> ----------------------------------
>
> Key: HADOOP-3188
> URL: https://issues.apache.org/jira/browse/HADOOP-3188
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs, mapred
> Reporter: Robert Chansler
> Assignee: Robert Chansler
>
> Utility will collapse the contents of a directory into a small number of files.
[jira] Commented: (HADOOP-3188) Archive/compaction utility for directories
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588038#action_12588038 ]
Milind Bhandarkar commented on HADOOP-3188:
-------------------------------------------
Thanks to Arkady, here are the requirements for such a tool.
Here is the outline of the functionality.
Please comment -- does this meet your needs, what is missing, etc. This will help to get this small tool right.
The purpose of the tool:
Given a DFS directory with N part files, produce a DFS directory with M part files whose content is equivalent to the original.
Optionally, the tool will also compress the data in a way that is transparent to MapReduce jobs.
What is "equivalent content"?
There are three cases:
* records are independent and the order does not matter
* records are totally ordered (the keys are ordered in each part file, and all the keys in part-i are "less" than those in part-i+1)
* records are ordered within each shard (part-file), and this order is important
In the second and third cases, records with the same key should be in the same shard (part-file). This may also be required in case 1.
In the first two cases, the shards can simply be concatenated into larger ones.
In the third case, the shards need to be merged according to key order.
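The difference between concatenating and merging can be sketched in a few lines of Python. This is only an illustration on in-memory lists, not the proposed tool: the function names concat_shards and merge_shards are hypothetical, records are assumed to be (key, value) tuples, and the merge shown produces a single ordered output (splitting it into M shards at key boundaries is left out).

```python
import heapq
from itertools import chain


def concat_shards(shards, nshards):
    """Cases 1 and 2: concatenate consecutive input shards into at most nshards groups.

    Whole input shards are kept together and in their original sequence,
    so a total order across part files (case 2) is preserved.
    """
    if not shards:
        return []
    per_group = -(-len(shards) // nshards)  # ceiling division
    return [
        list(chain.from_iterable(shards[i:i + per_group]))
        for i in range(0, len(shards), per_group)
    ]


def merge_shards(shards, key=lambda record: record[0]):
    """Case 3: k-way merge of shards that are each already sorted by key."""
    return list(heapq.merge(*shards, key=key))
```

With three input shards and nshards=2, concat_shards glues the first two shards together unchanged, while merge_shards interleaves records from all shards by key.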
The command will look like
    dfs_compact
        -input      input DFS directory path (required)
        -output     output DFS directory path (by default -- replace the input)
        -nshards    the number of shards (part files) in the output
        -shardsize  the approximate desired size of a shard in the output
                    only one of -nshards and -shardsize should be specified
                    default -- one shard
        -order [yes|no]   optional; default -- "no"
                    "yes" corresponds to case three (this may require supplying a key comparison method)
        -compress [gzip|zlib|lzo]   if the option is specified with no value,
                    the tool will pick the compression method itself
It will probably be implemented as a map-reduce job (map-only for cases 1 and 2).
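To make the option rules concrete, here is a rough Python sketch of how the proposed command line could be parsed. No dfs_compact command exists in this thread beyond the outline above, so everything here is an assumption for illustration: the helper names build_parser and resolve_nshards are hypothetical, -shardsize is assumed to be in bytes, and "auto" is a made-up placeholder for "let the tool pick the compression method".

```python
import argparse
import math


def build_parser():
    """Parser mirroring the proposed dfs_compact options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="dfs_compact")
    parser.add_argument("-input", required=True, help="input DFS directory path")
    # None stands for the default behaviour: replace the input directory.
    parser.add_argument("-output", default=None, help="output DFS directory path")
    # Only one of -nshards and -shardsize may be given.
    size = parser.add_mutually_exclusive_group()
    size.add_argument("-nshards", type=int, default=None)
    size.add_argument("-shardsize", type=int, default=None)  # assumed: bytes
    parser.add_argument("-order", choices=["yes", "no"], default="no")
    # A bare -compress yields "auto"; omitting the flag leaves compression off.
    parser.add_argument("-compress", nargs="?", const="auto", default=None,
                        choices=["gzip", "zlib", "lzo", "auto"])
    return parser


def resolve_nshards(args, total_bytes):
    """Turn -nshards / -shardsize into an output shard count (default: one)."""
    if args.nshards is not None:
        return args.nshards
    if args.shardsize is not None:
        return max(1, math.ceil(total_bytes / args.shardsize))
    return 1
```

For example, parsing "-input /data/in -shardsize 100 -compress" against 250 bytes of input would ask for three output shards with the compression method left to the tool.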
> Archive/compaction utility for directories
> ------------------------------------------
>
> Key: HADOOP-3188
> URL: https://issues.apache.org/jira/browse/HADOOP-3188
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs, mapred
> Reporter: Robert Chansler
>
> Utility will collapse the contents of a directory into a small number of files plus an index file.
> A new map-reduce input format will be able to read the collapsed structure.
[jira] Resolved: (HADOOP-3188) compaction utility for directories
Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Chansler resolved HADOOP-3188.
-------------------------------------
Resolution: Fixed
Fix Version/s: 0.18.0
Mahadev's work (HADOOP-3307) is close enough. There is no reason for this to be open.
> compaction utility for directories
> ----------------------------------
>
> Key: HADOOP-3188
> URL: https://issues.apache.org/jira/browse/HADOOP-3188
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs, mapred
> Reporter: Robert Chansler
> Assignee: Robert Chansler
> Fix For: 0.18.0
>
>
> Utility will collapse the contents of a directory into a small number of files.
[jira] Assigned: (HADOOP-3188) Archive/compaction utility for directories
Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mahadev konar reassigned HADOOP-3188:
-------------------------------------
Assignee: Mahadev konar
> Archive/compaction utility for directories
> ------------------------------------------
>
> Key: HADOOP-3188
> URL: https://issues.apache.org/jira/browse/HADOOP-3188
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs, mapred
> Reporter: Robert Chansler
> Assignee: Mahadev konar
>
> Utility will collapse the contents of a directory into a small number of files plus an index file.
> A new map-reduce input format will be able to read the collapsed structure.