Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2008/03/02 08:34:50 UTC

[jira] Commented: (HADOOP-2922) sequencefiles without keys

    [ https://issues.apache.org/jira/browse/HADOOP-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574195#action_12574195 ] 

Owen O'Malley commented on HADOOP-2922:
---------------------------------------

Is this actually a serious issue? The overhead of using NullWritable as your key should be 4 bytes/record without block compression, and far less with it. It might make sense to special-case SequenceFiles so they don't actually encode the NullWritables in each record. Is the extra space an observed problem, or is it just an abstract complaint?
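For context on where the 4 bytes/record estimate comes from: an uncompressed SequenceFile record is framed as a 4-byte record length, a 4-byte key length, the key bytes, then the value bytes, and a NullWritable serializes to zero bytes. So the only per-record cost of the dummy key is the key-length field. The sketch below mimics that framing in plain Java (it does not use the Hadoop API, and the key-less variant is a hypothetical format, not anything SequenceFile supports):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class NullKeyOverhead {
    // Mimics SequenceFile's uncompressed record framing:
    // record length (4 bytes), key length (4 bytes), key bytes, value bytes.
    // With NullWritable as the key, the key contributes zero bytes.
    static int recordSizeWithNullKey(byte[] value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(value.length); // record length: key (0) + value bytes
        out.writeInt(0);            // key length: NullWritable is 0 bytes
        out.write(value);           // value bytes
        return buf.size();
    }

    // Hypothetical key-less framing: record length, then value bytes only.
    static int recordSizeWithoutKey(byte[] value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(value.length);
        out.write(value);
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] value = new byte[100];
        int with = recordSizeWithNullKey(value);
        int without = recordSizeWithoutKey(value);
        // The dummy key costs exactly the 4-byte key-length field per record.
        System.out.println(with - without);
    }
}
```

Block compression amortizes this further, since key lengths are stored run-length-style per block rather than per record.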

> sequencefiles without keys
> --------------------------
>
>                 Key: HADOOP-2922
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2922
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.16.0
>            Reporter: Joydeep Sen Sarma
>
> SequenceFiles are invaluable for storing compressed/binary data, but when we use them to store serialized records we don't use the key part at all (we just put something dummy there to satisfy the API). I have heard of other projects (Jaql, Cascading) using the same tactic.
> So this is a request for a modified version of SequenceFiles that doesn't incur the space and compute overhead of processing/storing these dummy keys.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.