
Posted to common-dev@hadoop.apache.org by "eric baldeschwieler (JIRA)" <ji...@apache.org> on 2006/03/25 23:39:19 UTC

[jira] Commented: (HADOOP-106) Data blocks should be record-oriented.

    [ http://issues.apache.org/jira/browse/HADOOP-106?page=comments#action_12371878 ] 

eric baldeschwieler commented on HADOOP-106:
--------------------------------------------

My intuition is that it makes more sense to do this the other way around and align records to blocks.  This keeps the FS implementation trivial: the record layer just pads near the end of a block.  This way you keep a good separation of APIs too.  It should be fairly straightforward to change the record model to do that.  The only issues are with huge records, i.e. ones that do not fit in a block.  You have a couple of options there.  The simplest is to disallow them...
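
As a rough illustration of the padding idea, here is a minimal sketch in Python.  It is not Hadoop code.  The 4-byte length prefix, the tiny BLOCK_SIZE, the use of a zero length to mark padding, and the names append_record and read_block are all assumptions made for the example.  Oversized records are simply rejected, which is the "disallow them" option.

```python
import struct

BLOCK_SIZE = 64                  # hypothetical; tiny so the layout is easy to follow
HEADER = struct.Struct(">I")     # assumed framing: 4-byte big-endian length prefix


def append_record(out: bytearray, payload: bytes, block_size: int = BLOCK_SIZE) -> None:
    """Append one record, padding first if it would straddle a block boundary."""
    if not payload:
        # A zero length marks padding in this sketch, so empty records are not allowed.
        raise ValueError("empty records are not supported in this sketch")
    need = HEADER.size + len(payload)
    if need > block_size:
        # Simplest policy for huge records: disallow them.
        raise ValueError("record does not fit in one block")
    room = block_size - (len(out) % block_size)
    if need > room:
        out.extend(b"\x00" * room)   # pad to the end of the current block
    out.extend(HEADER.pack(len(payload)))
    out.extend(payload)


def read_block(data: bytes, block_index: int, block_size: int = BLOCK_SIZE) -> list:
    """Return the records stored in one block, with no scanning for a sync point."""
    pos = block_index * block_size
    end = min(pos + block_size, len(data))
    records = []
    while end - pos >= HEADER.size:
        (length,) = HEADER.unpack_from(data, pos)
        if length == 0:              # reached padding; the rest of the block is empty
            break
        pos += HEADER.size
        records.append(bytes(data[pos:pos + length]))
        pos += length
    return records
```

With 20-byte payloads each record takes 24 bytes, so two fit in a 64-byte block and the third is pushed to the start of the next block after 16 bytes of padding.  The file system underneath only ever sees a byte stream, so it needs no knowledge of records.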

> Data blocks should be record-oriented.
> --------------------------------------
>
>          Key: HADOOP-106
>          URL: http://issues.apache.org/jira/browse/HADOOP-106
>      Project: Hadoop
>         Type: Wish
>   Components: dfs
>     Versions: 0.2
>     Reporter: Andrzej Bialecki 

>
> If data blocks started and ended on data record boundaries, rather than at arbitrary places within a file, it would give some important advantages:
> * it would be possible to avoid "fishing" for the beginning of the first record in a split (see SequenceFile.Reader.sync()).
> * it would make recovering from DFS errors easier and much more likely to succeed - in most cases missing blocks could just be skipped and the remaining parts combined.
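
The recovery advantage in the second bullet can be sketched as follows, in Python.  This is an illustration under assumptions, not DFS code: each block is assumed to hold whole length-prefixed records followed by zero padding, a lost block is represented as None, and the names pack_block, records_in_block and recover are hypothetical.

```python
import struct

HEADER = struct.Struct(">I")     # assumed framing: 4-byte big-endian length prefix


def pack_block(records, block_size):
    """Build one block that holds whole (non-empty) records, zero-padded to block_size."""
    body = b"".join(HEADER.pack(len(r)) + r for r in records)
    if len(body) > block_size:
        raise ValueError("records do not fit in one block")
    return body.ljust(block_size, b"\x00")


def records_in_block(block):
    """Yield the records of a block; a zero length marks the start of padding."""
    pos = 0
    while len(block) - pos >= HEADER.size:
        (length,) = HEADER.unpack_from(block, pos)
        if length == 0:
            break
        pos += HEADER.size
        yield block[pos:pos + length]
        pos += length


def recover(blocks):
    """Collect records from the surviving blocks, skipping lost ones (None)."""
    out = []
    for block in blocks:
        if block is None:        # missing block: nothing to parse, nothing to resync
            continue
        out.extend(records_in_block(block))
    return out
```

Because no record crosses a block boundary, every surviving block can be parsed from its first byte, and losing one block costs only the records stored in it.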

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira