Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/03/07 01:26:29 UTC

[jira] Commented: (HADOOP-66) dfs client writes all data for a chunk to /tmp

    [ http://issues.apache.org/jira/browse/HADOOP-66?page=comments#action_12369110 ] 

Doug Cutting commented on HADOOP-66:
------------------------------------

It looks to me like the temp file is in fact only used when the connection to the datanode fails.  Normally the block is streamed to the datanode as it is written.  If the connection to the datanode fails, an application exception is not thrown; instead, the temp file is used to recover, by reconnecting to a datanode and trying to write the block again.
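To make the behavior described above concrete, here is a rough sketch in Java. It is not the actual dfs client code: the class name BackedUpBlockWriter, the DatanodeConnector interface, and the 4096-byte copy buffer are all made up for illustration, and a plain OutputStream stands in for the datanode connection. It only shows the general shape: every write goes to both the datanode and a local temp file, a failed connection is swallowed rather than thrown, and the temp file is replayed to a fresh connection when the block is closed.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical sketch of "stream the block, keep a temp file for recovery". */
public class BackedUpBlockWriter {

    /** Hypothetical stand-in for opening a connection to some datanode. */
    public interface DatanodeConnector {
        OutputStream connect() throws IOException;
    }

    private final DatanodeConnector connector;
    private final File backupFile;
    private final OutputStream backup;
    private OutputStream remote; // null once the connection has failed

    public BackedUpBlockWriter(DatanodeConnector connector) throws IOException {
        this.connector = connector;
        this.backupFile = File.createTempFile("dfs-block", ".tmp");
        this.backup = new FileOutputStream(backupFile);
        try {
            this.remote = connector.connect();
        } catch (IOException e) {
            this.remote = null; // no exception to the caller; recover in close()
        }
    }

    public void write(byte[] buf, int off, int len) throws IOException {
        backup.write(buf, off, len); // always keep the local copy
        if (remote != null) {
            try {
                remote.write(buf, off, len); // normal case: stream as written
            } catch (IOException e) {
                remote = null; // swallow the failure; the temp file has the data
            }
        }
    }

    public void close() throws IOException {
        backup.close();
        try {
            if (remote != null) {
                try {
                    remote.close();
                    return; // block was streamed successfully, temp file unused
                } catch (IOException e) {
                    remote = null; // fall through to recovery
                }
            }
            // Recovery: reconnect and write the whole block again from the temp file.
            try (InputStream in = new FileInputStream(backupFile);
                 OutputStream out = connector.connect()) {
                byte[] copyBuf = new byte[4096];
                int n;
                while ((n = in.read(copyBuf)) != -1) {
                    out.write(copyBuf, 0, n);
                }
            }
        } finally {
            backupFile.delete();
        }
    }
}
```

In this sketch the temp file is written unconditionally but only read back on failure, which matches the observation that the file exists for recovery rather than as the primary data path. A second failure during recovery does surface as an IOException from close().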

Data is buffered in RAM first, just in chunks much smaller than the block.  I don't think we should buffer the entire block in RAM, as this would, e.g., prohibit applications which write lots of files in parallel.

We could get rid of the temp file and simply throw an application exception when we lose a connection to a datanode while writing.  What is the objection to the temp file?
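For comparison, the alternative without a temp file might look roughly like the following Java sketch. Again, this is not the real client code: ChunkBufferedBlockWriter and its chunkSize parameter are hypothetical names, and a plain OutputStream stands in for the datanode connection. Data is held in a RAM buffer much smaller than a block, flushed whenever the buffer fills, and a lost connection is reported to the application as an IOException instead of being recovered from.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch: small RAM buffer, no temp file, failures are thrown. */
public class ChunkBufferedBlockWriter extends OutputStream {
    private final OutputStream remote; // stand-in for the datanode connection
    private final byte[] chunk;        // RAM buffer, much smaller than a block
    private int used = 0;

    public ChunkBufferedBlockWriter(OutputStream remote, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.remote = remote;
        this.chunk = new byte[chunkSize];
    }

    @Override
    public void write(int b) throws IOException {
        if (used == chunk.length) {
            flushChunk(); // buffer is full: ship it before accepting more
        }
        chunk[used++] = (byte) b;
    }

    private void flushChunk() throws IOException {
        try {
            remote.write(chunk, 0, used);
        } catch (IOException e) {
            // No local copy to fall back on, so the application sees the failure.
            throw new IOException("Lost connection to datanode while writing block", e);
        }
        used = 0;
    }

    @Override
    public void flush() throws IOException {
        flushChunk();
        remote.flush();
    }

    @Override
    public void close() throws IOException {
        flushChunk();
        remote.close();
    }
}
```

This is essentially what java.io.BufferedOutputStream already does. The point of the sketch is the trade-off: memory per open file is bounded by chunkSize rather than by the block size, but once a chunk has been sent there is nothing left on the client to replay if the datanode goes away.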

> dfs client writes all data for a chunk to /tmp
> ----------------------------------------------
>
>          Key: HADOOP-66
>          URL: http://issues.apache.org/jira/browse/HADOOP-66
>      Project: Hadoop
>         Type: Bug
>   Components: dfs
>     Versions: 0.1
>     Reporter: Sameer Paranjpye
>      Fix For: 0.1
>
> The dfs client writes all the data for the current chunk to a file in /tmp; when the chunk is complete, it is shipped out to the Datanodes. This can cause /tmp to fill up fast when a lot of files are being written. A potentially better scheme is to buffer the written data in RAM (application code can set the buffer size) and flush it to the Datanodes when the buffer fills up.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira