Posted to common-dev@hadoop.apache.org by "Sameer Paranjpye (JIRA)" <ji...@apache.org> on 2006/03/07 00:57:29 UTC

[jira] Created: (HADOOP-66) dfs client writes all data for a chunk to /tmp

dfs client writes all data for a chunk to /tmp
----------------------------------------------

         Key: HADOOP-66
         URL: http://issues.apache.org/jira/browse/HADOOP-66
     Project: Hadoop
        Type: Bug
  Components: dfs  
    Versions: 0.1    
    Reporter: Sameer Paranjpye
     Fix For: 0.1


The dfs client writes all the data for the current chunk to a file in /tmp; when the chunk is complete, it is shipped out to the Datanodes. This can cause /tmp to fill up fast when a lot of files are being written. A potentially better scheme is to buffer the written data in RAM (application code can set the buffer size) and flush it to the Datanodes when the buffer fills up.
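The proposed scheme can be sketched roughly as follows; RamBufferedSink and its in-memory downstream are illustrative stand-ins for the example, not actual DFSClient code:

```java
import java.io.ByteArrayOutputStream;

// Minimal sketch of the proposal above: written bytes accumulate in a
// RAM buffer whose size the application chooses, and the buffer is
// flushed downstream (here an in-memory stand-in for a Datanode
// connection) whenever it fills.
public class RamBufferedSink {
    private final ByteArrayOutputStream downstream = new ByteArrayOutputStream();
    private final byte[] buffer;
    private int count;

    public RamBufferedSink(int bufferSize) {
        this.buffer = new byte[bufferSize];
    }

    public void write(byte b) {
        if (count == buffer.length) {
            flush(); // buffer full: ship it out before accepting more data
        }
        buffer[count++] = b;
    }

    public void flush() {
        downstream.write(buffer, 0, count);
        count = 0;
    }

    // Total bytes shipped downstream so far.
    public int shipped() {
        return downstream.size();
    }
}
```

With a 4-byte buffer, writing 10 bytes ships two full buffers automatically and the remainder on the final flush.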


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (HADOOP-66) dfs client writes all data for a chunk to /tmp

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-66?page=all ]

Doug Cutting updated HADOOP-66:
-------------------------------

    Attachment: no-tmp.patch

Here's a minimally-tested patch that removes the use of temp files.




[jira] Commented: (HADOOP-66) dfs client writes all data for a chunk to /tmp

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-66?page=comments#action_12369110 ] 

Doug Cutting commented on HADOOP-66:
------------------------------------

It looks to me like the temp file is in fact only used when the connection to the datanode fails.  Normally the block is streamed to the datanode as it is written.  But if the connection to the datanode fails, an application exception is not thrown; instead, the temp file is used to recover, by reconnecting to a datanode and trying to write the block again.

Data is buffered in RAM first, just in chunks much smaller than the block.  I don't think we should buffer the entire block in RAM, as this would, e.g., prohibit applications that write lots of files in parallel.

We could get rid of the temp file and simply throw an application exception when we lose a connection to a datanode while writing.  What is the objection to the temp file?




[jira] Updated: (HADOOP-66) dfs client writes all data for a chunk to /tmp

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-66?page=all ]

Owen O'Malley updated HADOOP-66:
--------------------------------

    Attachment: tmp-delete.patch

There is a method on File (deleteOnExit()) that deletes the file when the JVM exits. Here is a patch that calls that method on the temporary block files. Apparently there is a bug in the JVM under Windows such that the files are only deleted if they have been closed. For Linux or Solaris users, this should make sure that no block files end up being left behind by the application.
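The mechanism the patch relies on is java.io.File.deleteOnExit(). A minimal illustration; the class and the file-name prefix here are made up for the example:

```java
import java.io.File;
import java.io.IOException;

// Sketch of the approach in tmp-delete.patch: register each temporary
// block file for deletion when the JVM exits, so abandoned block files
// are not left behind in the temp directory. (Per the Windows JVM bug
// noted above, deletion only happens there if the file is closed.)
public class TempBlockFile {
    public static File create() {
        try {
            File f = File.createTempFile("client-", ".block");
            f.deleteOnExit(); // JVM removes the file on normal termination
            return f;
        } catch (IOException e) {
            throw new RuntimeException("could not create temp block file", e);
        }
    }
}
```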




[jira] Commented: (HADOOP-66) dfs client writes all data for a chunk to /tmp

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-66?page=comments#action_12369299 ] 

Sameer Paranjpye commented on HADOOP-66:
----------------------------------------

It doesn't make a lot of sense to buffer the entire block in RAM. On the other hand, an application ought to be able to control the buffering strategy to some extent. Most stream implementations have a setBufferSize() or equivalent method that allows programmers to do this. The default buffer size is reasonably small, so that many files can be opened without worrying too much about buffering.

Besides the issues with filling up /tmp (32MB is a pretty large chunk to be writing there), it's unclear that the scheme adds a lot of value; it may even be detrimental. If a connection to a Datanode fails, why not try to recover by reconnecting, and throw an exception if that fails? If it's just the connection that has failed, the client should be able to reconnect pretty easily. If the Datanode is down for the count, the odds are low (or are they?) that it'll come back by the time the client finishes writing the block, and the write will fail anyway, so why write to a temp file? If it's common for Datanodes to bounce and come back, then Datanode stability is a problem that we should be working on. In that case, the temp file is only a workaround and not a real solution; it might even be masking the problem in many cases.





[jira] Commented: (HADOOP-66) dfs client writes all data for a chunk to /tmp

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-66?page=comments#action_12369293 ] 

Doug Cutting commented on HADOOP-66:
------------------------------------

Eric: I agree.  We should probably change this to use a temp directory under dfs.data.dir instead of /tmp.




[jira] Commented: (HADOOP-66) dfs client writes all data for a chunk to /tmp

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-66?page=comments#action_12369308 ] 

Doug Cutting commented on HADOOP-66:
------------------------------------

It's hard to resume writing a block when a connection fails, since you don't know how much of the previous write succeeded.  Currently the block is streamed over TCP connections.  We could instead write it as a series of length-prefixed buffers, and query the remote datanode on reconnect about which buffers it had received, etc.  But that seems like reinventing a lot of TCP.
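A rough sketch of the length-prefixed framing idea, with a sequence number per buffer so a reconnecting client could in principle ask the datanode which frames arrived. The class and method names here are hypothetical, not DFS code:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Each buffer is written as a frame: [seq:int][len:int][payload bytes].
// On reconnect, the client could query the datanode for the highest
// sequence number received and resume from the next one.
public class FramedWriter {
    private final DataOutputStream out;
    private int nextSeq; // sequence number of the next frame to send

    public FramedWriter(ByteArrayOutputStream sink) {
        this.out = new DataOutputStream(sink);
    }

    // Writes one frame and returns its sequence number.
    public int writeFrame(byte[] payload) {
        try {
            out.writeInt(nextSeq);
            out.writeInt(payload.length);
            out.write(payload);
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen for an in-memory sink
        }
        return nextSeq++;
    }
}
```

Each frame costs 8 bytes of header (two ints) on top of its payload, which is the bookkeeping overhead Doug's "reinventing TCP" objection refers to.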

If the datanode goes down then currently the entire block is in a temp file so that it can instead be written to a different datanode.  Thus if datanodes die during, e.g., a reduce, then the reduce task does not have to restart.  But if reduce tasks are running on the same pool of machines as datanodes, then, when a node fails, some reduce tasks will need to be restarted anyway.  So I agree that this may not be helping us much.  I think throwing an exception when the connection to the datanode fails would be fine.




[jira] Commented: (HADOOP-66) dfs client writes all data for a chunk to /tmp

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-66?page=comments#action_12369159 ] 

eric baldeschwieler commented on HADOOP-66:
-------------------------------------------

So the problem with /tmp is that it can fill up and cause failures.  This is very config / install specific.  We almost never use /tmp, because it gets blown out by something sometime, always when you least expect it.  Maybe we should throw by default and provide some config to do something else, such as a file path for temp files?  This could be in /tmp if you chose, or map reduce could default to its temp directory where it is storing everything else.

Performance is clearly not an issue if this is truly an exceptional case.
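One way the configurable temp-file path Eric suggests might look, sketched with plain java.util.Properties; the property name dfs.client.tmp.dir is hypothetical, not an actual Hadoop config key:

```java
import java.io.File;
import java.util.Properties;

// Resolve the client temp directory from configuration, falling back
// to the JVM's default temp directory (usually /tmp on Unix) when the
// hypothetical dfs.client.tmp.dir property is unset.
public class TempDirConfig {
    public static File tempDir(Properties conf) {
        return new File(conf.getProperty("dfs.client.tmp.dir",
                System.getProperty("java.io.tmpdir")));
    }
}
```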




[jira] Commented: (HADOOP-66) dfs client writes all data for a chunk to /tmp

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-66?page=comments#action_12370407 ] 

Doug Cutting commented on HADOOP-66:
------------------------------------

Should we worry about the space this consumes?  For each block a client writes there will be memory allocated containing the path name to the temporary file that cannot be gc'd.  If that's 100 bytes, then 1M blocks would generate 100MB, which would be a big leak.  But writing 1M blocks means writing 32TB from a single JVM, which would take around a month (at current dfs speeds).  If we increase the block size (as has been discussed) then the rate is slowed proportionally.  So I guess we don't worry about this "leak"?
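A quick check of the arithmetic above; the 100-byte path overhead and 32MB block size are the assumed figures from the comment, not measured values:

```java
// Back-of-the-envelope estimate of the deleteOnExit "leak": retained
// path bytes per block times one million blocks, and the total data
// volume that many blocks represent.
public class LeakEstimate {
    static final long PATH_BYTES = 100L;               // assumed bytes retained per temp-file path
    static final long BLOCKS = 1_000_000L;             // one million blocks written
    static final long BLOCK_SIZE = 32L * 1024 * 1024;  // 32MB block

    public static long leakedBytes() {
        return PATH_BYTES * BLOCKS; // 100MB of un-collectable path strings
    }

    public static long dataWritten() {
        return BLOCK_SIZE * BLOCKS; // ~32TB written from a single JVM
    }
}
```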




[jira] Resolved: (HADOOP-66) dfs client writes all data for a chunk to /tmp

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-66?page=all ]
     
Doug Cutting resolved HADOOP-66:
--------------------------------

    Resolution: Fixed
     Assign To: Doug Cutting

I just committed a fix for this and some problems that it was hiding.

