Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2007/12/04 19:57:43 UTC

[jira] Created: (HADOOP-2346) DataNode should have timeout on socket writes.

DataNode should have timeout on socket writes.
----------------------------------------------

                 Key: HADOOP-2346
                 URL: https://issues.apache.org/jira/browse/HADOOP-2346
             Project: Hadoop
          Issue Type: Bug
            Reporter: Raghu Angadi
             Fix For: 0.16.0



If a client opens a file and stops reading in the middle, the DataNode thread writing the data could be stuck forever. For DataNode sockets we set a read timeout but not a write timeout. I think we should add a write(data, timeout) method in IOUtils that assumes the underlying FileChannel is non-blocking.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-2346) DataNode should have timeout on socket writes.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548366 ] 

Doug Cutting commented on HADOOP-2346:
--------------------------------------

Write timeouts are only possible by using async i/o.  So would you have each thread create its own selector and loop on it?  Is that wise?  Otherwise, implementing this would require converting the entire datanode to use async i/o, which is probably a good idea, but not a short-term one.
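
To make the "each thread creates its own selector and loops on it" option concrete, here is a minimal sketch in Java. It is not the code from any patch on this issue: the class name TimedWrite, the method signature, and the choice of one overall deadline (rather than a per-wait timeout) are assumptions for illustration. It takes a SocketChannel because only selectable channels can be put in non-blocking mode.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

/** Hypothetical helper for illustration; not the patch attached to this issue. */
public class TimedWrite {

  /**
   * Writes all remaining bytes of buf to a non-blocking channel, waiting at
   * most timeoutMs in total for the socket to become writable.
   */
  public static void write(SocketChannel ch, ByteBuffer buf, long timeoutMs)
      throws IOException {
    if (ch.isBlocking()) {
      throw new IllegalArgumentException("channel must be non-blocking");
    }
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    Selector selector = null;          // opened lazily, only if we must wait
    try {
      while (buf.hasRemaining()) {
        if (ch.write(buf) > 0) {
          continue;                    // made progress without waiting
        }
        long leftMs = (deadline - System.nanoTime()) / 1_000_000L;
        if (leftMs <= 0) {
          throw new SocketTimeoutException(timeoutMs
              + " ms timeout waiting to write to " + ch.getRemoteAddress()
              + "; " + buf.remaining() + " bytes left");
        }
        if (selector == null) {
          selector = Selector.open();
          ch.register(selector, SelectionKey.OP_WRITE);
        }
        selector.select(leftMs);       // leftMs > 0, so this never blocks forever
        selector.selectedKeys().clear();
      }
    } finally {
      if (selector != null) {
        selector.close();              // also cancels the channel's registration
      }
    }
  }
}
```

A caller would put the channel in non-blocking mode once with ch.configureBlocking(false) and then call TimedWrite.write(ch, buf, timeoutMs) wherever it currently does a blocking write. The selector is only opened when a write makes no progress, so a client that keeps up never pays for it.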

> DataNode should have timeout on socket writes.
> ----------------------------------------------
>
>                 Key: HADOOP-2346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2346
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.15.1
>            Reporter: Raghu Angadi
>
> If a client opens a file and stops reading in the middle, the DataNode thread writing the data could be stuck forever. For DataNode sockets we set a read timeout but not a write timeout. I think we should add a write(data, timeout) method in IOUtils that assumes the underlying FileChannel is non-blocking.



[jira] Updated: (HADOOP-2346) DataNode should have timeout on socket writes.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-2346:
---------------------------------

    Attachment: HADOOP-2346.patch

> How well does it work?

All the unit tests pass, and I tested with (artificially) slow clients. Performance-wise, I don't expect any change. The JRE has to do something like this anyway. select() is invoked only when we need to wait. We will surely run benchmarks before this goes in, maybe after the 0.16 branch is cut.

> SocketInputStream and SocketOutputStream seem like fine names, but should they be nested classes in IOUtils, or perhaps independent classes in the 'net' package?

Yes, these can be independent classes in the io or net package. Currently there is an ipc.SocketOutputStream (not currently used in Hadoop), which is just a special case of the SocketOutputStream here. ipc.SocketOutputStream will be removed.

> Also, we might make the error messages in the exceptions a bit more informative, e.g., including the address the socket is connected to, the timeout, etc.

We should do this.

When I prepare a more complete patch for 0.17, I think we should replace all socket input/output streams in the DataNode with these.

Also, a WRITE_TIMEOUT of 1 minute might be too short.

The attached patch does not include the above changes; it only has some minor changes from the previous patch.



[jira] Updated: (HADOOP-2346) DataNode should have timeout on socket writes.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-2346:
---------------------------------

    Attachment: HADOOP-2346.patch


This patch implements a write timeout on DataNodes for block reads. Currently only client reads have a write timeout. Once the fix looks good, we can add write timeouts in other places (e.g., while writing to the mirror).

This adds two classes, SocketInputStream and SocketOutputStream, in IOUtils. Please suggest better names.




[jira] Updated: (HADOOP-2346) DataNode should have timeout on socket writes.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-2346:
---------------------------------

          Component/s: dfs
        Fix Version/s:     (was: 0.16.0)
          Description: 
If a client opens a file and stops reading in the middle, the DataNode thread writing the data could be stuck forever. For DataNode sockets we set a read timeout but not a write timeout. I think we should add a write(data, timeout) method in IOUtils that assumes the underlying FileChannel is non-blocking.


  was:

If a client opens a file and stops reading in the middle, the DataNode thread writing the data could be stuck forever. For DataNode sockets we set a read timeout but not a write timeout. I think we should add a write(data, timeout) method in IOUtils that assumes the underlying FileChannel is non-blocking.


    Affects Version/s: 0.15.1



[jira] Commented: (HADOOP-2346) DataNode should have timeout on socket writes.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558709#action_12558709 ] 

Doug Cutting commented on HADOOP-2346:
--------------------------------------

This looks nice!  How well does it work?

SocketInputStream and SocketOutputStream seem like fine names, but should they be nested classes in IOUtils, or perhaps independent classes in the 'net' package?

Also, we might make the error messages in the exceptions a bit more informative, e.g., including the address the socket is connected to, the timeout, etc.
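
As a rough illustration of the kind of message meant here, the sketch below builds a timeout message that carries the remote address, the timeout, and the number of bytes still unwritten. The class and method names (TimeoutMessages, forWrite) and the exact wording are hypothetical, not taken from the patch.

```java
import java.net.SocketAddress;

/** Hypothetical helper; names and message wording are illustrative only. */
public class TimeoutMessages {

  /** Describes a write timeout: how long we waited, on what, and how much is left. */
  public static String forWrite(SocketAddress remote, long timeoutMs, int bytesLeft) {
    return timeoutMs + " ms timeout while waiting for the channel to be ready for write"
        + "; remote=" + remote
        + "; " + bytesLeft + " bytes left";
  }
}
```

The resulting string would be passed to the exception thrown on timeout, for example new SocketTimeoutException(TimeoutMessages.forWrite(addr, timeoutMs, buf.remaining())), so that a stuck transfer can be traced to a specific client from the log alone.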



[jira] Commented: (HADOOP-2346) DataNode should have timeout on socket writes.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548372 ] 

Raghu Angadi commented on HADOOP-2346:
--------------------------------------

A new FilterInputStream would poll only when a read or write fails with EAGAIN (or its Java equivalent), so it does not slow down fast clients. I don't think a poll() costs much, especially since we poll only when required. If there are a lot of clients reading from a DataNode, the bottleneck is almost certainly going to be the disk.

We don't strictly need this for 0.16.
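
For the read side, the idea can be sketched as a stream wrapper. In Java NIO the closest equivalent of EAGAIN is a non-blocking read() or write() returning 0, and that is the only case in which the sketch polls. This is an assumption-laden illustration, not the patch: the class name PollingInputStream is made up, it extends InputStream directly rather than FilterInputStream, the timeout applies to each wait separately, and a select() that returns no ready keys is treated as a timeout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

/** Hypothetical stream wrapper; the names here are illustrative only. */
public class PollingInputStream extends InputStream {
  private final SocketChannel ch;   // must already be non-blocking
  private final long timeoutMs;     // must be > 0; applies to each wait
  private Selector selector;        // opened on the first wait, then reused

  public PollingInputStream(SocketChannel ch, long timeoutMs) {
    if (ch.isBlocking() || timeoutMs <= 0) {
      throw new IllegalArgumentException(
          "need a non-blocking channel and a timeout > 0");
    }
    this.ch = ch;
    this.timeoutMs = timeoutMs;
  }

  @Override
  public int read() throws IOException {
    byte[] one = new byte[1];
    return read(one, 0, 1) == -1 ? -1 : (one[0] & 0xff);
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    if (len == 0) {
      return 0;
    }
    ByteBuffer buf = ByteBuffer.wrap(b, off, len);
    while (true) {
      int n = ch.read(buf);         // 0 means no data is ready right now
      if (n != 0) {
        return n;                   // bytes read, or -1 at end of stream
      }
      if (selector == null) {
        selector = Selector.open();
        ch.register(selector, SelectionKey.OP_READ);
      }
      int ready = selector.select(timeoutMs);
      selector.selectedKeys().clear();
      if (ready == 0) {             // nothing became readable in time
        throw new SocketTimeoutException(timeoutMs
            + " ms timeout waiting to read from " + ch.getRemoteAddress());
      }
    }
  }

  @Override
  public void close() throws IOException {
    try {
      if (selector != null) {
        selector.close();
      }
    } finally {
      ch.close();
    }
  }
}
```

When data is already available, read() returns without touching the selector at all, which is why a fast peer should see no slowdown. A write-side wrapper would follow the same pattern with OP_WRITE.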



[jira] Assigned: (HADOOP-2346) DataNode should have timeout on socket writes.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi reassigned HADOOP-2346:
------------------------------------

    Assignee: Raghu Angadi
