Posted to common-dev@hadoop.apache.org by Babak Farhang <fa...@gmail.com> on 2009/01/14 07:17:40 UTC

Re: [jira] Commented: (HADOOP-3856) Asynchronous IO Handling in Hadoop and HDFS

Hi,

Late to this thread.

>> Essentially what I have in mind is similar to MINA, except that read and write of the sockets is done by the event handlers. The lowest layer essentially invokes selectors, invokes event handlers on single or on multiple threads. Each event handler is expected to do some non-blocking work. We would of course have utility handler implementations that do read, write, accept etc, that are useful for simple processing.
>> Are there other such implementations we should look at?

> Unless we have a better option, I would suggest having something custom-written that leaves all the flexibility and control with us.

If you take that path, you might also want to look at some very simple
abstractions I developed for my project's non-blocking HTTP (subset)
server implementation. (Javadoc for the relevant package is
http://skwish.sourceforge.net/doc/com/faunos/util/net/package-summary.html
). I looked at implementing non-blocking HTTP using Doug Lea's
reactor pattern described in
http://gee.cs.oswego.edu/dl/cpjslides/nio.pdf , but I think my
abstractions are conceptually simpler, and thus easier to implement.
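
For readers comparing approaches: the proposal quoted above describes a lowest layer that runs the selector and dispatches readiness events to handlers that do their own non-blocking reads and writes, which is also the basic shape of Doug Lea's reactor. Below is a minimal, single-threaded sketch of that layer in plain java.nio, for illustration only. `Reactor`, `EventHandler`, `acceptor`, `echoHandler` and `roundTrip` are hypothetical names, not Hadoop, MINA or Skwish APIs, and the sketch leaves out the multi-threaded dispatch the proposal also mentions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class ReactorDemo {

    /** Hypothetical callback: runs on the selector thread when its channel is ready; must not block. */
    interface EventHandler {
        void handle(SelectionKey key) throws IOException;
    }

    /** Lowest layer (hypothetical): runs the selector and dispatches each ready key to its handler. */
    static final class Reactor implements Runnable {
        final Selector selector;

        Reactor() throws IOException {
            selector = Selector.open();
        }

        void register(SelectableChannel ch, int ops, EventHandler h) throws IOException {
            ch.configureBlocking(false);
            ch.register(selector, ops, h); // the handler travels as the key's attachment
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    selector.select(); // an interrupt also makes this return
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    try {
                        ((EventHandler) key.attachment()).handle(key);
                    } catch (IOException e) {
                        // A failure on one connection should not stop the loop.
                        key.cancel();
                        try { key.channel().close(); } catch (IOException ignored) { }
                    }
                }
            }
        }
    }

    /** Accept handler: registers each new connection with its own echo handler. */
    static EventHandler acceptor(Reactor reactor) {
        return key -> {
            SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
            if (client != null) {
                reactor.register(client, SelectionKey.OP_READ, echoHandler());
            }
        };
    }

    /** Read/write handler: echoes bytes back, toggling interest between READ and WRITE. */
    static EventHandler echoHandler() {
        ByteBuffer buf = ByteBuffer.allocate(4096); // per-connection state
        return key -> {
            SocketChannel ch = (SocketChannel) key.channel();
            if (key.isReadable()) {
                if (ch.read(buf) < 0) { key.cancel(); ch.close(); return; }
                if (buf.position() > 0) {
                    buf.flip();
                    key.interestOps(SelectionKey.OP_WRITE);
                }
            } else if (key.isWritable()) {
                ch.write(buf); // may write only part; we stay in WRITE mode until drained
                if (!buf.hasRemaining()) {
                    buf.clear();
                    key.interestOps(SelectionKey.OP_READ);
                }
            }
        };
    }

    /** Starts a reactor on an ephemeral port, sends one message, and returns the echo. */
    public static String roundTrip(String msg) throws Exception {
        Reactor reactor = new Reactor();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        reactor.register(server, SelectionKey.OP_ACCEPT, acceptor(reactor));
        Thread loop = new Thread(reactor, "reactor");
        loop.setDaemon(true);
        loop.start();
        try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
            byte[] out = msg.getBytes(StandardCharsets.UTF_8);
            client.write(ByteBuffer.wrap(out)); // blocking client: writes everything
            ByteBuffer in = ByteBuffer.allocate(out.length);
            while (in.hasRemaining() && client.read(in) >= 0) { /* keep reading */ }
            return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
        } finally {
            loop.interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello, reactor"));
    }
}
```

Keeping the handler as the key's attachment lets the dispatch loop stay generic, so read, write and accept logic lives entirely in handlers, as the proposal suggests. A multi-threaded variant would hand ready keys to a pool, but would then need to coordinate interest-set changes with the selector thread (for example via `Selector.wakeup()`).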

Skwish's NIO facilities are not really central to that project. Still,
I thought you might find this aspect of it interesting.

Regards,

-Babak
http://skwish.sourceforge.net/


On Wed, Jul 30, 2008 at 4:53 AM, Ankur (JIRA) <ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/HADOOP-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618302#action_12618302 ]
>
> Ankur commented on HADOOP-3856:
> -------------------------------
>
> You can extend NIOSession to wrap the non-blocking sockets. The wrapper can then be added to a NIOProcessor that can be run via an executor service. However, looking at the trunk code, NIOProcessor appears to be marked final, so it can't be extended to modify or add behaviour. Instead, one can extend AbstractPollingIOProcessor to implement a custom NIOProcessor that takes care of doing non-blocking work on NIOSession.
>
> It doesn't look like there are other such libraries with an Apache-compatible license, so we are left with little choice.
> Apache MINA looks more complex to me than using Java NIO directly, and we might end up writing more code than ideal to work around its various limitations, which would ultimately make our code unnecessarily complex. This would offset the benefit of using an external NIO library.
>
> Unless we have a better option, I would suggest having something custom-written that leaves all the flexibility and control with us.
>
>> Asynchronous IO Handling in Hadoop and HDFS
>> -------------------------------------------
>>
>>                 Key: HADOOP-3856
>>                 URL: https://issues.apache.org/jira/browse/HADOOP-3856
>>             Project: Hadoop Core
>>          Issue Type: New Feature
>>          Components: dfs, io
>>            Reporter: Raghu Angadi
>>
>> I think Hadoop needs utilities or a framework to make it simpler to deal with generic asynchronous IO in Hadoop.
>> Example use case :
>> It's been a long-standing problem that DataNode takes too many threads for data transfers. Each write operation takes up 2 threads at each of the datanodes, and each read operation takes one, irrespective of how much activity is on the sockets. The kinds of load that HDFS serves have been expanding quite fast, and HDFS should handle these varied loads better. If there is a framework for non-blocking IO, read and write pipeline state machines could be implemented with async events on a fixed number of threads.
>> A generic utility is better since it could be used in other places like DFSClient. DFSClient currently creates 2 extra threads for each file it has open for writing.
>> Initially I started writing a primitive "selector", then tried to see if such a facility already existed. [Apache MINA|http://mina.apache.org] seemed to do exactly this. My impression after looking at the interface and examples is that it does not give the kind of control we might prefer or need. The first use case I was thinking of implementing using MINA was to replace "response handlers" in DataNode. The response handlers are simpler since they don't involve disk I/O. I [asked on MINA user list|http://www.nabble.com/Async-events-with-existing-NIO-sockets.-td18640767.html], but it looks like it cannot be done, I think mainly because the sockets are already created.
>> Essentially what I have in mind is similar to MINA, except that read and write of the sockets is done by the event handlers. The lowest layer essentially invokes selectors, invokes event handlers on single or on multiple threads. Each event handler is expected to do some non-blocking work. We would of course have utility handler implementations that do read, write, accept etc, that are useful for simple processing.
>> Sam Pullara mentioned that [xSockets|http://xsocket.sourceforge.net/] is more flexible. It is under GPL.
>> Are there other such implementations we should look at?