Posted to common-issues@hadoop.apache.org by "Binglin Chang (JIRA)" <ji...@apache.org> on 2014/04/01 08:28:19 UTC

[jira] [Commented] (HADOOP-10388) Pure native hadoop client

    [ https://issues.apache.org/jira/browse/HADOOP-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956157#comment-13956157 ] 

Binglin Chang commented on HADOOP-10388:
----------------------------------------

bq. We can even make the XML-reading code optional if you want.
Sure, if it is for compatibility, I guess adding XML support is fine. But to keep strict compatibility we may need to support all javax xml / hadoop config features, and I'm afraid libexpat/libxml2 may not support all of those. A lot of effort may be spent on this, so I think it is better to make it optional and do it later.

bq. Thread pools and async I/O, I'm afraid, are something we can't live without.
I also prefer to use async I/O and threads for performance reasons. The code I published on github already has a working HDFS client with read/write, and HDFSOuputstream uses an additional thread.
What I was saying is that the use of more threads should be limited. In the java client, simply reading or writing an HDFS file uses too many threads (rpc socket read/write, data transfer socket read/write, other misc executors, lease renewer, etc.). Since we use async I/O, the number of threads should be greatly reduced.
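
To illustrate the point about thread count, here is a minimal sketch of one thread servicing several descriptors through a single POSIX poll() call, instead of one blocking reader thread per socket. It is not taken from the code on github; the function name read_all_multiplexed is hypothetical, and plain file descriptors stand in for the rpc and data transfer sockets.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of any Hadoop code base: drain every
// descriptor in `fds` from the calling thread and return the bytes read from
// each one, in the same order. Returns once all descriptors have reached EOF.
std::vector<std::string> read_all_multiplexed(const std::vector<int>& fds) {
    assert(!fds.empty());

    std::vector<pollfd> watched;
    for (int fd : fds) {
        pollfd p{};
        p.fd = fd;
        p.events = POLLIN;
        watched.push_back(p);
    }

    std::vector<std::string> received(fds.size());
    std::size_t open_count = fds.size();

    while (open_count > 0) {
        // A single blocking call waits on all descriptors at once.
        int ready = poll(watched.data(), watched.size(), -1);
        if (ready < 0) {
            break;  // sketch only: a real client would inspect errno (e.g. EINTR)
        }
        for (std::size_t i = 0; i < watched.size(); ++i) {
            if (watched[i].fd < 0) {
                continue;  // already finished
            }
            if (watched[i].revents & (POLLIN | POLLHUP | POLLERR | POLLNVAL)) {
                char buf[4096];
                ssize_t n = read(watched[i].fd, buf, sizeof buf);
                if (n > 0) {
                    received[i].append(buf, static_cast<std::size_t>(n));
                } else {
                    // EOF or error: stop watching this descriptor.
                    // poll() ignores entries whose fd is negative.
                    watched[i].fd = -1;
                    --open_count;
                }
            }
        }
    }
    return received;
}
```

Whatever the number of descriptors, the loop runs on the caller's thread only. A production client would more likely use non-blocking sockets with epoll or a similar mechanism, but the thread accounting is the same.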


> Pure native hadoop client
> -------------------------
>
>                 Key: HADOOP-10388
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10388
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: HADOOP-10388
>            Reporter: Binglin Chang
>            Assignee: Colin Patrick McCabe
>
> A pure native hadoop client has the following use cases/advantages:
> 1.  writing Yarn applications using c++
> 2.  direct access to HDFS, without extra proxy overhead, compared to the web/nfs interfaces.
> 3.  wrapping the native library to support more languages, e.g. python
> 4.  lightweight, with a small footprint compared to the several hundred MB of JDK and hadoop libraries with their various dependencies.



--
This message was sent by Atlassian JIRA
(v6.2#6252)