Posted to hdfs-dev@hadoop.apache.org by Kyle Sletmoe <ky...@urbanrobotics.net> on 2013/10/28 23:38:44 UTC

libhdfs portability

Now that Hadoop 2.2.0 is Windows compatible, is there going to be work on
creating a portable version of libhdfs for C/C++ interaction with HDFS? I
know I can use the WebHDFS REST API, but the data transfer rates are
abysmally slow compared to the direct interaction via libhdfs.
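For context, the "direct interaction" in question looks roughly like this in C against libhdfs. This is only a sketch: the path and the "default" connection string are illustrative, error handling is abbreviated, and it has to be compiled and run against a real Hadoop installation (libhdfs headers and native library on the path).

```c
/* Minimal libhdfs write sketch. Compile against libhdfs, e.g.:
 *   gcc demo.c -I$HADOOP_HOME/include -L$HADOOP_HOME/lib/native -lhdfs
 */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>   /* O_WRONLY */
#include "hdfs.h"

int main(void) {
    /* "default" picks up fs.defaultFS from the client configuration. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) { fprintf(stderr, "connect failed\n"); return 1; }

    /* O_WRONLY creates/overwrites; 0s mean default buffer size,
     * replication, and block size. */
    hdfsFile out = hdfsOpenFile(fs, "/tmp/demo.txt", O_WRONLY, 0, 0, 0);
    if (!out) { fprintf(stderr, "open failed\n"); hdfsDisconnect(fs); return 1; }

    const char *msg = "hello hdfs\n";
    hdfsWrite(fs, out, msg, (tSize)strlen(msg));

    hdfsCloseFile(fs, out);
    hdfsDisconnect(fs);
    return 0;
}
```

Writes through this path go straight to the DataNodes over the native RPC protocol, which is where the speed difference against WebHDFS comes from.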

Regards,
--
Kyle Sletmoe

Urban Robotics Inc.
Software Engineer

33 NW First Avenue, Suite 200 | Portland, OR 97209
c: (541) 621-7516 | e: kyle.sletmoe@urbanrobotics.net

http://www.urbanrobotics.net

-- 
Information contained herein is subject to the Code of Federal Regulations
Chapter 22 International Traffic in Arms Regulations. This data may not be
resold, diverted, transferred, transshipped, made available to a foreign
national within the United States, or otherwise disposed of in any other
country outside of its intended destination, either in original form or
after being incorporated through an intermediate process into other data
without the prior written approval of the US Department of State. Penalties
for violation include bans on defense and military work, fines and
imprisonment.

Re: libhdfs portability

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
On Mon, Oct 28, 2013 at 4:24 PM, Kyle Sletmoe
<ky...@urbanrobotics.net> wrote:
> I have written a WebHDFSClient, and I do not believe that reusing
> connections is enough to noticeably speed up transfers in my case. In my
> tests it took roughly 14 minutes on average to transfer a 3.6 GB file to
> an HDFS cluster on my local network (I tried the same operation with
> cURL, with similar results). Transferring the exact same file with the
> hdfs dfs -copyFromLocal command took 40 seconds on average. I need to be
> able to reliably transfer files in the 250 GB to 1 TB range, and I really
> need the speed afforded by the "direct" transfer path that libhdfs uses.
> Does libhdfs work with Hadoop 2.2.0 (if I use it on Linux)?

libhdfs is the basis of a lot of software built on top of HDFS, such
as Impala and fuse_dfs, and yes, it works.

Patches that improve portability are welcome.  However, rather than
#ifdefs, I would rather see platform-specific files that implement
whatever functionality is platform-specific.

Another option for you is to use the new NFS v3 gateway included in
Hadoop 2.  I have heard that newer versions of Windows finally include
some kind of NFS support.  (However, older versions, such as Windows
XP, do not.)
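For reference, the Linux side of that NFS route looks roughly like the
following, based on the Hadoop 2 NFS gateway documentation. The hostname
and mount point are placeholders, and the gateway daemons must already be
configured in hdfs-site.xml / core-site.xml before any of this works.

```shell
# Start the gateway daemons on the gateway host.
hadoop-daemon.sh start portmap
hadoop-daemon.sh start nfs3

# Mount the exported HDFS root from a client machine.
mkdir -p /mnt/hdfs
mount -t nfs -o vers=3,proto=tcp,nolock nfsgateway:/ /mnt/hdfs

# HDFS then appears as an ordinary file tree.
ls /mnt/hdfs
```

The appeal is that any program that can write to a local path can then
write to HDFS, with no libhdfs or WebHDFS client code at all.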

best,
Colin



Re: libhdfs portability

Posted by Kyle Sletmoe <ky...@urbanrobotics.net>.
I have written a WebHDFSClient, and I do not believe that reusing
connections is enough to noticeably speed up transfers in my case. In my
tests it took roughly 14 minutes on average to transfer a 3.6 GB file to
an HDFS cluster on my local network (I tried the same operation with
cURL, with similar results). Transferring the exact same file with the
hdfs dfs -copyFromLocal command took 40 seconds on average. I need to be
able to reliably transfer files in the 250 GB to 1 TB range, and I really
need the speed afforded by the "direct" transfer path that libhdfs uses.
Does libhdfs work with Hadoop 2.2.0 (if I use it on Linux)?
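For concreteness, the two transfer paths being compared can be exercised
like this (hypothetical host, user, and file names; the WebHDFS call uses
the two-step create from the WebHDFS REST documentation, where the
NameNode answers with a 307 redirect to a DataNode and -L makes curl
follow it with the upload body):

```shell
# Native RPC path via the hdfs CLI (the same path libhdfs uses underneath).
time hdfs dfs -copyFromLocal big.img /data/big.img

# WebHDFS path over HTTP.
time curl -L -T big.img \
  "http://namenode:50070/webhdfs/v1/data/big.img?op=CREATE&user.name=kyle"
```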





Re: libhdfs portability

Posted by Haohui Mai <hm...@hortonworks.com>.
I believe that the WebHDFS API is your best bet for now. The current
implementation of WebHDFSClient does not reuse HTTP connections, which
accounts for a large part of the performance penalty.

You might want to implement your own version that reuses HTTP connections
to see whether it meets your performance requirements.
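As a rough illustration of what reuse buys: curl, for instance, keeps one
TCP connection alive when several URLs to the same host appear in a single
invocation, whereas a client that reconnects per request pays the TCP and
HTTP setup cost every time (hostname, paths, and user here are
placeholders):

```shell
# Both status calls travel over a single kept-alive connection.
curl "http://namenode:50070/webhdfs/v1/a.txt?op=GETFILESTATUS&user.name=kyle" \
     "http://namenode:50070/webhdfs/v1/b.txt?op=GETFILESTATUS&user.name=kyle"
```

The saving is most visible for many small operations; for a single huge
streaming PUT, connection setup is a smaller share of the total time.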

Thanks,
Haohui



-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.