Posted to hdfs-dev@hadoop.apache.org by Kyle Sletmoe <ky...@urbanrobotics.net> on 2013/10/28 23:38:44 UTC
libhdfs portability
Now that Hadoop 2.2.0 is Windows-compatible, is there going to be work on
creating a portable version of libhdfs for C/C++ interaction with HDFS? I
know I can use the WebHDFS REST API, but its data transfer rates are
abysmally slow compared to direct interaction via libhdfs.
Regards,
--
Kyle Sletmoe
Urban Robotics Inc.
Software Engineer
33 NW First Avenue, Suite 200 | Portland, OR 97209
c: (541) 621-7516 | e: kyle.sletmoe@urbanrobotics.net
http://www.urbanrobotics.net
--
Information contained herein is subject to the Code of Federal Regulations
Chapter 22, International Traffic in Arms Regulations. This data may not be
resold, diverted, transferred, transshipped, made available to a foreign
national within the United States, or otherwise disposed of in any other
country outside of its intended destination, either in original form or
after being incorporated through an intermediate process into other data,
without the prior written approval of the US Department of State. Penalties
for violation include bans on defense and military work, fines and
imprisonment.
Re: libhdfs portability
Posted by Colin McCabe <cm...@alumni.cmu.edu>.
On Mon, Oct 28, 2013 at 4:24 PM, Kyle Sletmoe
<ky...@urbanrobotics.net> wrote:
> I have written a WebHDFSClient, and I do not believe that reusing
> connections is enough to noticeably speed up transfers in my case. In
> my tests it took roughly 14 minutes on average to transfer a 3.6 GB
> file to HDFS on my local network (I tried the same operation using cURL,
> with similar results). Transferring the exact same file with the
> hdfs dfs -copyFromLocal command took 40 seconds on average. I need to
> be able to reliably transfer files in the 250 GB to 1 TB range, and I
> really need the speed afforded by the "direct" transfer method that
> libhdfs uses. Does libhdfs work with Hadoop 2.2.0 (if I use it on
> Linux)?
libhdfs is the basis of a lot of software built on top of HDFS, such
as Impala and fuse_dfs, and yes, it works.
Patches that improve portability are welcome. However, rather than
#ifdefs, I would prefer to see platform-specific files that implement
whatever functionality is platform-specific.
Another option for you is to use the new NFS v3 gateway included in
Hadoop 2. I have heard that newer versions of Windows finally include
some kind of NFS support. (However, older versions, such as Windows XP,
do not.)
best,
Colin
Re: libhdfs portability
Posted by Kyle Sletmoe <ky...@urbanrobotics.net>.
I have written a WebHDFSClient, and I do not believe that reusing
connections is enough to noticeably speed up transfers in my case. In
my tests it took roughly 14 minutes on average to transfer a 3.6 GB
file to HDFS on my local network (I tried the same operation using cURL,
with similar results). Transferring the exact same file with the
hdfs dfs -copyFromLocal command took 40 seconds on average. I need to
be able to reliably transfer files in the 250 GB to 1 TB range, and I
really need the speed afforded by the "direct" transfer method that
libhdfs uses. Does libhdfs work with Hadoop 2.2.0 (if I use it on
Linux)?
Re: libhdfs portability
Posted by Haohui Mai <hm...@hortonworks.com>.
I believe that the WebHDFS API is your best bet for now. The current
implementation of WebHDFSClient does not reuse HTTP connections, which
accounts for a large part of the performance penalty.
You might want to implement your own version that reuses HTTP connections
to see whether it meets your performance requirements.
Thanks,
Haohui
--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.