Posted to common-user@hadoop.apache.org by Da Zheng <zh...@gmail.com> on 2011/01/02 21:01:18 UTC

Hadoop use direct I/O in Linux?

Hello,

Direct I/O can make a huge performance difference, especially when Atom
processors are used, but as far as I know, Hadoop doesn't use Linux direct I/O.
Does anyone know of any unofficial versions that were developed to use direct I/O?

I googled it and found that FUSE provides an option for direct I/O. If I use
FUSE DFS and enable direct I/O, will I get what I want? That is, when I write
data to HDFS, is the data written to the disk directly (with no caching by any
file system)? Or does this direct I/O option only let me bypass the caching in
FUSE, with the data still cached by the underlying FS?

Best,
Da

Re: Hadoop use direct I/O in Linux?

Posted by Greg Roelofs <ro...@yahoo-inc.com>.
Da Zheng wrote:

> I already did "ant compile-c++-libhdfs -Dlibhdfs=1", but it seems nothing is
> compiled as it prints the following:

> check-c++-libhdfs:

> check-c++-makefile-libhdfs:

> create-c++-libhdfs-makefile:

> compile-c++-libhdfs:

> BUILD SUCCESSFUL
> Total time: 2 seconds

You may need to add -Dcompile.native=true in there.

Switching lists.

Greg
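
The build order discussed in this thread can be sketched as a small shell
script. This is only a sketch under assumptions, not a verified fix: the
HADOOP_SRC variable and the helper function names are hypothetical, the ant
targets and flags are the ones quoted in the messages, and whether
-Dcompile.native=true actually produces libhdfs.so on 0.20.2 is not confirmed
here. The script prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: HADOOP_SRC and the helper names below are hypothetical.

# Path where the fuse-dfs build reported looking for the library.
libhdfs_path() {
    printf '%s/build/libhdfs/libhdfs.so\n' "$1"
}

# Print the ant commands in the order suggested in this thread.
# The libhdfs step is skipped when the library file is already present.
fuse_dfs_build_steps() {
    src=$1
    if [ ! -f "$(libhdfs_path "$src")" ]; then
        echo "ant compile-c++-libhdfs -Dlibhdfs=1 -Dcompile.native=true"
    fi
    echo "ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1"
}

# Default to the current directory if HADOOP_SRC is not set.
fuse_dfs_build_steps "${HADOOP_SRC:-.}"
```

Run it from the Hadoop source root, or set HADOOP_SRC to that directory, and
execute the printed commands by hand. If the first command still compiles
nothing, the pre-compiled files under $HADOOP_HOME/c++/ mentioned elsewhere in
the thread may be worth checking.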

Re: Hadoop use direct I/O in Linux?

Posted by Da Zheng <zh...@gmail.com>.
I already did "ant compile-c++-libhdfs -Dlibhdfs=1", but it seems nothing is
compiled as it prints the following:

check-c++-libhdfs:

check-c++-makefile-libhdfs:

create-c++-libhdfs-makefile:

compile-c++-libhdfs:

BUILD SUCCESSFUL
Total time: 2 seconds

I still get the compilation error when I run ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1

Da


On 1/2/11 3:36 PM, Harsh J wrote:
> I think it is an error in the text, as the actual required target is
> "compile-c++-libhdfs" according to my copy.
> 
> Also, some of those required files may be found pre-compiled under the
> $HADOOP_HOME/c++/ folder.
> 
>> I tried ant compile-libhdfs -Dlibhdfs=1:
>> BUILD FAILED
>> Target "compile-libhdfs" does not exist in the project "Hadoop".
> 


Re: Hadoop use direct I/O in Linux?

Posted by Harsh J <qw...@gmail.com>.
I think it is an error in the text, as the actual required target is
"compile-c++-libhdfs" according to my copy.

Also, some of those required files may be found pre-compiled under the
$HADOOP_HOME/c++/ folder.

> I tried ant compile-libhdfs -Dlibhdfs=1:
> BUILD FAILED
> Target "compile-libhdfs" does not exist in the project "Hadoop".

-- 
Harsh J
www.harshj.com

Re: Hadoop use direct I/O in Linux?

Posted by Da Zheng <zh...@gmail.com>.
PS: does FUSE DFS work in version 0.20.2?
I followed the instructions at http://wiki.apache.org/hadoop/MountableHDFS, but
when I run the following command:
ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1
I get this error:
BUILD FAILED
/home/zhengda/hadoop-mod-0.20.2/build.xml:497: The following error occurred
while executing this line:
/home/zhengda/hadoop-mod-0.20.2/src/contrib/build.xml:30: The following error
occurred while executing this line:
/home/zhengda/hadoop-mod-0.20.2/src/contrib/fuse-dfs/build.xml:37: libhdfs.so
does not exist: /home/zhengda/hadoop-mod-0.20.2/build/libhdfs/libhdfs.so. Please
check flags -Dlibhdfs=1 -Dfusedfs=1 are set or first try ant compile-libhdfs
-Dlibhdfs=1

I tried ant compile-libhdfs -Dlibhdfs=1:
BUILD FAILED
Target "compile-libhdfs" does not exist in the project "Hadoop".

Best,
Da


On 1/2/11 3:01 PM, Da Zheng wrote:
> Hello,
> 
> Direct I/O can make a huge performance difference, especially when Atom
> processors are used, but as far as I know, Hadoop doesn't use Linux direct I/O.
> Does anyone know of any unofficial versions that were developed to use direct I/O?
> 
> I googled it and found that FUSE provides an option for direct I/O. If I use
> FUSE DFS and enable direct I/O, will I get what I want? That is, when I write
> data to HDFS, is the data written to the disk directly (with no caching by any
> file system)? Or does this direct I/O option only let me bypass the caching in
> FUSE, with the data still cached by the underlying FS?
> 
> Best,
> Da