Posted to common-user@hadoop.apache.org by Patrick Donnelly <ba...@batbytes.com> on 2010/06/11 19:05:42 UTC
Caching in HDFS C API Client
Hi List,
I need to explain a higher-than-expected throughput (bandwidth) for an
HDFS C API client. Specifically, the client is getting bandwidth
higher than its link rate :). The client first writes a 512 MB
file and then reads the entire file back. The read is what's
getting the higher-than-link-rate bandwidth. I assume this is a
consequence of caching? Is this done by HDFS or by Linux?
Thanks for any help,
--
- Patrick Donnelly
Re: Caching in HDFS C API Client
Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Nice, thanks Brian!
On Jun 14, 2010, at 7:39 AM, Brian Bockelman wrote:
> Hey Owen, all,
>
> I find this one handy if you have root access:
>
> http://linux-mm.org/Drop_Caches
>
> echo 3 > /proc/sys/vm/drop_caches
>
> Drops the pagecache, dentries, and inodes. Without this, you can
> still get caching effects doing the normal "read and write large
> files" if the Linux pagecache outsmarts you (and I don't know about
> you, but it often outsmarts me...).
>
> Brian
>
> On Jun 14, 2010, at 9:35 AM, Owen O'Malley wrote:
>
>> Indeed. On the terasort benchmark, I had to run intermediate jobs
>> that
>> were larger than RAM on the cluster to ensure that the data was not
>> coming from the file cache.
>>
>> -- Owen
>
Re: Caching in HDFS C API Client
Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Owen, all,
I find this one handy if you have root access:
http://linux-mm.org/Drop_Caches
echo 3 > /proc/sys/vm/drop_caches
Drops the pagecache, dentries, and inodes. Without this, you can still get caching effects doing the normal "read and write large files" if the Linux pagecache outsmarts you (and I don't know about you, but it often outsmarts me...).
Brian
On Jun 14, 2010, at 9:35 AM, Owen O'Malley wrote:
> Indeed. On the terasort benchmark, I had to run intermediate jobs that
> were larger than RAM on the cluster to ensure that the data was not
> coming from the file cache.
>
> -- Owen
Re: Caching in HDFS C API Client
Posted by Owen O'Malley <om...@apache.org>.
Indeed. On the terasort benchmark, I had to run intermediate jobs that
were larger than RAM on the cluster to ensure that the data was not
coming from the file cache.
-- Owen
Re: Caching in HDFS C API Client
Posted by Arun C Murthy <ac...@yahoo-inc.com>.
I'd bet on the Linux file cache. Assuming you wrote the file with the
default replication factor of 3, there is one replica on the local
filesystem, which is what you are reading...
Try writing multiple GBs of data and randomly reading large files to
blow out your file cache?
Arun
On Jun 11, 2010, at 10:05 AM, Patrick Donnelly wrote:
> Hi List,
>
> I need to explain a higher-than-expected throughput (bandwidth) for an
> HDFS C API client. Specifically, the client is getting bandwidth
> higher than its link rate :). The client first writes a 512 MB
> file and then reads the entire file back. The read is what's
> getting the higher-than-link-rate bandwidth. I assume this is a
> consequence of caching? Is this done by HDFS or by Linux?
>
> Thanks for any help,
>
> --
> - Patrick Donnelly