Posted to hdfs-dev@hadoop.apache.org by Aastha Mehta <aa...@gmail.com> on 2011/09/07 07:45:43 UTC

questions regarding fuse_dfs_read

Hello,

I am using FUSE-DFS with HDFS for a project. I have to modify the read and
write functions of fuse_dfs. I have a few questions about the
fuse_dfs_read code. There is an rdbuffer_size variable associated with the
dfs_context, which is initialized to 10 MB by default. What is this
rdbuffer_size, and what is it used for?

Secondly, in the fuse_dfs_read function, there are two places where
hdfsPread() is called in a loop. In the first, the code checks whether the
requested read size is greater than rdbuffer_size, and executes hdfsPread
only if it is. In this case, the data is read directly into the buffer
passed in by the caller.

In the second case, hdfsPread is executed if a valid buffer is
associated with the dfs file handle fh and the size and offset of the read
request lie within the range of fh->buf. In this case, the data is read
into fh->buf.

Could someone explain what is happening here?

Thanks,
Aastha.

-- 
Aastha Mehta
B.E. (Hons.) Computer Science
BITS Pilani
E-mail: aasthakm@gmail.com

Re: questions regarding fuse_dfs_read

Posted by Aastha Mehta <aa...@gmail.com>.
Thanks Brian. That helped.

Regards,
Aastha.

On 7 September 2011 17:45, Brian Bockelman <bb...@cse.unl.edu> wrote:

> Hi Aastha,
>
> A read-ahead buffer is a common technique that trades higher bandwidth usage
> for lower latency on a number of common read patterns.  Your OS does
> something similar (though with a much more advanced technique).  By reading
> ahead, HDFS is betting that your reads follow a pattern.  I think the 10 MB
> default is a touch excessive (it made more sense in previous releases).  I
> use 32 KB.
>
> The buffer is not used if you have very large reads, as it doesn't provide
> any benefit.
>
> Brian

Re: questions regarding fuse_dfs_read

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hi Aastha,

A read-ahead buffer is a common technique that trades higher bandwidth usage for lower latency on a number of common read patterns.  Your OS does something similar (though with a much more advanced technique).  By reading ahead, HDFS is betting that your reads follow a pattern.  I think the 10 MB default is a touch excessive (it made more sense in previous releases).  I use 32 KB.

The buffer is not used if you have very large reads, as it doesn't provide any benefit.

Brian
