Posted to hdfs-user@hadoop.apache.org by Jason Rutherglen <ja...@gmail.com> on 2011/03/02 01:24:47 UTC

Is DFSInputStream.read(long position,...) designed for multi threaded access?

It's unsynchronized; however, it creates a new BlockReader on each
call. That seems like a problem?
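
A minimal sketch of the access pattern being asked about, runnable without
a cluster. It does not use HDFS: java.nio's FileChannel.read(ByteBuffer, long)
stands in for DFSInputStream.read(long position, ...), since both are
positional reads that leave the stream's own offset untouched. The names
PositionalReadSketch, pread and concurrentReadsMatch are made up for
illustration, and the sketch says nothing about the per-call BlockReader
cost discussed in this thread.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PositionalReadSketch {

    // Reads exactly `length` bytes starting at `position`. The channel's own
    // position is never modified, so concurrent callers do not interfere.
    static byte[] pread(FileChannel channel, long position, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, position + buf.position());
            if (n < 0) {
                throw new EOFException("hit end of file at offset " + (position + buf.position()));
            }
        }
        return buf.array();
    }

    // Writes a temp file made of `threads` chunks, where chunk i is filled with
    // the byte value i, then has each thread read its own chunk through ONE
    // shared channel. Returns true if every thread saw only its own value.
    static boolean concurrentReadsMatch(int threads, int chunk) throws Exception {
        Path file = Files.createTempFile("pread-sketch", ".bin");
        try {
            byte[] data = new byte[threads * chunk];
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) (i / chunk);
            }
            Files.write(file, data);

            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                List<Future<Boolean>> results = new ArrayList<>();
                for (int t = 0; t < threads; t++) {
                    final int id = t;
                    results.add(pool.submit(() -> {
                        byte[] got = pread(ch, (long) id * chunk, chunk);
                        for (byte b : got) {
                            if (b != (byte) id) {
                                return false;
                            }
                        }
                        return true;
                    }));
                }
                boolean ok = true;
                for (Future<Boolean> f : results) {
                    ok &= f.get(); // wait for every reader before closing the channel
                }
                return ok;
            } finally {
                pool.shutdown();
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("all chunks matched: " + concurrentReadsMatch(8, 4096));
    }
}
```

The point of the shape is that no reader calls seek() on shared state, which
is what makes an unsynchronized positional read safe to call from several
threads; whether each call is cheap is a separate question.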

Re: Is DFSInputStream.read(long position,...) designed for multi threaded access?

Posted by Todd Lipcon <to...@cloudera.com>.
On Wed, Mar 2, 2011 at 2:38 PM, Andrew Purtell <ap...@apache.org> wrote:
> I looked at HADOOP-6311 recently when deciding if I wanted to port the full FD-passing bits of HDFS-347 onto CDH3B3. I see Owen changed his -1 to -0, but that's not really comforting. How likely is this to go in? Would this go in to CDH if I posted a patch for HADOOP-6311 and HDFS-347?

Regarding trunk, that's for the general community to decide. I think
the increased interest in HBase across the community might have
changed people's minds - lots of people are now agreeing that 347 is a
great help for HBase. The fd-passing approach is so far the only one
I've seen proposed that addresses security.

Regarding CDH, this probably isn't the best place for discussing that :)

>
> My aim here is a fast local read path that respects security, obviously, something that I won't be supporting myself in production against an increasingly diverging upstream.

Agreed completely.

-Todd

>
> --- On Wed, 3/2/11, Todd Lipcon <to...@cloudera.com> wrote:
>
>> From: Todd Lipcon <to...@cloudera.com>
>> Subject: Re: Is DFSInputStream.read(long position,...) designed for multi threaded access?
>> To: "Jason Rutherglen" <ja...@gmail.com>
>> Cc: hdfs-user@hadoop.apache.org
>> Date: Wednesday, March 2, 2011, 2:10 PM
>> On Wed, Mar 2, 2011 at 8:17 AM, Jason Rutherglen <ja...@gmail.com> wrote:
>> > Todd,
>> >
>> > Thanks for the reply.  I looked at HDFS-941 which seems to remove the
>> > redundant creation of BlockReaders.  That and HDFS-347 will solve some
>> > of the issues, however I think there's also the sendmsg() of the file
>> > descriptor that also needs to be implemented?
>>
>> Yes, see HADOOP-6311. I have an up-to-date patch on this but have not
>> yet gotten to posting it since we need 347 to be redone on trunk as
>> well.
>>
>> -Todd



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Is DFSInputStream.read(long position,...) designed for multi threaded access?

Posted by Jason Rutherglen <ja...@gmail.com>.
I wonder how much of the Android code would be used?  E.g., are we
passing data over domain sockets, or are we only planning on using
sendmsg()?

On Wed, Mar 2, 2011 at 2:38 PM, Andrew Purtell <ap...@apache.org> wrote:
> I looked at HADOOP-6311 recently when deciding if I wanted to port the full FD-passing bits of HDFS-347 onto CDH3B3. I see Owen changed his -1 to -0, but that's not really comforting. How likely is this to go in? Would this go in to CDH if I posted a patch for HADOOP-6311 and HDFS-347?
>
> My aim here is a fast local read path that respects security, obviously, something that I won't be supporting myself in production against an increasingly diverging upstream.
>
>    - Andy
>
>
> --- On Wed, 3/2/11, Todd Lipcon <to...@cloudera.com> wrote:
>
>> From: Todd Lipcon <to...@cloudera.com>
>> Subject: Re: Is DFSInputStream.read(long position,...) designed for multi threaded access?
>> To: "Jason Rutherglen" <ja...@gmail.com>
>> Cc: hdfs-user@hadoop.apache.org
>> Date: Wednesday, March 2, 2011, 2:10 PM
>> On Wed, Mar 2, 2011 at 8:17 AM, Jason Rutherglen <ja...@gmail.com> wrote:
>> > Todd,
>> >
>> > Thanks for the reply.  I looked at HDFS-941 which seems to remove the
>> > redundant creation of BlockReaders.  That and HDFS-347 will solve some
>> > of the issues, however I think there's also the sendmsg() of the file
>> > descriptor that also needs to be implemented?
>>
>> Yes, see HADOOP-6311. I have an up-to-date patch on this but have not
>> yet gotten to posting it since we need 347 to be redone on trunk as
>> well.
>>
>> -Todd

Re: Is DFSInputStream.read(long position,...) designed for multi threaded access?

Posted by Andrew Purtell <ap...@apache.org>.
I looked at HADOOP-6311 recently when deciding if I wanted to port the full FD-passing bits of HDFS-347 onto CDH3B3. I see Owen changed his -1 to -0, but that's not really comforting. How likely is this to go in? Would this go into CDH if I posted a patch for HADOOP-6311 and HDFS-347?

My aim here is a fast local read path that respects security and, obviously, one that I won't be supporting myself in production against an increasingly diverging upstream.

    - Andy


--- On Wed, 3/2/11, Todd Lipcon <to...@cloudera.com> wrote:

> From: Todd Lipcon <to...@cloudera.com>
> Subject: Re: Is DFSInputStream.read(long position,...) designed for multi threaded access?
> To: "Jason Rutherglen" <ja...@gmail.com>
> Cc: hdfs-user@hadoop.apache.org
> Date: Wednesday, March 2, 2011, 2:10 PM
> On Wed, Mar 2, 2011 at 8:17 AM, Jason Rutherglen <ja...@gmail.com> wrote:
> > Todd,
> >
> > Thanks for the reply.  I looked at HDFS-941 which seems to remove the
> > redundant creation of BlockReaders.  That and HDFS-347 will solve some
> > of the issues, however I think there's also the sendmsg() of the file
> > descriptor that also needs to be implemented?
> 
> Yes, see HADOOP-6311. I have an up-to-date patch on this but have not
> yet gotten to posting it since we need 347 to be redone on trunk as
> well.
> 
> -Todd



      

Re: Is DFSInputStream.read(long position,...) designed for multi threaded access?

Posted by Todd Lipcon <to...@cloudera.com>.
On Wed, Mar 2, 2011 at 8:17 AM, Jason Rutherglen
<ja...@gmail.com> wrote:
> Todd,
>
> Thanks for the reply.  I looked at HDFS-941 which seems to remove the
> redundant creation of BlockReaders.  That and HDFS-347 will solve some
> of the issues, however I think there's also the sendmsg() of the file
> descriptor that also needs to be implemented?

Yes, see HADOOP-6311. I have an up-to-date patch on this but have not
yet gotten to posting it since we need 347 to be redone on trunk as
well.

-Todd

> On Tue, Mar 1, 2011 at 5:58 PM, Todd Lipcon <to...@cloudera.com> wrote:
>> Hi Jason,
>>
>> Yes, this method is currently very inefficient.. HDFS-941 will
>> hopefully improve this situation, but currently there's no
>> particularly efficient way to do multithreaded random access.
>>
>> -Todd
>>
>> On Tue, Mar 1, 2011 at 4:24 PM, Jason Rutherglen
>> <ja...@gmail.com> wrote:
>>> It's unsynchronized however it's creating a new BlockReader on each
>>> call, that seems like a problem?
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Is DFSInputStream.read(long position,...) designed for multi threaded access?

Posted by Jason Rutherglen <ja...@gmail.com>.
Todd,

Thanks for the reply.  I looked at HDFS-941, which seems to remove the
redundant creation of BlockReaders.  That and HDFS-347 will solve some
of the issues; however, I think the sendmsg() of the file
descriptor also needs to be implemented?

Jason

On Tue, Mar 1, 2011 at 5:58 PM, Todd Lipcon <to...@cloudera.com> wrote:
> Hi Jason,
>
> Yes, this method is currently very inefficient.. HDFS-941 will
> hopefully improve this situation, but currently there's no
> particularly efficient way to do multithreaded random access.
>
> -Todd
>
> On Tue, Mar 1, 2011 at 4:24 PM, Jason Rutherglen
> <ja...@gmail.com> wrote:
>> It's unsynchronized however it's creating a new BlockReader on each
>> call, that seems like a problem?
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Is DFSInputStream.read(long position,...) designed for multi threaded access?

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Jason,

Yes, this method is currently very inefficient. HDFS-941 will
hopefully improve this situation, but currently there's no
particularly efficient way to do multithreaded random access.

-Todd

On Tue, Mar 1, 2011 at 4:24 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
> It's unsynchronized however it's creating a new BlockReader on each
> call, that seems like a problem?
>



-- 
Todd Lipcon
Software Engineer, Cloudera