Posted to dev@drill.apache.org by David Alves <da...@gmail.com> on 2013/04/20 23:39:14 UTC

quick question about the new SE iface

Hi

	I'm porting the region level HBase SE to the new SE iface and I have a couple of questions.
	1- about the method: public ListMultimap<ReadEntry, DrillbitEndpoint> getReadLocations(Collection<ReadEntry> entries)
	
	when does it happen that a read entry gets assigned to more than one drillbit?
	in terms of hbase I can see the case where multiple read entries get assigned to the same drillbit (co-located regions), but I can't envision a case where the same read entry (usually corresponding to a shard or partition) gets assigned to multiple drillbits. when can that happen?
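[editor's sketch, for illustration only — these names are stand-ins, not Drill's actual classes, and plain JDK collections replace Guava's ListMultimap] The shape of that return value, with one read entry mapping to several candidate endpoints, could look like:

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for ListMultimap<ReadEntry, DrillbitEndpoint>:
// each read entry maps to a list of candidate endpoints, so one entry
// can legitimately carry more than one drillbit (e.g. replica holders).
public class ReadLocationsSketch {

    static Map<String, List<String>> getReadLocations(Collection<String> entries) {
        Map<String, List<String>> locations = new LinkedHashMap<>();
        for (String entry : entries) {
            // Hypothetical: a storage engine with replicated shards would
            // list every node able to serve this entry. Names/ports invented.
            locations.put(entry, List.of("drillbit-a:31010", "drillbit-b:31010"));
        }
        return locations;
    }

    public static void main(String[] args) {
        Map<String, List<String>> m = getReadLocations(List.of("block-0001"));
        System.out.println(m.get("block-0001"));
    }
}
```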

	2- with regard to off-heap storage and underlying SE co-location
	
	this is not really a doubt, just checking that my reasoning is correct beforehand.

	for co-located underlying SEs and Drillbits we should use off-heap, shared memory for IPC when possible, correct?
	Specifically I'm investigating the possibility of having HBase store region scan data directly off heap and making the results from hbase contain a set of references to aligned shared memory locations.
	I'm not sure I'll be implementing this immediately but I'd like to design accounting for it if that is the idea.
	Also this means that SE's must work in two modes: co-located with shared memory and remote with sockets. We'd then have the 
	Jacques: I'm sure you've put some thought to the underlying mechanics on how to accomplish this, could you share some quick ideas/references?

Best
David
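[editor's sketch of the two-mode idea in question 2 — every name below is hypothetical, not Drill's API: pick shared memory when co-located, fall back to sockets when remote]

```java
// All names here are invented for illustration: a sketch of an SE that
// serves data over shared memory when co-located with the drillbit and
// over a socket otherwise (the remote/fallback path).
public class TransportSelection {

    interface RecordTransport {
        String describe();
    }

    static final class SharedMemoryTransport implements RecordTransport {
        public String describe() { return "shared-memory"; }
    }

    static final class SocketTransport implements RecordTransport {
        public String describe() { return "socket"; }
    }

    // Local fast path when the SE runs on the same host, remote fallback otherwise.
    static RecordTransport forEndpoint(boolean coLocated) {
        return coLocated ? new SharedMemoryTransport() : new SocketTransport();
    }

    public static void main(String[] args) {
        System.out.println(forEndpoint(true).describe());
        System.out.println(forEndpoint(false).describe());
    }
}
```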


Re: quick question about the new SE iface

Posted by David Alves <da...@gmail.com>.
anecdotal evidence[1] seems to show that it is well worthwhile (6x improvement over *nix sockets).
i'll be implementing both approaches (sockets and mmap) since that is required for remote operation anyway, so we should have some concrete numbers.

best
-david

[1] http://psy-lob-saw.blogspot.com/2013/04/lock-free-ipc-queue.html
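[editor's sketch of the mmap approach under discussion — paths and sizes are illustrative; a real setup would place the file on tmpfs such as /dev/shm] Two shared mappings of the same file observe each other's writes without a socket copy:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal sketch: two MAP_SHARED mappings of one file act as shared memory.
// The temp file here is only a demo; tmpfs (e.g. /dev/shm) is the real target.
public class MmapIpcSketch {

    static MappedByteBuffer map(Path file, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("drill-shm", ".buf");
        MappedByteBuffer producer = map(file, 4096); // "writer JVM" side
        MappedByteBuffer consumer = map(file, 4096); // "reader JVM" side
        producer.putInt(0, 42);
        System.out.println(consumer.getInt(0)); // the second mapping sees 42
        Files.deleteIfExists(file);
    }
}
```

Note that Ted's caveat applies: there is no public API to unmap these buffers before the GC collects them.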


On Apr 20, 2013, at 7:48 PM, Ted Dunning <te...@gmail.com> wrote:

> I have experimented with this and had decent results, but have not measured
> performance.
> 
> One of the real gotchas about Java use of mmap is that you can't easily
> unmap a region without JNI help.
> 
> It is worth testing whether you can actually exceed the performance of a
> Unix domain socket.  Obviously you have some advantages relative to
> serialization with mmap, but there are effectively still copies going on
> due to NUMA.
> 
> 
> On Sat, Apr 20, 2013 at 4:21 PM, David Alves <da...@gmail.com> wrote:
> 
>> Thanks for the quick reply and for the pointers.
>> wrt to question one, now thinking about it that opens the door for some
>> cool optimizations.
>> 
>> wrt to question two I found a couple of interesting references using
>> MMAP'd tmpfs, was now looking into the way JCuda uses pinned memory.
>> i'll also look into the Peter Lawrey references.
>> 
>> best
>> david
>> 
>> On Apr 20, 2013, at 6:00 PM, Jacques Nadeau <ja...@apache.org> wrote:
>> 
>>> On Sat, Apr 20, 2013 at 2:39 PM, David Alves <da...@gmail.com>
>> wrote:
>>> 
>>>> Hi
>>>> 
>>>>       I'm porting the region level HBase SE to the new SE iface and I
>>>> have a couple of questions.
>>>>       1- about the method: public ListMultimap<ReadEntry,
>>>> DrillbitEndpoint> getReadLocations(Collection<ReadEntry> entries)
>>>> 
>>>>       when does it happen that a read entry gets assigned more that one
>>>> drillbits?
>>>>       in terms of hbase I can see the case where multiple read entries
>>>> get assigned to the same drillbit (co-located regions) but I can't
>> envision
>>>> a case where the same read entry (usually corresponding to a shard or
>>>> partition) gets assigned to multiple drillbits. when can that happen?
>>>> 
>>> 
>>> Best example is probably block replica locations in HDFS have multiple
>>> possible endpoints.
>>> 
>>> 
>>> 
>>>> 
>>>>       2- with regard to off-heap storage and underlying SE co-location
>>>> 
>>>>       this is not really a doubt, just checking that my reasoning is
>>>> correct before.
>>>> 
>>>>       for co-located underlying SE and Drillbit's we should use
>>>> off-heap, shared memory for IPC when possible, correct?
>>>>       Specifically I'm investigating the possibility of having HBase
>>>> store region scan data directly off heap and making the results from
>> hbase
>>>> contain a set references to aligned shared memory locations.
>>>>       I'm not sure I'll be implementing this immediately but I'd like
>> to
>>>> design accounting for it if that is the idea.
>>>>       Also this means that SE's must work in two modes: co-located with
>>>> shared memory and remote with sockets. We'd then have the
>>>>       Jacques: I'm sure you've put some thought to the underlying
>>>> mechanics on how to accomplish this, could you share some quick
>>>> ideas/references?
>>>> 
>>> 
>>> The challenge is separate JVMs don't have a nice way to share memory.
>> The
>>> simplest way is probably using MMAP'd tmpfs.  We'd have to evaluate the
>>> performance impact of this complexity.  I think the Java Chronicle,
>>> HugeCollections or VanillaJava stuff by Peter Lawrey has played with
>> this.
>>> There isn't a lot of work in the space.  Other interesting info:
>>> 
>> http://javaforu.blogspot.com/2011/09/offloading-data-from-jvm-heap-little.html
>> .
>>> 
>>> 
>>> Yes, this does mean that an SE may need to use two different mechanisms
>> to
>>> interact: one local and one remote/fallback.
>>> 
>>> J
>> 
>> 


Re: quick question about the new SE iface

Posted by Ted Dunning <te...@gmail.com>.
I have experimented with this and had decent results, but have not measured
performance.

One of the real gotchas about Java use of mmap is that you can't easily
unmap a region without JNI help.

It is worth testing whether you can actually exceed the performance of a
Unix domain socket.  Obviously you have some advantages relative to
serialization with mmap, but there are effectively still copies going on
due to NUMA.
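[editor's sketch of the Unix-domain-socket baseline mentioned above — note this uses the UnixDomainSocketAddress API that only arrived in JDK 16, long after this thread, so treat it as a present-day stand-in for the comparison; all paths are illustrative]

```java
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// One-process demo of a Unix domain socket transfer, the baseline any
// shared-memory transport would need to beat.
public class UdsBaselineSketch {

    // Send msg from a client channel to the accepting side and return what
    // the server read.
    static String roundTrip(String msg) throws Exception {
        Path sock = Files.createTempDirectory("uds-demo").resolve("s.sock");
        UnixDomainSocketAddress addr = UnixDomainSocketAddress.of(sock);
        try (ServerSocketChannel server =
                 ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(addr); // bind also starts listening
            try (SocketChannel client =
                     SocketChannel.open(StandardProtocolFamily.UNIX)) {
                client.connect(addr);
                try (SocketChannel peer = server.accept()) {
                    client.write(ByteBuffer.wrap(msg.getBytes()));
                    ByteBuffer in = ByteBuffer.allocate(msg.getBytes().length);
                    while (in.hasRemaining()) {
                        peer.read(in);
                    }
                    return new String(in.array());
                }
            }
        } finally {
            Files.deleteIfExists(sock);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ping"));
    }
}
```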


On Sat, Apr 20, 2013 at 4:21 PM, David Alves <da...@gmail.com> wrote:

> Thanks for the quick reply and for the pointers.
> wrt to question one, now thinking about it that opens the door for some
> cool optimizations.
>
> wrt to question two I found a couple of interesting references using
> MMAP'd tmpfs, was now looking into the way JCuda uses pinned memory.
> i'll also look into the Peter Lawrey references.
>
> best
> david
>
> On Apr 20, 2013, at 6:00 PM, Jacques Nadeau <ja...@apache.org> wrote:
>
> > On Sat, Apr 20, 2013 at 2:39 PM, David Alves <da...@gmail.com>
> wrote:
> >
> >> Hi
> >>
> >>        I'm porting the region level HBase SE to the new SE iface and I
> >> have a couple of questions.
> >>        1- about the method: public ListMultimap<ReadEntry,
> >> DrillbitEndpoint> getReadLocations(Collection<ReadEntry> entries)
> >>
> >>        when does it happen that a read entry gets assigned more that one
> >> drillbits?
> >>        in terms of hbase I can see the case where multiple read entries
> >> get assigned to the same drillbit (co-located regions) but I can't
> envision
> >> a case where the same read entry (usually corresponding to a shard or
> >> partition) gets assigned to multiple drillbits. when can that happen?
> >>
> >
> > Best example is probably block replica locations in HDFS have multiple
> > possible endpoints.
> >
> >
> >
> >>
> >>        2- with regard to off-heap storage and underlying SE co-location
> >>
> >>        this is not really a doubt, just checking that my reasoning is
> >> correct before.
> >>
> >>        for co-located underlying SE and Drillbit's we should use
> >> off-heap, shared memory for IPC when possible, correct?
> >>        Specifically I'm investigating the possibility of having HBase
> >> store region scan data directly off heap and making the results from
> hbase
> >> contain a set references to aligned shared memory locations.
> >>        I'm not sure I'll be implementing this immediately but I'd like
> to
> >> design accounting for it if that is the idea.
> >>        Also this means that SE's must work in two modes: co-located with
> >> shared memory and remote with sockets. We'd then have the
> >>        Jacques: I'm sure you've put some thought to the underlying
> >> mechanics on how to accomplish this, could you share some quick
> >> ideas/references?
> >>
> >
> > The challenge is separate JVMs don't have a nice way to share memory.
>  The
> > simplest way is probably using MMAP'd tmpfs.  We'd have to evaluate the
> > performance impact of this complexity.  I think the Java Chronicle,
> > HugeCollections or VanillaJava stuff by Peter Lawrey has played with
> this.
> > There isn't a lot of work in the space.  Other interesting info:
> >
> http://javaforu.blogspot.com/2011/09/offloading-data-from-jvm-heap-little.html
> .
> >
> >
> > Yes, this does mean that an SE may need to use two different mechanisms
> to
> > interact: one local and one remote/fallback.
> >
> > J
>
>

Re: quick question about the new SE iface

Posted by David Alves <da...@gmail.com>.
Thanks for the quick reply and for the pointers.
wrt question one: now thinking about it, that opens the door for some cool optimizations.

wrt question two: I found a couple of interesting references using MMAP'd tmpfs, and was looking into the way JCuda uses pinned memory.
i'll also look into the Peter Lawrey references.

best
david

On Apr 20, 2013, at 6:00 PM, Jacques Nadeau <ja...@apache.org> wrote:

> On Sat, Apr 20, 2013 at 2:39 PM, David Alves <da...@gmail.com> wrote:
> 
>> Hi
>> 
>>        I'm porting the region level HBase SE to the new SE iface and I
>> have a couple of questions.
>>        1- about the method: public ListMultimap<ReadEntry,
>> DrillbitEndpoint> getReadLocations(Collection<ReadEntry> entries)
>> 
>>        when does it happen that a read entry gets assigned more that one
>> drillbits?
>>        in terms of hbase I can see the case where multiple read entries
>> get assigned to the same drillbit (co-located regions) but I can't envision
>> a case where the same read entry (usually corresponding to a shard or
>> partition) gets assigned to multiple drillbits. when can that happen?
>> 
> 
> Best example is probably block replica locations in HDFS have multiple
> possible endpoints.
> 
> 
> 
>> 
>>        2- with regard to off-heap storage and underlying SE co-location
>> 
>>        this is not really a doubt, just checking that my reasoning is
>> correct before.
>> 
>>        for co-located underlying SE and Drillbit's we should use
>> off-heap, shared memory for IPC when possible, correct?
>>        Specifically I'm investigating the possibility of having HBase
>> store region scan data directly off heap and making the results from hbase
>> contain a set references to aligned shared memory locations.
>>        I'm not sure I'll be implementing this immediately but I'd like to
>> design accounting for it if that is the idea.
>>        Also this means that SE's must work in two modes: co-located with
>> shared memory and remote with sockets. We'd then have the
>>        Jacques: I'm sure you've put some thought to the underlying
>> mechanics on how to accomplish this, could you share some quick
>> ideas/references?
>> 
> 
> The challenge is separate JVMs don't have a nice way to share memory.  The
> simplest way is probably using MMAP'd tmpfs.  We'd have to evaluate the
> performance impact of this complexity.  I think the Java Chronicle,
> HugeCollections or VanillaJava stuff by Peter Lawrey has played with this.
> There isn't a lot of work in the space.  Other interesting info:
> http://javaforu.blogspot.com/2011/09/offloading-data-from-jvm-heap-little.html.
> 
> 
> Yes, this does mean that an SE may need to use two different mechanisms to
> interact: one local and one remote/fallback.
> 
> J


Re: quick question about the new SE iface

Posted by Jacques Nadeau <ja...@apache.org>.
On Sat, Apr 20, 2013 at 2:39 PM, David Alves <da...@gmail.com> wrote:

> Hi
>
>         I'm porting the region level HBase SE to the new SE iface and I
> have a couple of questions.
>         1- about the method: public ListMultimap<ReadEntry,
> DrillbitEndpoint> getReadLocations(Collection<ReadEntry> entries)
>
>         when does it happen that a read entry gets assigned more that one
> drillbits?
>         in terms of hbase I can see the case where multiple read entries
> get assigned to the same drillbit (co-located regions) but I can't envision
> a case where the same read entry (usually corresponding to a shard or
> partition) gets assigned to multiple drillbits. when can that happen?
>

Best example is probably block replica locations in HDFS have multiple
possible endpoints.



>
>         2- with regard to off-heap storage and underlying SE co-location
>
>         this is not really a doubt, just checking that my reasoning is
> correct before.
>
>         for co-located underlying SE and Drillbit's we should use
> off-heap, shared memory for IPC when possible, correct?
>         Specifically I'm investigating the possibility of having HBase
> store region scan data directly off heap and making the results from hbase
> contain a set references to aligned shared memory locations.
>         I'm not sure I'll be implementing this immediately but I'd like to
> design accounting for it if that is the idea.
>         Also this means that SE's must work in two modes: co-located with
> shared memory and remote with sockets. We'd then have the
>         Jacques: I'm sure you've put some thought to the underlying
> mechanics on how to accomplish this, could you share some quick
> ideas/references?
>

The challenge is that separate JVMs don't have a nice way to share memory.  The
simplest way is probably using MMAP'd tmpfs.  We'd have to evaluate the
performance impact of this complexity.  I think the Java Chronicle,
HugeCollections or VanillaJava stuff by Peter Lawrey has played with this.
There isn't a lot of work in this space.  Other interesting info:
http://javaforu.blogspot.com/2011/09/offloading-data-from-jvm-heap-little.html


Yes, this does mean that an SE may need to use two different mechanisms to
interact: one local and one remote/fallback.

J