Posted to dev@hbase.apache.org by Jason Rutherglen <ja...@gmail.com> on 2011/07/09 01:51:10 UTC

Converting byte[] to ByteBuffer

Is there an open issue for this?  How hard will this be?  :)

Re: Converting byte[] to ByteBuffer

Posted by Jason Rutherglen <ja...@gmail.com>.
Andrew,

I fully agree.  I opened HDFS-2004 to this end, however it was (oddly)
shot down.  I think HBase's usage of HDFS is divergent from the
traditional MapReduce usage.  MapR addresses these issues, as does some
of the Facebook-related work.

I think HBase should work at a lower level than the traditional HDFS
APIs, thus the only patches required for HDFS are ones that make it
more malleable for the requirements of HBase.

> Ryan's HDFS-347 but in addition it also checksums the blocks and caches NameNode metadata

Sounds good, I'm interested in checking that out.

On Sun, Jul 10, 2011 at 9:25 AM, Andrew Purtell <ap...@apache.org> wrote:
>> I agree with what Ryan is saying here, and I'd like to second (third?
>> fourth?) keep pushing for HDFS improvements.  Anything else is coding
>> around the bigger I/O issue.
>
>
> The Facebook code drop, not the 0.20-append branch with its clean history but rather the hairball without (shame), has a HDFS patched with the same approach as Ryan's HDFS-347 but in addition it also checksums the blocks and caches NameNode metadata. I might swap out Ryan's HDFS-347 patch locally with an extraction of these changes.
>
> I've also been considering back porting the (stale) HADOOP-4801/HADOOP-6311 approach. Jason, it looks like you've recently updated those issues?
>
> Best regards,
>
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
>
>
> ----- Original Message -----
>> From: Doug Meil <do...@explorysmedical.com>
>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>
>> Cc:
>> Sent: Saturday, July 9, 2011 6:04 PM
>> Subject: Re: Converting byte[] to ByteBuffer
>>
>>
>> re:  "If a variant of hdfs-347 was committed,"
>>
>> I agree with what Ryan is saying here, and I'd like to second (third?
>> fourth?) keep pushing for HDFS improvements.  Anything else is coding
>> around the bigger I/O issue.
>>
>>
>>
>> On 7/9/11 6:13 PM, "Ryan Rawson" <ry...@gmail.com> wrote:
>>
>>> I think my general point is we could hack up the hbase source, add
>>> refcounting, circumvent the gc, etc or we could demand more from the dfs.
>>>
>>> If a variant of hdfs-347 was committed, reads could come from the Linux
>>> buffer cache and life would be good.
>>>
>>> The choice isn't fast hbase vs slow hbase, there are elements of bugs
>>> there
>>> as well.
>>> On Jul 9, 2011 12:25 PM, "M. C. Srivas" <mc...@gmail.com>
>> wrote:
>>>>  On Fri, Jul 8, 2011 at 6:47 PM, Jason Rutherglen <
>>> jason.rutherglen@gmail.com
>>>>>  wrote:
>>>>
>>>>>  There are couple of things here, one is direct byte buffers to put
>> the
>>>>>  blocks outside of heap, the other is MMap'ing the blocks
>> directly from
>>>>>  the underlying HDFS file.
>>>>
>>>>
>>>>>  I think they both make sense. And I'm not sure MapR's
>> solution will
>>>>>  be that much better if the latter is implemented in HBase.
>>>>>
>>>>
>>>>  There're some major issues with mmap'ing the local hdfs file
>> (the
>>>> "block")
>>>>  directly:
>>>>  (a) no checksums to detect data corruption from bad disks
>>>>  (b) when a disk does fail, the dfs could start reading from an
>> alternate
>>>>  replica ... but that option is lost when mmap'ing and the RS will
>> crash
>>>>  immediately
>>>>  (c) security is completely lost, but that is minor given hbase's
>> current
>>>>  status
>>>>
>>>>  For those hbase deployments that don't care about the absence of
>> the (a)
>>> and
>>>>  (b), especially (b), its definitely a viable option that gives good
>>>> perf.
>>>>
>>>>  At MapR, we did consider similar direct-access capability and rejected
>>>> it
>>>>  due to the above concerns.
>>>>
>>>>
>>>>
>>>>>
>>>>>  On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson
>> <ry...@gmail.com> wrote:
>>>>>  > The overhead in a byte buffer is the extra integers to keep
>> track of
>>> the
>>>>>  > mark, position, limit.
>>>>>  >
>>>>>  > I am not sure that putting the block cache in to heap is the
>> way to
>>>>> go.
>>>>>  > Getting faster local dfs reads is important, and if you run
>> hbase on
>>> top
>>>>>  of
>>>>>  > Mapr, these things are taken care of for you.
>>>>>  > On Jul 8, 2011 6:20 PM, "Jason Rutherglen"
>>>>> <ja...@gmail.com>
>>>>>  > wrote:
>>>>>  >> Also, it's for a good cause, moving the blocks out of
>> main heap
>>>>> using
>>>>>  >> direct byte buffers or some other more native-like
>> facility (if
>>>>> DBB's
>>>>>  >> don't work).
>>>>>  >>
>>>>>  >> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson
>> <ry...@gmail.com>
>>> wrote:
>>>>>  >>> Where? Everywhere? An array is 24 bytes, bb is 56
>> bytes. Also the
>>>>> API
>>>>>  >>> is...annoying.
>>>>>  >>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen"
>> <
>>> jason.rutherglen@gmail.com
>>>>>  >
>>>>>  >>> wrote:
>>>>>  >>>> Is there an open issue for this? How hard will
>> this be? :)
>>>>>  >>>
>>>>>  >
>>>>>
>>
>

Re: Converting byte[] to ByteBuffer

Posted by Andrew Purtell <ap...@apache.org>.
> I agree with what Ryan is saying here, and I'd like to second (third?
> fourth?) keep pushing for HDFS improvements.  Anything else is coding
> around the bigger I/O issue.


The Facebook code drop, not the 0.20-append branch with its clean history but rather the hairball without (shame), has an HDFS patched with the same approach as Ryan's HDFS-347, but in addition it also checksums the blocks and caches NameNode metadata. I might swap out Ryan's HDFS-347 patch locally with an extraction of these changes.

I've also been considering backporting the (stale) HADOOP-4801/HADOOP-6311 approach. Jason, it looks like you've recently updated those issues?
 
Best regards,


   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


----- Original Message -----
> From: Doug Meil <do...@explorysmedical.com>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org>
> Cc: 
> Sent: Saturday, July 9, 2011 6:04 PM
> Subject: Re: Converting byte[] to ByteBuffer
> 
> 
> re:  "If a variant of hdfs-347 was committed,"
> 
> I agree with what Ryan is saying here, and I'd like to second (third?
> fourth?) keep pushing for HDFS improvements.  Anything else is coding
> around the bigger I/O issue.
> 
> 
> 
> On 7/9/11 6:13 PM, "Ryan Rawson" <ry...@gmail.com> wrote:
> 
>> I think my general point is we could hack up the hbase source, add
>> refcounting, circumvent the gc, etc or we could demand more from the dfs.
>> 
>> If a variant of hdfs-347 was committed, reads could come from the Linux
>> buffer cache and life would be good.
>> 
>> The choice isn't fast hbase vs slow hbase, there are elements of bugs
>> there
>> as well.
>> On Jul 9, 2011 12:25 PM, "M. C. Srivas" <mc...@gmail.com> 
> wrote:
>>>  On Fri, Jul 8, 2011 at 6:47 PM, Jason Rutherglen <
>> jason.rutherglen@gmail.com
>>>>  wrote:
>>> 
>>>>  There are couple of things here, one is direct byte buffers to put 
> the
>>>>  blocks outside of heap, the other is MMap'ing the blocks 
> directly from
>>>>  the underlying HDFS file.
>>> 
>>> 
>>>>  I think they both make sense. And I'm not sure MapR's 
> solution will
>>>>  be that much better if the latter is implemented in HBase.
>>>> 
>>> 
>>>  There're some major issues with mmap'ing the local hdfs file 
> (the
>>> "block")
>>>  directly:
>>>  (a) no checksums to detect data corruption from bad disks
>>>  (b) when a disk does fail, the dfs could start reading from an 
> alternate
>>>  replica ... but that option is lost when mmap'ing and the RS will 
> crash
>>>  immediately
>>>  (c) security is completely lost, but that is minor given hbase's 
> current
>>>  status
>>> 
>>>  For those hbase deployments that don't care about the absence of 
> the (a)
>> and
>>>  (b), especially (b), its definitely a viable option that gives good
>>> perf.
>>> 
>>>  At MapR, we did consider similar direct-access capability and rejected
>>> it
>>>  due to the above concerns.
>>> 
>>> 
>>> 
>>>> 
>>>>  On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson 
> <ry...@gmail.com> wrote:
>>>>  > The overhead in a byte buffer is the extra integers to keep 
> track of
>> the
>>>>  > mark, position, limit.
>>>>  >
>>>>  > I am not sure that putting the block cache in to heap is the 
> way to
>>>> go.
>>>>  > Getting faster local dfs reads is important, and if you run 
> hbase on
>> top
>>>>  of
>>>>  > Mapr, these things are taken care of for you.
>>>>  > On Jul 8, 2011 6:20 PM, "Jason Rutherglen"
>>>> <ja...@gmail.com>
>>>>  > wrote:
>>>>  >> Also, it's for a good cause, moving the blocks out of 
> main heap
>>>> using
>>>>  >> direct byte buffers or some other more native-like 
> facility (if
>>>> DBB's
>>>>  >> don't work).
>>>>  >>
>>>>  >> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson 
> <ry...@gmail.com>
>> wrote:
>>>>  >>> Where? Everywhere? An array is 24 bytes, bb is 56 
> bytes. Also the
>>>> API
>>>>  >>> is...annoying.
>>>>  >>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" 
> <
>> jason.rutherglen@gmail.com
>>>>  >
>>>>  >>> wrote:
>>>>  >>>> Is there an open issue for this? How hard will 
> this be? :)
>>>>  >>>
>>>>  >
>>>> 
>

Re: Converting byte[] to ByteBuffer

Posted by Ryan Rawson <ry...@gmail.com>.
No lines of hbase were changed to run on Mapr. Mapr implements the hdfs API
and uses jni to get local data. If hdfs wanted to, it could use more
sophisticated methods to get data rapidly from local disk to a client's
memory space...as Mapr does.
On Jul 9, 2011 6:05 PM, "Doug Meil" <do...@explorysmedical.com> wrote:
>
> re: "If a variant of hdfs-347 was committed,"
>
> I agree with what Ryan is saying here, and I'd like to second (third?
> fourth?) keep pushing for HDFS improvements. Anything else is coding
> around the bigger I/O issue.
>
>
>
> On 7/9/11 6:13 PM, "Ryan Rawson" <ry...@gmail.com> wrote:
>
>>I think my general point is we could hack up the hbase source, add
>>refcounting, circumvent the gc, etc or we could demand more from the dfs.
>>
>>If a variant of hdfs-347 was committed, reads could come from the Linux
>>buffer cache and life would be good.
>>
>>The choice isn't fast hbase vs slow hbase, there are elements of bugs
>>there
>>as well.
>>On Jul 9, 2011 12:25 PM, "M. C. Srivas" <mc...@gmail.com> wrote:
>>> On Fri, Jul 8, 2011 at 6:47 PM, Jason Rutherglen <
>>jason.rutherglen@gmail.com
>>>> wrote:
>>>
>>>> There are couple of things here, one is direct byte buffers to put the
>>>> blocks outside of heap, the other is MMap'ing the blocks directly from
>>>> the underlying HDFS file.
>>>
>>>
>>>> I think they both make sense. And I'm not sure MapR's solution will
>>>> be that much better if the latter is implemented in HBase.
>>>>
>>>
>>> There're some major issues with mmap'ing the local hdfs file (the
>>>"block")
>>> directly:
>>> (a) no checksums to detect data corruption from bad disks
>>> (b) when a disk does fail, the dfs could start reading from an alternate
>>> replica ... but that option is lost when mmap'ing and the RS will crash
>>> immediately
>>> (c) security is completely lost, but that is minor given hbase's current
>>> status
>>>
>>> For those hbase deployments that don't care about the absence of the (a)
>>and
>>> (b), especially (b), its definitely a viable option that gives good
>>>perf.
>>>
>>> At MapR, we did consider similar direct-access capability and rejected
>>>it
>>> due to the above concerns.
>>>
>>>
>>>
>>>>
>>>> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>>> > The overhead in a byte buffer is the extra integers to keep track of
>>the
>>>> > mark, position, limit.
>>>> >
>>>> > I am not sure that putting the block cache in to heap is the way to
>>>>go.
>>>> > Getting faster local dfs reads is important, and if you run hbase on
>>top
>>>> of
>>>> > Mapr, these things are taken care of for you.
>>>> > On Jul 8, 2011 6:20 PM, "Jason Rutherglen"
>>>><ja...@gmail.com>
>>>> > wrote:
>>>> >> Also, it's for a good cause, moving the blocks out of main heap
>>>>using
>>>> >> direct byte buffers or some other more native-like facility (if
>>>>DBB's
>>>> >> don't work).
>>>> >>
>>>> >> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ry...@gmail.com>
>>wrote:
>>>> >>> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the
>>>>API
>>>> >>> is...annoying.
>>>> >>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <
>>jason.rutherglen@gmail.com
>>>> >
>>>> >>> wrote:
>>>> >>>> Is there an open issue for this? How hard will this be? :)
>>>> >>>
>>>> >
>>>>
>

Re: Converting byte[] to ByteBuffer

Posted by Doug Meil <do...@explorysmedical.com>.
re:  "If a variant of hdfs-347 was committed,"

I agree with what Ryan is saying here, and I'd like to second (third?
fourth?) the call to keep pushing for HDFS improvements.  Anything else is
coding around the bigger I/O issue.



On 7/9/11 6:13 PM, "Ryan Rawson" <ry...@gmail.com> wrote:

>I think my general point is we could hack up the hbase source, add
>refcounting, circumvent the gc, etc or we could demand more from the dfs.
>
>If a variant of hdfs-347 was committed, reads could come from the Linux
>buffer cache and life would be good.
>
>The choice isn't fast hbase vs slow hbase, there are elements of bugs
>there
>as well.
>On Jul 9, 2011 12:25 PM, "M. C. Srivas" <mc...@gmail.com> wrote:
>> On Fri, Jul 8, 2011 at 6:47 PM, Jason Rutherglen <
>jason.rutherglen@gmail.com
>>> wrote:
>>
>>> There are couple of things here, one is direct byte buffers to put the
>>> blocks outside of heap, the other is MMap'ing the blocks directly from
>>> the underlying HDFS file.
>>
>>
>>> I think they both make sense. And I'm not sure MapR's solution will
>>> be that much better if the latter is implemented in HBase.
>>>
>>
>> There're some major issues with mmap'ing the local hdfs file (the
>>"block")
>> directly:
>> (a) no checksums to detect data corruption from bad disks
>> (b) when a disk does fail, the dfs could start reading from an alternate
>> replica ... but that option is lost when mmap'ing and the RS will crash
>> immediately
>> (c) security is completely lost, but that is minor given hbase's current
>> status
>>
>> For those hbase deployments that don't care about the absence of the (a)
>and
>> (b), especially (b), its definitely a viable option that gives good
>>perf.
>>
>> At MapR, we did consider similar direct-access capability and rejected
>>it
>> due to the above concerns.
>>
>>
>>
>>>
>>> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>> > The overhead in a byte buffer is the extra integers to keep track of
>the
>>> > mark, position, limit.
>>> >
>>> > I am not sure that putting the block cache in to heap is the way to
>>>go.
>>> > Getting faster local dfs reads is important, and if you run hbase on
>top
>>> of
>>> > Mapr, these things are taken care of for you.
>>> > On Jul 8, 2011 6:20 PM, "Jason Rutherglen"
>>><ja...@gmail.com>
>>> > wrote:
>>> >> Also, it's for a good cause, moving the blocks out of main heap
>>>using
>>> >> direct byte buffers or some other more native-like facility (if
>>>DBB's
>>> >> don't work).
>>> >>
>>> >> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ry...@gmail.com>
>wrote:
>>> >>> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the
>>>API
>>> >>> is...annoying.
>>> >>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <
>jason.rutherglen@gmail.com
>>> >
>>> >>> wrote:
>>> >>>> Is there an open issue for this? How hard will this be? :)
>>> >>>
>>> >
>>>


Re: Converting byte[] to ByteBuffer

Posted by Ryan Rawson <ry...@gmail.com>.
I think my general point is we could hack up the hbase source, add
refcounting, circumvent the gc, etc., or we could demand more from the dfs.

If a variant of hdfs-347 was committed, reads could come from the Linux
buffer cache and life would be good.
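
A rough sketch of that local-read idea (the block path below is made up; how
the client actually discovers and opens local block files is the substance of
the patch):

    import java.io.FileInputStream;
    import java.io.IOException;

    // Illustrative only: when the replica is local, read the block file
    // straight off the local filesystem (and so the kernel page cache)
    // instead of streaming it through the DataNode socket.
    public class ShortCircuitReadSketch {
      public static void main(String[] args) throws IOException {
        FileInputStream in =
            new FileInputStream("/data/1/dfs/dn/current/blk_1234567890");
        try {
          byte[] chunk = new byte[64 * 1024];
          int n;
          long total = 0;
          while ((n = in.read(chunk)) != -1) {
            total += n;    // hot blocks come straight from the page cache
          }
          System.out.println("read " + total + " bytes locally");
        } finally {
          in.close();
        }
      }
    }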

The choice isn't fast hbase vs. slow hbase; there are elements of bugs there
as well.
On Jul 9, 2011 12:25 PM, "M. C. Srivas" <mc...@gmail.com> wrote:
> On Fri, Jul 8, 2011 at 6:47 PM, Jason Rutherglen <
jason.rutherglen@gmail.com
>> wrote:
>
>> There are couple of things here, one is direct byte buffers to put the
>> blocks outside of heap, the other is MMap'ing the blocks directly from
>> the underlying HDFS file.
>
>
>> I think they both make sense. And I'm not sure MapR's solution will
>> be that much better if the latter is implemented in HBase.
>>
>
> There're some major issues with mmap'ing the local hdfs file (the "block")
> directly:
> (a) no checksums to detect data corruption from bad disks
> (b) when a disk does fail, the dfs could start reading from an alternate
> replica ... but that option is lost when mmap'ing and the RS will crash
> immediately
> (c) security is completely lost, but that is minor given hbase's current
> status
>
> For those hbase deployments that don't care about the absence of the (a)
and
> (b), especially (b), its definitely a viable option that gives good perf.
>
> At MapR, we did consider similar direct-access capability and rejected it
> due to the above concerns.
>
>
>
>>
>> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ry...@gmail.com> wrote:
>> > The overhead in a byte buffer is the extra integers to keep track of
the
>> > mark, position, limit.
>> >
>> > I am not sure that putting the block cache in to heap is the way to go.
>> > Getting faster local dfs reads is important, and if you run hbase on
top
>> of
>> > Mapr, these things are taken care of for you.
>> > On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <ja...@gmail.com>
>> > wrote:
>> >> Also, it's for a good cause, moving the blocks out of main heap using
>> >> direct byte buffers or some other more native-like facility (if DBB's
>> >> don't work).
>> >>
>> >> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ry...@gmail.com>
wrote:
>> >>> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the API
>> >>> is...annoying.
>> >>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <
jason.rutherglen@gmail.com
>> >
>> >>> wrote:
>> >>>> Is there an open issue for this? How hard will this be? :)
>> >>>
>> >
>>

Re: Converting byte[] to ByteBuffer

Posted by "M. C. Srivas" <mc...@gmail.com>.
On Fri, Jul 8, 2011 at 6:47 PM, Jason Rutherglen <jason.rutherglen@gmail.com
> wrote:

> There are couple of things here, one is direct byte buffers to put the
> blocks outside of heap, the other is MMap'ing the blocks directly from
> the underlying HDFS file.


> I think they both make sense.  And I'm not sure MapR's solution will
> be that much better if the latter is implemented in HBase.
>

There're some major issues with mmap'ing the local hdfs file (the "block")
directly:
(a) no checksums to detect data corruption from bad disks
(b) when a disk does fail, the dfs could start reading from an alternate
replica ... but that option is lost when mmap'ing and the RS will crash
immediately
(c) security is completely lost, but that is minor given hbase's current
status

For those hbase deployments that don't care about the absence of (a) and
(b), especially (b), it's definitely a viable option that gives good perf.

At MapR, we did consider similar direct-access capability and rejected it
due to the above concerns.
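
For concreteness, a sketch of what mmap'ing a local block file would look
like (the path is made up), with the caveats above noted inline:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Illustrative only: map a (hypothetical) local block file straight into
    // the region server's address space.
    public class MmapBlockSketch {
      public static void main(String[] args) throws Exception {
        RandomAccessFile raf =
            new RandomAccessFile("/data/1/dfs/dn/current/blk_1234567890", "r");
        try {
          FileChannel ch = raf.getChannel();
          MappedByteBuffer block =
              ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
          byte first = block.get(0);  // served by the page cache, no heap copy
          // (a) nothing here verifies the block against its checksum file, and
          // (b) a failing disk shows up as an error on access rather than a
          //     quiet switch to another replica, which is why the RS crashes.
          System.out.println(first);
        } finally {
          raf.close();
        }
      }
    }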



>
> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ry...@gmail.com> wrote:
> > The overhead in a byte buffer is the extra integers to keep track of the
> > mark, position, limit.
> >
> > I am not sure that putting the block cache in to heap is the way to go.
> > Getting faster local dfs reads is important, and if you run hbase on top
> of
> > Mapr, these things are taken care of for you.
> > On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <ja...@gmail.com>
> > wrote:
> >> Also, it's for a good cause, moving the blocks out of main heap using
> >> direct byte buffers or some other more native-like facility (if DBB's
> >> don't work).
> >>
> >> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ry...@gmail.com> wrote:
> >>> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the API
> >>> is...annoying.
> >>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <jason.rutherglen@gmail.com
> >
> >>> wrote:
> >>>> Is there an open issue for this? How hard will this be? :)
> >>>
> >
>

Re: Converting byte[] to ByteBuffer

Posted by Jason Rutherglen <ja...@gmail.com>.
> When running on top of Mapr, hbase has fast cached access to locally stored
> files, the Mapr client ensures that. Likewise, hdfs should also ensure that
> local reads are fast and come out of cache as necessary. Eg: the kernel
> block cache.

Agreed!  However, I don't see how that's possible today.  E.g., it'd
require more of a byte-buffer-style API to HDFS, with random reads not
going through streams.  It's easy to add.
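
Roughly the kind of API in question, as a hypothetical sketch (not an
interface HDFS exposes; reads today go through the stream-style
FSDataInputStream):

    import java.io.IOException;
    import java.nio.ByteBuffer;

    // Hypothetical sketch of a positional, ByteBuffer-based read API.
    public interface PositionedByteBufferReadable {
      /**
       * Read up to buf.remaining() bytes starting at the given file offset,
       * filling buf and returning the number of bytes read, without touching
       * any shared stream cursor.
       */
      int read(long position, ByteBuffer buf) throws IOException;
    }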

I think the biggest wins for HBase with MapR are the absence of the
NameNode issues, and snapshotting.  In particular, snapshots are pretty
much a standard RDBMS feature.

> Managing the block cache in not heap might work but you also might get there and find the dbb accounting
> overhead kills.

Lucene uses/abuses ref counting, so I'm familiar with the downsides.
When it works, it's great; when it doesn't, it's a nightmare to debug.
It is possible to make it work though.  I don't think there would be
overhead from it, i.e., any pool of objects implements ref counting.

It'd be nice to not have a block cache; however, it's necessary for
caching compressed [on disk] blocks.

On Fri, Jul 8, 2011 at 7:05 PM, Ryan Rawson <ry...@gmail.com> wrote:
> Hey,
>
> When running on top of Mapr, hbase has fast cached access to locally stored
> files, the Mapr client ensures that. Likewise, hdfs should also ensure that
> local reads are fast and come out of cache as necessary. Eg: the kernel
> block cache.
>
> I wouldn't support mmap, it would require 2 different read path
> implementations. You will never know when a read is not local.
>
> Hdfs needs to provide faster local reads imo. Managing the block cache in
> not heap might work but you also might get there and find the dbb accounting
> overhead kills.
> On Jul 8, 2011 6:47 PM, "Jason Rutherglen" <ja...@gmail.com>
> wrote:
>> There are couple of things here, one is direct byte buffers to put the
>> blocks outside of heap, the other is MMap'ing the blocks directly from
>> the underlying HDFS file.
>>
>> I think they both make sense. And I'm not sure MapR's solution will
>> be that much better if the latter is implemented in HBase.
>>
>> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>> The overhead in a byte buffer is the extra integers to keep track of the
>>> mark, position, limit.
>>>
>>> I am not sure that putting the block cache in to heap is the way to go.
>>> Getting faster local dfs reads is important, and if you run hbase on top
> of
>>> Mapr, these things are taken care of for you.
>>> On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <ja...@gmail.com>
>>> wrote:
>>>> Also, it's for a good cause, moving the blocks out of main heap using
>>>> direct byte buffers or some other more native-like facility (if DBB's
>>>> don't work).
>>>>
>>>> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>>>> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the API
>>>>> is...annoying.
>>>>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <ja...@gmail.com>
>>>>> wrote:
>>>>>> Is there an open issue for this? How hard will this be? :)
>>>>>
>>>
>

Re: Converting byte[] to ByteBuffer

Posted by Ryan Rawson <ry...@gmail.com>.
Hey,

When running on top of Mapr, hbase has fast cached access to locally stored
files; the Mapr client ensures that. Likewise, hdfs should also ensure that
local reads are fast and come out of cache as necessary, e.g. the kernel
block cache.

I wouldn't support mmap; it would require two different read path
implementations. You will never know when a read is not local.

Hdfs needs to provide faster local reads imo. Managing the block cache
off heap might work, but you also might get there and find that the DBB
accounting overhead kills you.
On Jul 8, 2011 6:47 PM, "Jason Rutherglen" <ja...@gmail.com>
wrote:
> There are couple of things here, one is direct byte buffers to put the
> blocks outside of heap, the other is MMap'ing the blocks directly from
> the underlying HDFS file.
>
> I think they both make sense. And I'm not sure MapR's solution will
> be that much better if the latter is implemented in HBase.
>
> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ry...@gmail.com> wrote:
>> The overhead in a byte buffer is the extra integers to keep track of the
>> mark, position, limit.
>>
>> I am not sure that putting the block cache in to heap is the way to go.
>> Getting faster local dfs reads is important, and if you run hbase on top
of
>> Mapr, these things are taken care of for you.
>> On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <ja...@gmail.com>
>> wrote:
>>> Also, it's for a good cause, moving the blocks out of main heap using
>>> direct byte buffers or some other more native-like facility (if DBB's
>>> don't work).
>>>
>>> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>>> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the API
>>>> is...annoying.
>>>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <ja...@gmail.com>
>>>> wrote:
>>>>> Is there an open issue for this? How hard will this be? :)
>>>>
>>

Re: Converting byte[] to ByteBuffer

Posted by Jason Rutherglen <ja...@gmail.com>.
There are a couple of things here: one is direct byte buffers to put the
blocks outside of the heap, the other is mmap'ing the blocks directly from
the underlying HDFS file.

I think they both make sense.  And I'm not sure MapR's solution will
be that much better if the latter is implemented in HBase.

On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ry...@gmail.com> wrote:
> The overhead in a byte buffer is the extra integers to keep track of the
> mark, position, limit.
>
> I am not sure that putting the block cache in to heap is the way to go.
> Getting faster local dfs reads is important, and if you run hbase on top of
> Mapr, these things are taken care of for you.
> On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <ja...@gmail.com>
> wrote:
>> Also, it's for a good cause, moving the blocks out of main heap using
>> direct byte buffers or some other more native-like facility (if DBB's
>> don't work).
>>
>> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the API
>>> is...annoying.
>>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <ja...@gmail.com>
>>> wrote:
>>>> Is there an open issue for this? How hard will this be? :)
>>>
>

Re: Converting byte[] to ByteBuffer

Posted by Ryan Rawson <ry...@gmail.com>.
The overhead in a byte buffer is the extra integers to keep track of the
mark, position, limit.
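
A tiny sketch of where the difference comes from (the exact per-object sizes
depend on the JVM and its settings):

    import java.nio.ByteBuffer;

    public class BufferOverheadSketch {
      public static void main(String[] args) {
        byte[] array = new byte[64 * 1024];  // the array object: header, length, data

        // Wrapping it in a ByteBuffer adds a second object carrying cursor
        // state (mark, position, limit, capacity) plus a reference back to
        // the array, which is the extra footprint being discussed.
        ByteBuffer buf = ByteBuffer.wrap(array);
        buf.position(128).limit(256);
        System.out.println(buf.remaining());  // 128
      }
    }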

I am not sure that putting the block cache in to heap is the way to go.
Getting faster local dfs reads is important, and if you run hbase on top of
Mapr, these things are taken care of for you.
On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <ja...@gmail.com>
wrote:
> Also, it's for a good cause, moving the blocks out of main heap using
> direct byte buffers or some other more native-like facility (if DBB's
> don't work).
>
> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ry...@gmail.com> wrote:
>> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the API
>> is...annoying.
>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <ja...@gmail.com>
>> wrote:
>>> Is there an open issue for this? How hard will this be? :)
>>

Re: Converting byte[] to ByteBuffer

Posted by Jason Rutherglen <ja...@gmail.com>.
Reference counting is doable.  Can you describe what the advantages
are of the slab allocated solution?
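
A minimal sketch of the ref counting itself (names are made up, not an
existing class):

    import java.nio.ByteBuffer;
    import java.util.concurrent.atomic.AtomicInteger;

    // Illustrative: a cached block whose backing slice can only be recycled
    // once every reader that pinned it has released it.
    public class RefCountedBlock {
      private final ByteBuffer data;
      private final AtomicInteger refs = new AtomicInteger(1);  // cache holds one ref

      public RefCountedBlock(ByteBuffer data) {
        this.data = data;
      }

      /** Pin the block; each reader gets its own cursor over the shared bytes. */
      public ByteBuffer retain() {
        refs.incrementAndGet();
        return data.duplicate();
      }

      /** Release one pin; returns true when the slice may be recycled. */
      public boolean release() {
        return refs.decrementAndGet() == 0;
      }
    }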

On Fri, Jul 8, 2011 at 6:30 PM, Li Pi <li...@cloudera.com> wrote:
> if you do that, you'll have to do a bit of reference counting. i'm working
> on a slab allocated solution.
>
> On Fri, Jul 8, 2011 at 6:20 PM, Jason Rutherglen <jason.rutherglen@gmail.com
>> wrote:
>
>> Also, it's for a good cause, moving the blocks out of main heap using
>> direct byte buffers or some other more native-like facility (if DBB's
>> don't work).
>>
>> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ry...@gmail.com> wrote:
>> > Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the API
>> > is...annoying.
>> > On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <ja...@gmail.com>
>> > wrote:
>> >> Is there an open issue for this? How hard will this be? :)
>> >
>>
>

Re: Converting byte[] to ByteBuffer

Posted by Li Pi <li...@cloudera.com>.
If you do that, you'll have to do a bit of reference counting. I'm working
on a slab-allocated solution.
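
Roughly the slab idea, as an illustrative sketch (not the actual
implementation): carve one big direct buffer into fixed-size slices and
recycle them.

    import java.nio.ByteBuffer;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // One large direct allocation is sliced into fixed-size chunks that get
    // handed out and recycled, so cached blocks never become individually
    // GC-tracked heap objects.
    public class SlabSketch {
      private final ConcurrentLinkedQueue<ByteBuffer> free =
          new ConcurrentLinkedQueue<ByteBuffer>();

      public SlabSketch(int sliceSize, int sliceCount) {
        ByteBuffer slab = ByteBuffer.allocateDirect(sliceSize * sliceCount);
        for (int i = 0; i < sliceCount; i++) {
          slab.position(i * sliceSize).limit((i + 1) * sliceSize);
          free.add(slab.slice());  // independent view over one slice
        }
      }

      public ByteBuffer allocate() {
        return free.poll();        // null means the slab is exhausted
      }

      public void release(ByteBuffer slice) {
        slice.clear();
        free.add(slice);           // which is why the reference counting matters
      }
    }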

On Fri, Jul 8, 2011 at 6:20 PM, Jason Rutherglen <jason.rutherglen@gmail.com
> wrote:

> Also, it's for a good cause, moving the blocks out of main heap using
> direct byte buffers or some other more native-like facility (if DBB's
> don't work).
>
> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ry...@gmail.com> wrote:
> > Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the API
> > is...annoying.
> > On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <ja...@gmail.com>
> > wrote:
> >> Is there an open issue for this? How hard will this be? :)
> >
>

Re: Converting byte[] to ByteBuffer

Posted by Jason Rutherglen <ja...@gmail.com>.
Also, it's for a good cause: moving the blocks out of the main heap using
direct byte buffers or some other more native-like facility (if DBBs
don't work).
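
A minimal sketch of the DBB route (illustrative only):

    import java.nio.ByteBuffer;

    public class OffHeapBlockSketch {
      // Copy a block into a direct buffer so the bytes sit outside the
      // GC-managed heap; only small pieces are copied back on read.
      static ByteBuffer toOffHeap(byte[] block) {
        ByteBuffer direct = ByteBuffer.allocateDirect(block.length);
        direct.put(block);
        direct.flip();             // make the buffer readable from the start
        return direct;
      }

      public static void main(String[] args) {
        byte[] block = new byte[64 * 1024];     // stand-in for a cached block
        ByteBuffer cached = toOffHeap(block);
        byte[] piece = new byte[128];
        cached.get(piece);                      // copy just what a read needs
        System.out.println(cached.position()); // 128
      }
    }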

On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ry...@gmail.com> wrote:
> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the API
> is...annoying.
> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <ja...@gmail.com>
> wrote:
>> Is there an open issue for this? How hard will this be? :)
>

Re: Converting byte[] to ByteBuffer

Posted by Jason Rutherglen <ja...@gmail.com>.
I don't think the object pointer overhead is very much, given it's
usually pointing at a full block.  Perhaps we can implement a nicer
class like Lucene's BytesRef [1].  Then we can have our own class that
may wrap either a byte[] or a ByteBuffer.

1. http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/util/BytesRef.html
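
A minimal sketch of such a class (illustrative names, not an existing HBase
type):

    // A BytesRef-style wrapper: a view over a region of a backing byte[]
    // (a ByteBuffer-backed variant could sit behind the same API).
    public final class ByteRange {
      private final byte[] bytes;
      private final int offset;
      private final int length;

      public ByteRange(byte[] bytes, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > bytes.length) {
          throw new IllegalArgumentException("invalid range");
        }
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
      }

      public byte byteAt(int i) {
        return bytes[offset + i];
      }

      public int length() {
        return length;
      }

      /** A narrower view over the same backing array, without copying. */
      public ByteRange slice(int from, int len) {
        return new ByteRange(bytes, offset + from, len);
      }
    }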

On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ry...@gmail.com> wrote:
> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the API
> is...annoying.
> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <ja...@gmail.com>
> wrote:
>> Is there an open issue for this? How hard will this be? :)
>

Re: Converting byte[] to ByteBuffer

Posted by Ryan Rawson <ry...@gmail.com>.
Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the API
is...annoying.
On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <ja...@gmail.com>
wrote:
> Is there an open issue for this? How hard will this be? :)