Posted to dev@hbase.apache.org by Gaurav Sharma <ga...@gmail.com> on 2010/12/15 20:00:09 UTC

Hypertable claiming upto >900% random-read throughput vs HBase

Folks, my apologies if this has been discussed here before, but can someone
please shed some light on how Hypertable is claiming up to 900% higher
throughput on random reads and up to 1000% higher on sequential reads in their
performance evaluation vs HBase (modeled after the perf-eval test in section
7 of the Bigtable paper):
http://www.hypertable.com/pub/perfeval/test1 (section: System Performance
Difference)

For one, I noticed they are running CentOS 5.2 on 1.8 GHz dual-core
Opterons with 10 GB of RAM. There's also no posting date on the blog post.
It has been a while since I checked, but YCSB did not have support for
Hypertable. The numbers do seem a bit too good to be true :)

-Gaurav

RE: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Chad Walters <Ch...@microsoft.com>.
I was really just trying to address this point that Ryan made:
"- They are able to harness larger amounts of RAM, so they are really just testing that vs HBase"

In cases where that actually makes a difference (i.e. there are significant amounts of RAM that can't be harnessed), the overhead of additional JVMs may become inconsequential.

Obviously, your particular mileage may vary.

Chad

-----Original Message-----
From: Ted Dunning [mailto:tdunning@maprtech.com] 
Sent: Wednesday, December 15, 2010 1:53 PM
To: dev@hbase.apache.org
Subject: Re: Hypertable claiming upto >900% random-read throughput vs HBase

That isn't really the trade-off.  The 10x is on an undocumented benchmark with apples-to-oranges tuning.  Moreover, HBase has had massive speedups since then.

Being able to set the heap size actually lets me control memory use more precisely, and running a single JVM lets me amortize JVM cost.  Java does do some sharing, but a single JVM is better.

On Wed, Dec 15, 2010 at 12:05 PM, Chad Walters
<Ch...@microsoft.com>wrote:

> Sure, but if the tradeoff is being unable to use all the memory 
> effectively and suffering 10x unfavorable benchmark comparisons, then 
> running 2 or more JVMs with a regionserver per VM seems like a 
> reasonable stopgap until the GC works better.
>
> Chad
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Wednesday, December 15, 2010 11:58 AM
> To: dev@hbase.apache.org
> Subject: Re: Hypertable claiming upto >900% random-read throughput vs 
> HBase
>
> Why do that?  You reduce the cache effectiveness and up the logistical 
> complexity.  As a stopgap maybe, but not as a long term strategy.
>
> Sun just needs to fix their GC.  Er, Oracle.
>
> -ryan
>
> On Wed, Dec 15, 2010 at 11:55 AM, Chad Walters 
> <Ch...@microsoft.com>
> wrote:
> > Why not run multiple JVMs per machine?
> >
> > Chad
> >
> > -----Original Message-----
> > From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> > Sent: Wednesday, December 15, 2010 11:52 AM
> > To: dev@hbase.apache.org
> > Subject: Re: Hypertable claiming upto >900% random-read throughput 
> > vs HBase
> >
> > The malloc thing was pointing out that we have to contend with Xmx 
> > and
> GC.  So it makes it harder for us to maximally use all the available 
> ram for block cache in the regionserver.  Which you may or may not 
> want to do for alternative reasons.  At least with Xmx you can plan 
> and control your deployments, and you wont suffer from heap growth due to heap fragmentation.
> >
> > -ryan
> >
> > On Wed, Dec 15, 2010 at 11:49 AM, Todd Lipcon <to...@cloudera.com> wrote:
> >> On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma 
> >> <ga...@gmail.com> wrote:
> >>> Thanks Ryan and Ted. I also think if they were using tcmalloc, it 
> >>> would have given them a further advantage but as you said, not 
> >>> much is known about the test source code.
> >>
> >> I think Hypertable does use tcmalloc or jemalloc (forget which)
> >>
> >> You may be interested in this thread from back in August:
> >> http://search-hadoop.com/m/pG6SM1xSP7r/hypertable&subj=Re+Finding+on+HBase+Hypertable+comparison
> >>
> >> -Todd
> >>
> >>>
> >>> On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ry...@gmail.com>
> wrote:
> >>>
> >>>> So if that is the case, I'm not sure how that is a fair test.  
> >>>> One system reads from RAM, the other from disk.  The results as expected.
> >>>>
> >>>> Why not test one system with SSDs and the other without?
> >>>>
> >>>> It's really hard to get apples/oranges comparison. Even if you 
> >>>> are doing the same workloads on 2 diverse systems, you are not 
> >>>> testing the code quality, you are testing overall systems and other issues.
> >>>>
> >>>> As G1 GC improves, I expect our ability to use larger and larger 
> >>>> heaps would blunt the advantage of a C++ program using malloc.
> >>>>
> >>>> -ryan
> >>>>
> >>>> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning 
> >>>> <td...@maprtech.com>
> >>>> wrote:
> >>>> > From the small comments I have heard, the RAM versus disk 
> >>>> > difference is mostly what I have heard they were testing.
> >>>> >
> >>>> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson 
> >>>> > <ry...@gmail.com>
> >>>> wrote:
> >>>> >
> >>>> >> We dont have the test source code, so it isnt very objective.
> >>>> >> However I believe there are 2 things which help them:
> >>>> >> - They are able to harness larger amounts of RAM, so they are 
> >>>> >> really just testing that vs HBase
> >>>> >>
> >>>> >
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
> >
>
>

Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Ted Dunning <td...@maprtech.com>.
That isn't really the trade-off.  The 10x is on an undocumented benchmark
with apples-to-oranges tuning.  Moreover, HBase has had massive speedups
since then.

Being able to set the heap size actually lets me control memory use more
precisely, and running a single JVM lets me amortize JVM cost.  Java does do
some sharing, but a single JVM is better.

On Wed, Dec 15, 2010 at 12:05 PM, Chad Walters
<Ch...@microsoft.com>wrote:

> Sure, but if the tradeoff is being unable to use all the memory effectively
> and suffering 10x unfavorable benchmark comparisons, then running 2 or more
> JVMs with a regionserver per VM seems like a reasonable stopgap until the GC
> works better.
>
> Chad
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Wednesday, December 15, 2010 11:58 AM
> To: dev@hbase.apache.org
> Subject: Re: Hypertable claiming upto >900% random-read throughput vs HBase
>
> Why do that?  You reduce the cache effectiveness and up the logistical
> complexity.  As a stopgap maybe, but not as a long term strategy.
>
> Sun just needs to fix their GC.  Er, Oracle.
>
> -ryan
>
> On Wed, Dec 15, 2010 at 11:55 AM, Chad Walters <Ch...@microsoft.com>
> wrote:
> > Why not run multiple JVMs per machine?
> >
> > Chad
> >
> > -----Original Message-----
> > From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> > Sent: Wednesday, December 15, 2010 11:52 AM
> > To: dev@hbase.apache.org
> > Subject: Re: Hypertable claiming upto >900% random-read throughput vs
> > HBase
> >
> > The malloc thing was pointing out that we have to contend with Xmx and
> GC.  So it makes it harder for us to maximally use all the available ram for
> block cache in the regionserver.  Which you may or may not want to do for
> alternative reasons.  At least with Xmx you can plan and control your
> deployments, and you wont suffer from heap growth due to heap fragmentation.
> >
> > -ryan
> >
> > On Wed, Dec 15, 2010 at 11:49 AM, Todd Lipcon <to...@cloudera.com> wrote:
> >> On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma
> >> <ga...@gmail.com> wrote:
> >>> Thanks Ryan and Ted. I also think if they were using tcmalloc, it
> >>> would have given them a further advantage but as you said, not much
> >>> is known about the test source code.
> >>
> >> I think Hypertable does use tcmalloc or jemalloc (forget which)
> >>
> >> You may be interested in this thread from back in August:
> >> http://search-hadoop.com/m/pG6SM1xSP7r/hypertable&subj=Re+Finding+on+HBase+Hypertable+comparison
> >>
> >> -Todd
> >>
> >>>
> >>> On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ry...@gmail.com>
> wrote:
> >>>
> >>>> So if that is the case, I'm not sure how that is a fair test.  One
> >>>> system reads from RAM, the other from disk.  The results as expected.
> >>>>
> >>>> Why not test one system with SSDs and the other without?
> >>>>
> >>>> It's really hard to get apples/oranges comparison. Even if you are
> >>>> doing the same workloads on 2 diverse systems, you are not testing
> >>>> the code quality, you are testing overall systems and other issues.
> >>>>
> >>>> As G1 GC improves, I expect our ability to use larger and larger
> >>>> heaps would blunt the advantage of a C++ program using malloc.
> >>>>
> >>>> -ryan
> >>>>
> >>>> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning
> >>>> <td...@maprtech.com>
> >>>> wrote:
> >>>> > From the small comments I have heard, the RAM versus disk
> >>>> > difference is mostly what I have heard they were testing.
> >>>> >
> >>>> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson
> >>>> > <ry...@gmail.com>
> >>>> wrote:
> >>>> >
> >>>> >> We dont have the test source code, so it isnt very objective.
> >>>> >> However I believe there are 2 things which help them:
> >>>> >> - They are able to harness larger amounts of RAM, so they are
> >>>> >> really just testing that vs HBase
> >>>> >>
> >>>> >
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
> >
>
>

RE: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Chad Walters <Ch...@microsoft.com>.
Sure, but if the tradeoff is being unable to use all the memory effectively and suffering 10x unfavorable benchmark comparisons, then running 2 or more JVMs with a regionserver per VM seems like a reasonable stopgap until the GC works better.

Chad

-----Original Message-----
From: Ryan Rawson [mailto:ryanobjc@gmail.com] 
Sent: Wednesday, December 15, 2010 11:58 AM
To: dev@hbase.apache.org
Subject: Re: Hypertable claiming upto >900% random-read throughput vs HBase

Why do that?  You reduce the cache effectiveness and up the logistical complexity.  As a stopgap maybe, but not as a long term strategy.

Sun just needs to fix their GC.  Er, Oracle.

-ryan

On Wed, Dec 15, 2010 at 11:55 AM, Chad Walters <Ch...@microsoft.com> wrote:
> Why not run multiple JVMs per machine?
>
> Chad
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Wednesday, December 15, 2010 11:52 AM
> To: dev@hbase.apache.org
> Subject: Re: Hypertable claiming upto >900% random-read throughput vs 
> HBase
>
> The malloc thing was pointing out that we have to contend with Xmx and GC.  So it makes it harder for us to maximally use all the available ram for block cache in the regionserver.  Which you may or may not want to do for alternative reasons.  At least with Xmx you can plan and control your deployments, and you wont suffer from heap growth due to heap fragmentation.
>
> -ryan
>
> On Wed, Dec 15, 2010 at 11:49 AM, Todd Lipcon <to...@cloudera.com> wrote:
>> On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma 
>> <ga...@gmail.com> wrote:
>>> Thanks Ryan and Ted. I also think if they were using tcmalloc, it 
>>> would have given them a further advantage but as you said, not much 
>>> is known about the test source code.
>>
>> I think Hypertable does use tcmalloc or jemalloc (forget which)
>>
>> You may be interested in this thread from back in August:
>> http://search-hadoop.com/m/pG6SM1xSP7r/hypertable&subj=Re+Finding+on+HBase+Hypertable+comparison
>>
>> -Todd
>>
>>>
>>> On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>>
>>>> So if that is the case, I'm not sure how that is a fair test.  One 
>>>> system reads from RAM, the other from disk.  The results as expected.
>>>>
>>>> Why not test one system with SSDs and the other without?
>>>>
>>>> It's really hard to get apples/oranges comparison. Even if you are 
>>>> doing the same workloads on 2 diverse systems, you are not testing 
>>>> the code quality, you are testing overall systems and other issues.
>>>>
>>>> As G1 GC improves, I expect our ability to use larger and larger 
>>>> heaps would blunt the advantage of a C++ program using malloc.
>>>>
>>>> -ryan
>>>>
>>>> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning 
>>>> <td...@maprtech.com>
>>>> wrote:
>>>> > From the small comments I have heard, the RAM versus disk 
>>>> > difference is mostly what I have heard they were testing.
>>>> >
>>>> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson 
>>>> > <ry...@gmail.com>
>>>> wrote:
>>>> >
>>>> >> We dont have the test source code, so it isnt very objective.
>>>> >> However I believe there are 2 things which help them:
>>>> >> - They are able to harness larger amounts of RAM, so they are 
>>>> >> really just testing that vs HBase
>>>> >>
>>>> >
>>>>
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>


Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Ryan Rawson <ry...@gmail.com>.
Why do that?  You reduce the cache effectiveness and up the logistical
complexity.  As a stopgap maybe, but not as a long term strategy.

Sun just needs to fix their GC.  Er, Oracle.

-ryan

On Wed, Dec 15, 2010 at 11:55 AM, Chad Walters
<Ch...@microsoft.com> wrote:
> Why not run multiple JVMs per machine?
>
> Chad
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Wednesday, December 15, 2010 11:52 AM
> To: dev@hbase.apache.org
> Subject: Re: Hypertable claiming upto >900% random-read throughput vs HBase
>
> The malloc thing was pointing out that we have to contend with Xmx and GC.  So it makes it harder for us to maximally use all the available ram for block cache in the regionserver.  Which you may or may not want to do for alternative reasons.  At least with Xmx you can plan and control your deployments, and you wont suffer from heap growth due to heap fragmentation.
>
> -ryan
>
> On Wed, Dec 15, 2010 at 11:49 AM, Todd Lipcon <to...@cloudera.com> wrote:
>> On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma
>> <ga...@gmail.com> wrote:
>>> Thanks Ryan and Ted. I also think if they were using tcmalloc, it
>>> would have given them a further advantage but as you said, not much
>>> is known about the test source code.
>>
>> I think Hypertable does use tcmalloc or jemalloc (forget which)
>>
>> You may be interested in this thread from back in August:
>> http://search-hadoop.com/m/pG6SM1xSP7r/hypertable&subj=Re+Finding+on+HBase+Hypertable+comparison
>>
>> -Todd
>>
>>>
>>> On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>>
>>>> So if that is the case, I'm not sure how that is a fair test.  One
>>>> system reads from RAM, the other from disk.  The results as expected.
>>>>
>>>> Why not test one system with SSDs and the other without?
>>>>
>>>> It's really hard to get apples/oranges comparison. Even if you are
>>>> doing the same workloads on 2 diverse systems, you are not testing
>>>> the code quality, you are testing overall systems and other issues.
>>>>
>>>> As G1 GC improves, I expect our ability to use larger and larger
>>>> heaps would blunt the advantage of a C++ program using malloc.
>>>>
>>>> -ryan
>>>>
>>>> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning
>>>> <td...@maprtech.com>
>>>> wrote:
>>>> > From the small comments I have heard, the RAM versus disk
>>>> > difference is mostly what I have heard they were testing.
>>>> >
>>>> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson <ry...@gmail.com>
>>>> wrote:
>>>> >
>>>> >> We dont have the test source code, so it isnt very objective.
>>>> >> However I believe there are 2 things which help them:
>>>> >> - They are able to harness larger amounts of RAM, so they are
>>>> >> really just testing that vs HBase
>>>> >>
>>>> >
>>>>
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>

RE: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Chad Walters <Ch...@microsoft.com>.
Why not run multiple JVMs per machine?

Chad

-----Original Message-----
From: Ryan Rawson [mailto:ryanobjc@gmail.com] 
Sent: Wednesday, December 15, 2010 11:52 AM
To: dev@hbase.apache.org
Subject: Re: Hypertable claiming upto >900% random-read throughput vs HBase

The malloc thing was pointing out that we have to contend with Xmx and GC.  So it makes it harder for us to maximally use all the available ram for block cache in the regionserver.  Which you may or may not want to do for alternative reasons.  At least with Xmx you can plan and control your deployments, and you wont suffer from heap growth due to heap fragmentation.

-ryan

On Wed, Dec 15, 2010 at 11:49 AM, Todd Lipcon <to...@cloudera.com> wrote:
> On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma 
> <ga...@gmail.com> wrote:
>> Thanks Ryan and Ted. I also think if they were using tcmalloc, it 
>> would have given them a further advantage but as you said, not much 
>> is known about the test source code.
>
> I think Hypertable does use tcmalloc or jemalloc (forget which)
>
> You may be interested in this thread from back in August:
> http://search-hadoop.com/m/pG6SM1xSP7r/hypertable&subj=Re+Finding+on+HBase+Hypertable+comparison
>
> -Todd
>
>>
>> On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>
>>> So if that is the case, I'm not sure how that is a fair test.  One 
>>> system reads from RAM, the other from disk.  The results as expected.
>>>
>>> Why not test one system with SSDs and the other without?
>>>
>>> It's really hard to get apples/oranges comparison. Even if you are 
>>> doing the same workloads on 2 diverse systems, you are not testing 
>>> the code quality, you are testing overall systems and other issues.
>>>
>>> As G1 GC improves, I expect our ability to use larger and larger 
>>> heaps would blunt the advantage of a C++ program using malloc.
>>>
>>> -ryan
>>>
>>> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning 
>>> <td...@maprtech.com>
>>> wrote:
>>> > From the small comments I have heard, the RAM versus disk 
>>> > difference is mostly what I have heard they were testing.
>>> >
>>> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson <ry...@gmail.com>
>>> wrote:
>>> >
>>> >> We dont have the test source code, so it isnt very objective.  
>>> >> However I believe there are 2 things which help them:
>>> >> - They are able to harness larger amounts of RAM, so they are 
>>> >> really just testing that vs HBase
>>> >>
>>> >
>>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Ed Kohlwey <ek...@gmail.com>.
Along the lines of Terracotta BigMemory, apparently what they are actually
doing is just using the DirectByteBuffer class (see this forum post:
http://forums.terracotta.org/forums/posts/list/4304.page), which is basically
the same as using malloc - it gives you non-GC access to a giant pool of
memory that you can allocate as you please.

Using DirectByteBuffer directly might be even better than using
BigMemory, since BigMemory appears to use Java object serialization to
translate between its "special" memory and regular Java memory, which is
probably just another unnecessary layer.
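
(A minimal sketch of the idea, purely for illustration - this is not
Terracotta's or HBase's code, and the 64 MB size and length-prefixed layout
are made up:)

    import java.nio.ByteBuffer;

    public class OffHeapExample {
        public static void main(String[] args) {
            // Allocated outside the Java heap: the GC never traces or compacts
            // these bytes; only the thin ByteBuffer wrapper object is on-heap.
            ByteBuffer offHeap = ByteBuffer.allocateDirect(64 * 1024 * 1024);

            byte[] value = "some cached block bytes".getBytes();
            offHeap.putInt(value.length);   // length-prefix the record
            offHeap.put(value);             // copy the payload off-heap

            offHeap.flip();                 // switch to reading from the start
            int len = offHeap.getInt();
            byte[] onHeapCopy = new byte[len];
            offHeap.get(onHeapCopy);        // reading copies back onto the heap
            System.out.println(new String(onHeapCopy));
        }
    }

The key point is that only the small wrapper object is subject to GC; the
64 MB of payload is invisible to the collector.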

On Wed, Dec 15, 2010 at 3:27 PM, Vladimir Rodionov
<vr...@carrieriq.com>wrote:

> Why do not you use off heap memory for this purpose? If its block cache
> (all blocks are of equal sizes)
> alloc/free algorithm is pretty much simple - you do not have to
> re-implement malloc in Java.
>
> I think something like open source version of Terracotta BigMemory is a
> good candidate for
> Apache project. I see at least  several large Hadoops : HBase, HDFS
> DataNodes, TaskTrackers and NameNode who suffer a lot from GC timeouts.
>
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
>
>

Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Ryan Rawson <ry...@gmail.com>.
I've looked into this a lot, and the summary is 'easier said than
done'.  If you look at Terracotta, they are using serialization of data
structures to off-heap RAM, so it really is kind of like those EMM
systems from ye olde DOS days.

Having done some prototypes of this, the most likely use case is to
store the cached blocks in off-heap RAM, but still iterate over them in
Java space.  To make life possible/easy we'd have to copy KeyValues
out of the block buffer into per-scanner chunks, and this overhead
might negate other benefits.  Without building the whole stack out it's
hard to construct a microbenchmark that isn't hopelessly naive.

In the meantime, the state of the GC space is getting more attention.
G1 seems like it should soon be a viable alternative for those
systems which aren't sensitive to 250 ms minor collection pauses.

-ryan
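
(To make the copy-out step above concrete, here is a rough, hypothetical
sketch - the length-prefixed cell layout and class name are invented, and
this is not HBase code:)

    import java.nio.ByteBuffer;

    // Each cell read from a shared off-heap block is copied into an on-heap
    // array owned by the scanner, so nothing in Java space keeps a reference
    // into the direct buffer itself.
    class OffHeapBlockScanner {
        private final ByteBuffer block;

        OffHeapBlockScanner(ByteBuffer cachedBlock) {
            // duplicate() gives this scanner its own position/limit without
            // copying the underlying off-heap bytes
            this.block = cachedBlock.duplicate();
        }

        // Returns the next cell as an on-heap copy, or null when the block is done.
        byte[] nextCell() {
            if (block.remaining() < 4) {
                return null;
            }
            int len = block.getInt();
            byte[] copy = new byte[len];  // this per-cell copy is the overhead in question
            block.get(copy);
            return copy;
        }
    }

The per-cell byte[] allocation is exactly the overhead that might negate
the benefit of keeping the block itself off-heap.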

On Wed, Dec 15, 2010 at 12:30 PM, Todd Lipcon <to...@cloudera.com> wrote:
> On Wed, Dec 15, 2010 at 12:27 PM, Vladimir Rodionov
> <vr...@carrieriq.com> wrote:
>> Why do not you use off heap memory for this purpose? If its block cache (all blocks are of equal sizes)
>> alloc/free algorithm is pretty much simple - you do not have to re-implement malloc in Java.
>
> The block cache unfortunately isn't all equal size - if you have a
> single cell larger than the hfile block size, the block expands to fit
> it.
>
> That said we could use a fairly simple slab allocator.
>
> The bigger difficulty is in reference counting/tracking - the hfile
> blocks are zero-copied out all the way to the RPC implementation so
> tracking references is not straightforward.
>
> -Todd
>
>>
>> I think something like open source version of Terracotta BigMemory is a good candidate for
>> Apache project. I see at least  several large Hadoops : HBase, HDFS DataNodes, TaskTrackers and NameNode who suffer a lot from GC timeouts.
>>
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: vrodionov@carrieriq.com
>>
>> ________________________________________
>> From: Ryan Rawson [ryanobjc@gmail.com]
>> Sent: Wednesday, December 15, 2010 11:52 AM
>> To: dev@hbase.apache.org
>> Subject: Re: Hypertable claiming upto >900% random-read throughput vs HBase
>>
>> The malloc thing was pointing out that we have to contend with Xmx and
>> GC.  So it makes it harder for us to maximally use all the available
>> ram for block cache in the regionserver.  Which you may or may not
>> want to do for alternative reasons.  At least with Xmx you can plan
>> and control your deployments, and you wont suffer from heap growth due
>> to heap fragmentation.
>>
>> -ryan
>>
>> On Wed, Dec 15, 2010 at 11:49 AM, Todd Lipcon <to...@cloudera.com> wrote:
>>> On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma
>>> <ga...@gmail.com> wrote:
>>>> Thanks Ryan and Ted. I also think if they were using tcmalloc, it would have
>>>> given them a further advantage but as you said, not much is known about the
>>>> test source code.
>>>
>>> I think Hypertable does use tcmalloc or jemalloc (forget which)
>>>
>>> You may be interested in this thread from back in August:
>>> http://search-hadoop.com/m/pG6SM1xSP7r/hypertable&subj=Re+Finding+on+HBase+Hypertable+comparison
>>>
>>> -Todd
>>>
>>>>
>>>> On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>>>
>>>>> So if that is the case, I'm not sure how that is a fair test.  One
>>>>> system reads from RAM, the other from disk.  The results as expected.
>>>>>
>>>>> Why not test one system with SSDs and the other without?
>>>>>
>>>>> It's really hard to get apples/oranges comparison. Even if you are
>>>>> doing the same workloads on 2 diverse systems, you are not testing the
>>>>> code quality, you are testing overall systems and other issues.
>>>>>
>>>>> As G1 GC improves, I expect our ability to use larger and larger heaps
>>>>> would blunt the advantage of a C++ program using malloc.
>>>>>
>>>>> -ryan
>>>>>
>>>>> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning <td...@maprtech.com>
>>>>> wrote:
>>>>> > From the small comments I have heard, the RAM versus disk difference is
>>>>> > mostly what I have heard they were testing.
>>>>> >
>>>>> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson <ry...@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> >> We dont have the test source code, so it isnt very objective.  However
>>>>> >> I believe there are 2 things which help them:
>>>>> >> - They are able to harness larger amounts of RAM, so they are really
>>>>> >> just testing that vs HBase
>>>>> >>
>>>>> >
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Todd Lipcon <to...@cloudera.com>.
On Wed, Dec 15, 2010 at 12:27 PM, Vladimir Rodionov
<vr...@carrieriq.com> wrote:
> Why do not you use off heap memory for this purpose? If its block cache (all blocks are of equal sizes)
> alloc/free algorithm is pretty much simple - you do not have to re-implement malloc in Java.

The block cache unfortunately isn't all equal size - if you have a
single cell larger than the hfile block size, the block expands to fit
it.

That said we could use a fairly simple slab allocator.

The bigger difficulty is in reference counting/tracking - the hfile
blocks are zero-copied out all the way to the RPC implementation so
tracking references is not straightforward.

-Todd
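
(As a rough illustration of the slab idea - toy code, not HBase's, with
arbitrary names, and it sidesteps the reference-counting problem entirely:)

    import java.nio.ByteBuffer;
    import java.util.ArrayDeque;
    import java.util.Deque;

    // Toy fixed-size slab allocator over a single direct buffer: every slot is
    // the same size, so "malloc" is popping a free slot index and "free" is
    // pushing it back.  Oversized blocks and reference counting are ignored.
    class SimpleSlabAllocator {
        private final ByteBuffer slab;
        private final int slotSize;
        private final Deque<Integer> freeSlots = new ArrayDeque<Integer>();

        SimpleSlabAllocator(int slotSize, int slotCount) {
            this.slotSize = slotSize;
            this.slab = ByteBuffer.allocateDirect(slotSize * slotCount);
            for (int i = 0; i < slotCount; i++) {
                freeSlots.push(i);
            }
        }

        // Copies data into a free slot and returns the slot index,
        // or -1 if the allocator is full or the data does not fit.
        int allocate(byte[] data) {
            if (freeSlots.isEmpty() || data.length > slotSize) {
                return -1;
            }
            int slot = freeSlots.pop();
            ByteBuffer view = slab.duplicate();
            view.position(slot * slotSize);
            view.put(data);
            return slot;
        }

        // Caller must guarantee nothing still reads this slot - the hard part noted above.
        void free(int slot) {
            freeSlots.push(slot);
        }
    }

Everything hard in practice lives in free(): with blocks zero-copied out to
the RPC layer, knowing when a slot can safely be recycled is the real problem.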

>
> I think something like open source version of Terracotta BigMemory is a good candidate for
> Apache project. I see at least  several large Hadoops : HBase, HDFS DataNodes, TaskTrackers and NameNode who suffer a lot from GC timeouts.
>
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Ryan Rawson [ryanobjc@gmail.com]
> Sent: Wednesday, December 15, 2010 11:52 AM
> To: dev@hbase.apache.org
> Subject: Re: Hypertable claiming upto >900% random-read throughput vs HBase
>
> The malloc thing was pointing out that we have to contend with Xmx and
> GC.  So it makes it harder for us to maximally use all the available
> ram for block cache in the regionserver.  Which you may or may not
> want to do for alternative reasons.  At least with Xmx you can plan
> and control your deployments, and you wont suffer from heap growth due
> to heap fragmentation.
>
> -ryan
>
> On Wed, Dec 15, 2010 at 11:49 AM, Todd Lipcon <to...@cloudera.com> wrote:
>> On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma
>> <ga...@gmail.com> wrote:
>>> Thanks Ryan and Ted. I also think if they were using tcmalloc, it would have
>>> given them a further advantage but as you said, not much is known about the
>>> test source code.
>>
>> I think Hypertable does use tcmalloc or jemalloc (forget which)
>>
>> You may be interested in this thread from back in August:
>> http://search-hadoop.com/m/pG6SM1xSP7r/hypertable&subj=Re+Finding+on+HBase+Hypertable+comparison
>>
>> -Todd
>>
>>>
>>> On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>>
>>>> So if that is the case, I'm not sure how that is a fair test.  One
>>>> system reads from RAM, the other from disk.  The results as expected.
>>>>
>>>> Why not test one system with SSDs and the other without?
>>>>
>>>> It's really hard to get apples/oranges comparison. Even if you are
>>>> doing the same workloads on 2 diverse systems, you are not testing the
>>>> code quality, you are testing overall systems and other issues.
>>>>
>>>> As G1 GC improves, I expect our ability to use larger and larger heaps
>>>> would blunt the advantage of a C++ program using malloc.
>>>>
>>>> -ryan
>>>>
>>>> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning <td...@maprtech.com>
>>>> wrote:
>>>> > From the small comments I have heard, the RAM versus disk difference is
>>>> > mostly what I have heard they were testing.
>>>> >
>>>> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson <ry...@gmail.com>
>>>> wrote:
>>>> >
>>>> >> We dont have the test source code, so it isnt very objective.  However
>>>> >> I believe there are 2 things which help them:
>>>> >> - They are able to harness larger amounts of RAM, so they are really
>>>> >> just testing that vs HBase
>>>> >>
>>>> >
>>>>
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>



-- 
Todd Lipcon
Software Engineer, Cloudera

RE: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
Why not use off-heap memory for this purpose? If it's the block cache (all blocks are of equal size),
the alloc/free algorithm is pretty simple - you do not have to re-implement malloc in Java.

I think something like an open-source version of Terracotta BigMemory would be a good candidate for
an Apache project. I see at least several large Hadoop components - HBase, HDFS DataNodes, TaskTrackers, and the NameNode - that suffer a lot from GC timeouts.


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Ryan Rawson [ryanobjc@gmail.com]
Sent: Wednesday, December 15, 2010 11:52 AM
To: dev@hbase.apache.org
Subject: Re: Hypertable claiming upto >900% random-read throughput vs HBase

The malloc thing was pointing out that we have to contend with Xmx and
GC.  So it makes it harder for us to maximally use all the available
ram for block cache in the regionserver.  Which you may or may not
want to do for alternative reasons.  At least with Xmx you can plan
and control your deployments, and you wont suffer from heap growth due
to heap fragmentation.

-ryan

On Wed, Dec 15, 2010 at 11:49 AM, Todd Lipcon <to...@cloudera.com> wrote:
> On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma
> <ga...@gmail.com> wrote:
>> Thanks Ryan and Ted. I also think if they were using tcmalloc, it would have
>> given them a further advantage but as you said, not much is known about the
>> test source code.
>
> I think Hypertable does use tcmalloc or jemalloc (forget which)
>
> You may be interested in this thread from back in August:
> http://search-hadoop.com/m/pG6SM1xSP7r/hypertable&subj=Re+Finding+on+HBase+Hypertable+comparison
>
> -Todd
>
>>
>> On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>
>>> So if that is the case, I'm not sure how that is a fair test.  One
>>> system reads from RAM, the other from disk.  The results as expected.
>>>
>>> Why not test one system with SSDs and the other without?
>>>
>>> It's really hard to get apples/oranges comparison. Even if you are
>>> doing the same workloads on 2 diverse systems, you are not testing the
>>> code quality, you are testing overall systems and other issues.
>>>
>>> As G1 GC improves, I expect our ability to use larger and larger heaps
>>> would blunt the advantage of a C++ program using malloc.
>>>
>>> -ryan
>>>
>>> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning <td...@maprtech.com>
>>> wrote:
>>> > From the small comments I have heard, the RAM versus disk difference is
>>> > mostly what I have heard they were testing.
>>> >
>>> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson <ry...@gmail.com>
>>> wrote:
>>> >
>>> >> We dont have the test source code, so it isnt very objective.  However
>>> >> I believe there are 2 things which help them:
>>> >> - They are able to harness larger amounts of RAM, so they are really
>>> >> just testing that vs HBase
>>> >>
>>> >
>>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Ryan Rawson <ry...@gmail.com>.
The malloc thing was pointing out that we have to contend with Xmx and
GC, so it is harder for us to maximally use all the available
RAM for block cache in the regionserver - which you may or may not
want to do anyway, for other reasons.  At least with Xmx you can plan
and control your deployments, and you won't suffer from heap growth due
to heap fragmentation.

-ryan
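
(A small, purely illustrative sketch of that constraint - the 0.2 fraction is
just a stand-in for a block-cache-fraction setting such as
hfile.block.cache.size, not a value taken from this thread:)

    // With a JVM-managed heap, the cache ceiling comes from the fixed -Xmx
    // value, not from total machine RAM.
    public class BlockCacheSizing {
        public static void main(String[] args) {
            long maxHeap = Runtime.getRuntime().maxMemory();  // bounded by -Xmx
            double cacheFraction = 0.2;  // illustrative block cache fraction
            long blockCacheBytes = (long) (maxHeap * cacheFraction);
            System.out.println("heap ceiling = " + maxHeap
                + " bytes, block cache budget = " + blockCacheBytes + " bytes");
        }
    }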

On Wed, Dec 15, 2010 at 11:49 AM, Todd Lipcon <to...@cloudera.com> wrote:
> On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma
> <ga...@gmail.com> wrote:
>> Thanks Ryan and Ted. I also think if they were using tcmalloc, it would have
>> given them a further advantage but as you said, not much is known about the
>> test source code.
>
> I think Hypertable does use tcmalloc or jemalloc (forget which)
>
> You may be interested in this thread from back in August:
> http://search-hadoop.com/m/pG6SM1xSP7r/hypertable&subj=Re+Finding+on+HBase+Hypertable+comparison
>
> -Todd
>
>>
>> On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>
>>> So if that is the case, I'm not sure how that is a fair test.  One
>>> system reads from RAM, the other from disk.  The results as expected.
>>>
>>> Why not test one system with SSDs and the other without?
>>>
>>> It's really hard to get apples/oranges comparison. Even if you are
>>> doing the same workloads on 2 diverse systems, you are not testing the
>>> code quality, you are testing overall systems and other issues.
>>>
>>> As G1 GC improves, I expect our ability to use larger and larger heaps
>>> would blunt the advantage of a C++ program using malloc.
>>>
>>> -ryan
>>>
>>> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning <td...@maprtech.com>
>>> wrote:
>>> > From the small comments I have heard, the RAM versus disk difference is
>>> > mostly what I have heard they were testing.
>>> >
>>> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson <ry...@gmail.com>
>>> wrote:
>>> >
>>> >> We dont have the test source code, so it isnt very objective.  However
>>> >> I believe there are 2 things which help them:
>>> >> - They are able to harness larger amounts of RAM, so they are really
>>> >> just testing that vs HBase
>>> >>
>>> >
>>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Todd Lipcon <to...@cloudera.com>.
On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma
<ga...@gmail.com> wrote:
> Thanks Ryan and Ted. I also think if they were using tcmalloc, it would have
> given them a further advantage but as you said, not much is known about the
> test source code.

I think Hypertable does use tcmalloc or jemalloc (forget which)

You may be interested in this thread from back in August:
http://search-hadoop.com/m/pG6SM1xSP7r/hypertable&subj=Re+Finding+on+HBase+Hypertable+comparison

-Todd

>
> On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ry...@gmail.com> wrote:
>
>> So if that is the case, I'm not sure how that is a fair test.  One
>> system reads from RAM, the other from disk.  The results as expected.
>>
>> Why not test one system with SSDs and the other without?
>>
>> It's really hard to get apples/oranges comparison. Even if you are
>> doing the same workloads on 2 diverse systems, you are not testing the
>> code quality, you are testing overall systems and other issues.
>>
>> As G1 GC improves, I expect our ability to use larger and larger heaps
>> would blunt the advantage of a C++ program using malloc.
>>
>> -ryan
>>
>> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning <td...@maprtech.com>
>> wrote:
>> > From the small comments I have heard, the RAM versus disk difference is
>> > mostly what I have heard they were testing.
>> >
>> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson <ry...@gmail.com>
>> wrote:
>> >
>> >> We dont have the test source code, so it isnt very objective.  However
>> >> I believe there are 2 things which help them:
>> >> - They are able to harness larger amounts of RAM, so they are really
>> >> just testing that vs HBase
>> >>
>> >
>>
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Gaurav Sharma <ga...@gmail.com>.
Thanks Ryan and Ted. I also think if they were using tcmalloc, it would have
given them a further advantage but as you said, not much is known about the
test source code.

On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ry...@gmail.com> wrote:

> So if that is the case, I'm not sure how that is a fair test.  One
> system reads from RAM, the other from disk.  The results as expected.
>
> Why not test one system with SSDs and the other without?
>
> It's really hard to get apples/oranges comparison. Even if you are
> doing the same workloads on 2 diverse systems, you are not testing the
> code quality, you are testing overall systems and other issues.
>
> As G1 GC improves, I expect our ability to use larger and larger heaps
> would blunt the advantage of a C++ program using malloc.
>
> -ryan
>
> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning <td...@maprtech.com>
> wrote:
> > From the small comments I have heard, the RAM versus disk difference is
> > mostly what I have heard they were testing.
> >
> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson <ry...@gmail.com>
> wrote:
> >
> >> We dont have the test source code, so it isnt very objective.  However
> >> I believe there are 2 things which help them:
> >> - They are able to harness larger amounts of RAM, so they are really
> >> just testing that vs HBase
> >>
> >
>

Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Andrew Purtell <ap...@apache.org>.
> From: Ryan Rawson <ry...@gmail.com>
> Purtell has more, but he told me "no longer crashes, but minor pauses
> between 50-250 ms".  From 1.6_23.
 
That's right. 

On EC2 m1.xlarge, so that's a big caveat... per-test-iteration variance on EC2 in general is ~20%, and EC2 hardware is a couple of generations back. Someone with real hardware could get meaningful minor pause numbers. I was just smoke testing.

Maybe I will try going a bit further this week.

Best regards,

    - Andy



      

Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Ryan Rawson <ry...@gmail.com>.
Purtell has more details, but he told me "no longer crashes, but minor pauses
between 50-250 ms".  That's from 1.6.0_23.

Still not usable in a latency-sensitive prod setting.  Maybe in other settings?

-ryan

On Wed, Dec 15, 2010 at 11:31 AM, Ted Dunning <td...@maprtech.com> wrote:
> Does anybody have a recent report about how G1 is coming along?
>
> On Wed, Dec 15, 2010 at 11:22 AM, Ryan Rawson <ry...@gmail.com> wrote:
>
>> As G1 GC improves, I expect our ability to use larger and larger heaps
>> would blunt the advantage of a C++ program using malloc.
>>
>

Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Andrew Purtell <ap...@apache.org>.
> Does anybody have a recent report about how G1 is coming along?

Not in general, but as it pertains to HBase, I tried it recently with 1.6.0u23 and ran a generic heavy write test without crashing any more, so that is something. But I have not tried stressing it at "production" workloads.

Best regards,

    - Andy


--- On Wed, 12/15/10, Ted Dunning <td...@maprtech.com> wrote:

> From: Ted Dunning <td...@maprtech.com>
> Subject: Re: Hypertable claiming upto >900% random-read throughput vs HBase
> To: dev@hbase.apache.org
> Date: Wednesday, December 15, 2010, 11:31 AM
>
> Does anybody have a recent report about how G1 is coming along?
 


      

Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Ted Dunning <td...@maprtech.com>.
Does anybody have a recent report about how G1 is coming along?

On Wed, Dec 15, 2010 at 11:22 AM, Ryan Rawson <ry...@gmail.com> wrote:

> As G1 GC improves, I expect our ability to use larger and larger heaps
> would blunt the advantage of a C++ program using malloc.
>

Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Ryan Rawson <ry...@gmail.com>.
So if that is the case, I'm not sure how that is a fair test.  One
system reads from RAM, the other from disk.  The results are as expected.

Why not test one system with SSDs and the other without?

It's really hard to avoid an apples/oranges comparison. Even if you are
running the same workloads on 2 diverse systems, you are not testing
code quality, you are testing overall systems and other issues.

As G1 GC improves, I expect our ability to use larger and larger heaps
will blunt the advantage of a C++ program using malloc.

-ryan

On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning <td...@maprtech.com> wrote:
> From the small comments I have heard, the RAM versus disk difference is
> mostly what I have heard they were testing.
>
> On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson <ry...@gmail.com> wrote:
>
>> We dont have the test source code, so it isnt very objective.  However
>> I believe there are 2 things which help them:
>> - They are able to harness larger amounts of RAM, so they are really
>> just testing that vs HBase
>>
>

Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Ted Dunning <td...@maprtech.com>.
From the small comments I have heard, the RAM versus disk difference is
mostly what I have heard they were testing.

On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson <ry...@gmail.com> wrote:

> We dont have the test source code, so it isnt very objective.  However
> I believe there are 2 things which help them:
> - They are able to harness larger amounts of RAM, so they are really
> just testing that vs HBase
>

Re: Hypertable claiming upto >900% random-read throughput vs HBase

Posted by Ryan Rawson <ry...@gmail.com>.
Hi,

We don't have the test source code, so it isn't very objective.  However,
I believe there are two things which help them:
- They are able to harness larger amounts of RAM, so they are really
just testing that vs HBase
- There have been substantial performance improvements in HBase since
the version they used to test with.  I'm talking like 5x speedups in
some scan cases.

With those two things I believe we blunt the difference substantially,
but without the source it is impossible to tell.

Finally, aside from the speed issues, there are the community and
Hadoop integration aspects. Once you get past the raw speed, you
might miss the great MapReduce hookups, the diverse third-party
community around HBase, and the size/helpfulness of the community in
general.

Good luck with your evals,
-ryan

On Wed, Dec 15, 2010 at 11:00 AM, Gaurav Sharma
<ga...@gmail.com> wrote:
> Folks, my apologies if this has been discussed here before but can someone
> please shed some light on how Hypertable is claiming upto a 900% higher
> throughput on random reads and upto a 1000% on sequential reads in their
> performance evaluation vs HBase (modeled after the perf-eval test in section
> 7 of the Bigtable paper):
> http://www.hypertable.com/pub/perfeval/test1 (section: System Performance
> Difference)
>
> For one, I noticed they are running on CentOS 5.2 on 1.8Ghz dual-core
> Opterons / 10gigs of RAM. There's also no date of posting on the blogpost.
> It has been a while since I checked but YCSB did not have support for
> Hypertable testing. The numbers do seem a bit too good to be true :)
>
> -Gaurav
>