Posted to dev@accumulo.apache.org by "Morris, Jason" <ja...@amazon.com> on 2014/03/06 22:48:32 UTC

Benchmarking Accumulo

Hey everyone,

I was wondering if anyone has a benchmark test for Accumulo. I could write a MapReduce job that creates a bunch of tables, maybe pushes some data, then drops them, but I was wondering if anyone had something better.

Thanks,
Jason

Re: Benchmarking Accumulo

Posted by Mike Drob <ma...@cloudera.com>.
Jason,

It depends on what kind of performance you are trying to test. We have a
suite called the "continuous ingest"[1] test which basically pushes data
into a table until you tell it to stop. We have used this in the past to
get performance characteristics of write throughput between versions.
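
For illustration only, here is a minimal Python sketch of the "push data until told to stop, then report throughput" idea. It is not the continuous ingest suite itself. The names generate_entries, measure_ingest, write and should_stop are hypothetical, and the in-memory dict stands in for a real Accumulo table and client.

```python
import time
from typing import Callable, Dict, Iterator, Tuple

Entry = Tuple[str, str]  # (row key, value); a stand-in for a real mutation


def generate_entries() -> Iterator[Entry]:
    """Yield an endless stream of synthetic entries (hypothetical data shape)."""
    i = 0
    while True:
        yield (f"row_{i:012d}", f"value_{i}")
        i += 1


def measure_ingest(write: Callable[[Entry], None],
                   should_stop: Callable[[int, float], bool]) -> Tuple[int, float]:
    """Write entries until should_stop(count, elapsed_seconds) returns True.

    Returns (entries written, entries per second).
    """
    start = time.perf_counter()
    count = 0
    for entry in generate_entries():
        if should_stop(count, time.perf_counter() - start):
            break
        write(entry)
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else 0.0
    return count, rate


if __name__ == "__main__":
    # Assumption: an in-memory dict replaces the real table, so the rate
    # printed here says nothing about an actual cluster.
    table: Dict[str, str] = {}

    def write_to_dict(entry: Entry) -> None:
        table[entry[0]] = entry[1]

    written, per_second = measure_ingest(write_to_dict, lambda n, t: n >= 100_000)
    print(f"{written} entries, {per_second:.0f} entries/s")
```

Running the same harness against two versions on the same hardware gives numbers that can be compared with each other, which is the use described above.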

The numbers will vary wildly based on the size/hardware of your cluster, so
we generally do not publish the numbers because they are meaningless
without broader context.

I'm not sure that we have a great "read" benchmark, but spinning up a
MapReduce job is certainly an easy way to get started.

Mike

[1]:
https://github.com/apache/accumulo/blob/master/test/system/continuous/README


On Thu, Mar 6, 2014 at 4:48 PM, Morris, Jason <ja...@amazon.com> wrote:

> Hey everyone,
>
> I was wondering if anyone had a benchmark test for Accumulo? I could write
> a map/reduce job that creates a bunch of tables, maybe push some data, then
> drop them but I was wondering if anyone had something better?
>
> Thanks,
> Jason
>

Re: Benchmarking Accumulo

Posted by Nehal Mehta <ne...@gmail.com>.
I would say writing a JMeter plugin may be a better idea. I have started one,
but it may take some time to open source it, as I am currently busy with
something else.

Thanks,
Nehal
On Mar 6, 2014 4:49 PM, "Morris, Jason" <ja...@amazon.com> wrote:

> Hey everyone,
>
> I was wondering if anyone had a benchmark test for Accumulo? I could write
> a map/reduce job that creates a bunch of tables, maybe push some data, then
> drop them but I was wondering if anyone had something better?
>
> Thanks,
> Jason
>

Re: Benchmarking Accumulo

Posted by Josh Elser <jo...@gmail.com>.
Definitely a good idea, Jeremy. Performance numbers always benefit the 
community -- I'd love to make sure they get published prominently on the 
Accumulo site.

While the value of a benchmark is really only in the workload it 
performs, a good benchmark can be decomposed into a base set of 
operations which should be generally applicable. I don't agree that 
benchmarking Accumulo with D4M is only valid if you then use D4M.

As long as you state your performance requirements in a way that's 
comparable to your benchmark, that's all that really matters.
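
As a rough sketch of what "comparable" could look like: express both the benchmark results and the application requirement as rates for the same base operations, then check whether the requirement fits. The operation names, the numbers, and the simplifying assumption that operations compete for one shared budget of time are all hypothetical, not taken from any Accumulo or D4M benchmark.

```python
from typing import Dict


def utilization(benchmark_rates: Dict[str, float],
                required_rates: Dict[str, float]) -> float:
    """Fraction of measured capacity that the required workload would consume.

    Both arguments map a base operation name to operations per second.
    Assumption: operations share one budget of time, so the fractions add up.
    A result above 1.0 means the requirement exceeds what was measured.
    """
    total = 0.0
    for op, required in required_rates.items():
        if op not in benchmark_rates:
            raise KeyError(f"benchmark did not measure operation: {op}")
        if benchmark_rates[op] <= 0:
            raise ValueError(f"non-positive benchmark rate for: {op}")
        total += required / benchmark_rates[op]
    return total


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    measured = {"write": 100_000.0, "point_read": 20_000.0, "range_scan": 500.0}
    needed = {"write": 25_000.0, "point_read": 5_000.0, "range_scan": 125.0}
    print(f"utilization: {utilization(measured, needed):.2f}")
```

The point of the sketch is only that the requirement has to be stated in the same operations and units as the benchmark before the two can be compared.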

On 3/9/14, 4:35 PM, Jeremy Kepner wrote:
> Benchmarking Accumulo generally makes it look very good when
> compared to competing technologies and so benchmarking is good for the Accumulo
> community.
>
> For any particular application, standard benchmarks can
> be very helpful to verify your system is performing correctly.
>
> If we have a performance issue with a system, often the first thing
> we will do is run a benchmark on it to determine if the issue
> is with the system or how are application is using the system.
>
> On Sun, Mar 09, 2014 at 03:55:31PM -0400, David Medinets wrote:
>> What is the goal of your benchmarking? To some extent, benchmarking
>> Accumulo can't provide any true answers because it won't be using your
>> real-world data. A lot depends on the schema that you use. The D4M
>> benchmark would only be applicable to you if you plan to use their schema.
>>
>>
>> On Sun, Mar 9, 2014 at 2:23 PM, Kepner, Jeremy - 0553 - MITLL <
>> kepner@ll.mit.edu> wrote:
>>
>>>
>>> On Mar 9, 2014, at 2:21 PM, Arshak Navruzyan <ar...@gmail.com> wrote:
>>>
>>> The benchmark in the D4M paper is very helpful but perhaps you could
>>> clarify a few things:
>>>
>>> 1.  The 4 million entries per second pertains to the main table only or
>>> the main table, transpose and degree tables as well?
>>>
>>>
>>> All tables.
>>>
>>> 2.  Can  you share you accumulo-site.xml settings for the test?  In
>>> particular the memory map size and compaction ratio settings.
>>>
>>>
>>>
>>>
>>> On Thu, Mar 6, 2014 at 3:07 PM, Jeremy Kepner <ke...@ll.mit.edu> wrote:
>>>
>>>> There is one in D4M :-)
>>>>
>>>> On Thu, Mar 06, 2014 at 04:48:32PM -0500, Morris, Jason wrote:
>>>>> Hey everyone,
>>>>>
>>>>> I was wondering if anyone had a benchmark test for Accumulo? I could
>>>> write a map/reduce job that creates a bunch of tables, maybe push some
>>>> data, then drop them but I was wondering if anyone had something better?
>>>>>
>>>>> Thanks,
>>>>> Jason
>>>>
>>>
>>>
>>>

Re: Benchmarking Accumulo

Posted by Jeremy Kepner <ke...@ll.mit.edu>.
Benchmarking Accumulo generally makes it look very good when
compared to competing technologies, so benchmarking is good for the Accumulo
community.

For any particular application, standard benchmarks can
be very helpful to verify your system is performing correctly.

If we have a performance issue with a system, often the first thing
we will do is run a benchmark on it to determine if the issue
is with the system or with how our application is using the system.
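
A small Python sketch of that triage, under stated assumptions: baseline_rate is what the standard benchmark is expected to reach on a healthy system of this size, benchmark_rate is what it reaches now, and app_rate is the application's throughput in comparable units. The function name, the 0.8 tolerance, and the assumption that the application rate is directly comparable to the benchmark rate are all hypothetical.

```python
def diagnose(baseline_rate: float, benchmark_rate: float,
             app_rate: float, tolerance: float = 0.8) -> str:
    """Classify a performance issue as 'system', 'application', or 'ok'.

    All rates are in the same unit (for example, entries per second).
    """
    if baseline_rate <= 0 or benchmark_rate < 0 or app_rate < 0:
        raise ValueError("rates must be non-negative and baseline positive")
    # The standard benchmark itself is slow: suspect the system.
    if benchmark_rate < tolerance * baseline_rate:
        return "system"
    # The system benchmarks fine but the application lags well behind it:
    # suspect how the application is using the system.
    if app_rate < tolerance * benchmark_rate:
        return "application"
    return "ok"


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(diagnose(100_000, 50_000, 40_000))
    print(diagnose(100_000, 95_000, 30_000))
    print(diagnose(100_000, 95_000, 90_000))
```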

On Sun, Mar 09, 2014 at 03:55:31PM -0400, David Medinets wrote:
> What is the goal of your benchmarking? To some extent, benchmarking
> Accumulo can't provide any true answers because it won't be using your
> real-world data. A lot depends on the schema that you use. The D4M
> benchmark would only be applicable to you if you plan to use their schema.
> 
> 
> On Sun, Mar 9, 2014 at 2:23 PM, Kepner, Jeremy - 0553 - MITLL <
> kepner@ll.mit.edu> wrote:
> 
> >
> > On Mar 9, 2014, at 2:21 PM, Arshak Navruzyan <ar...@gmail.com> wrote:
> >
> > The benchmark in the D4M paper is very helpful but perhaps you could
> > clarify a few things:
> >
> > 1.  The 4 million entries per second pertains to the main table only or
> > the main table, transpose and degree tables as well?
> >
> >
> > All tables.
> >
> > 2.  Can  you share you accumulo-site.xml settings for the test?  In
> > particular the memory map size and compaction ratio settings.
> >
> >
> >
> >
> > On Thu, Mar 6, 2014 at 3:07 PM, Jeremy Kepner <ke...@ll.mit.edu> wrote:
> >
> >> There is one in D4M :-)
> >>
> >> On Thu, Mar 06, 2014 at 04:48:32PM -0500, Morris, Jason wrote:
> >> > Hey everyone,
> >> >
> >> > I was wondering if anyone had a benchmark test for Accumulo? I could
> >> write a map/reduce job that creates a bunch of tables, maybe push some
> >> data, then drop them but I was wondering if anyone had something better?
> >> >
> >> > Thanks,
> >> > Jason
> >>
> >
> >
> >

Re: Benchmarking Accumulo

Posted by David Medinets <da...@gmail.com>.
What is the goal of your benchmarking? To some extent, benchmarking
Accumulo can't provide any true answers because it won't be using your
real-world data. A lot depends on the schema that you use. The D4M
benchmark would only be applicable to you if you plan to use their schema.


On Sun, Mar 9, 2014 at 2:23 PM, Kepner, Jeremy - 0553 - MITLL <
kepner@ll.mit.edu> wrote:

>
> On Mar 9, 2014, at 2:21 PM, Arshak Navruzyan <ar...@gmail.com> wrote:
>
> The benchmark in the D4M paper is very helpful but perhaps you could
> clarify a few things:
>
> 1.  The 4 million entries per second pertains to the main table only or
> the main table, transpose and degree tables as well?
>
>
> All tables.
>
> 2.  Can  you share you accumulo-site.xml settings for the test?  In
> particular the memory map size and compaction ratio settings.
>
>
>
>
> On Thu, Mar 6, 2014 at 3:07 PM, Jeremy Kepner <ke...@ll.mit.edu> wrote:
>
>> There is one in D4M :-)
>>
>> On Thu, Mar 06, 2014 at 04:48:32PM -0500, Morris, Jason wrote:
>> > Hey everyone,
>> >
>> > I was wondering if anyone had a benchmark test for Accumulo? I could
>> write a map/reduce job that creates a bunch of tables, maybe push some
>> data, then drop them but I was wondering if anyone had something better?
>> >
>> > Thanks,
>> > Jason
>>
>
>
>

Re: Benchmarking Accumulo

Posted by "Kepner, Jeremy - 0553 - MITLL" <ke...@ll.mit.edu>.
On Mar 9, 2014, at 2:21 PM, Arshak Navruzyan <ar...@gmail.com> wrote:

> The benchmark in the D4M paper is very helpful but perhaps you could clarify a few things:
> 
> 1.  The 4 million entries per second pertains to the main table only or the main table, transpose and degree tables as well?

All tables.

> 2.  Can  you share you accumulo-site.xml settings for the test?  In particular the memory map size and compaction ratio settings.
> 
> 
> 
> 
> On Thu, Mar 6, 2014 at 3:07 PM, Jeremy Kepner <ke...@ll.mit.edu> wrote:
> There is one in D4M :-)
> 
> On Thu, Mar 06, 2014 at 04:48:32PM -0500, Morris, Jason wrote:
> > Hey everyone,
> >
> > I was wondering if anyone had a benchmark test for Accumulo? I could write a map/reduce job that creates a bunch of tables, maybe push some data, then drop them but I was wondering if anyone had something better?
> >
> > Thanks,
> > Jason
> 


Re: Benchmarking Accumulo

Posted by Arshak Navruzyan <ar...@gmail.com>.
The benchmark in the D4M paper is very helpful, but perhaps you could
clarify a few things:

1.  Does the 4 million entries per second figure pertain to the main table
only, or to the main table, transpose, and degree tables as well?
2.  Can you share your accumulo-site.xml settings for the test? In
particular, the memory map size and compaction ratio settings.




On Thu, Mar 6, 2014 at 3:07 PM, Jeremy Kepner <ke...@ll.mit.edu> wrote:

> There is one in D4M :-)
>
> On Thu, Mar 06, 2014 at 04:48:32PM -0500, Morris, Jason wrote:
> > Hey everyone,
> >
> > I was wondering if anyone had a benchmark test for Accumulo? I could
> write a map/reduce job that creates a bunch of tables, maybe push some
> data, then drop them but I was wondering if anyone had something better?
> >
> > Thanks,
> > Jason
>

Re: Benchmarking Accumulo

Posted by "Morris, Jason" <ja...@amazon.com>.
Nice, thanks everyone!

On 3/6/14, 6:46 PM, "David Medinets" <da...@gmail.com> wrote:

>http://www.pdl.cmu.edu/PDL-FTP/Storage/socc2011.pdf - here is a white paper
>about YCSB++. You can also buy a performance white paper from IEEE via
>http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6597155.
>
>
>On Thu, Mar 6, 2014 at 6:35 PM, Josh Elser <jo...@gmail.com> wrote:
>
>> There's also YCSB++
>>
>> http://www.cs.cmu.edu/~wtantisi/files/tablebenchmark-pdl11-talk.pdf
>>
>>
>> On 3/6/14, 6:07 PM, Jeremy Kepner wrote:
>>
>>> There is one in D4M :-)
>>>
>>> On Thu, Mar 06, 2014 at 04:48:32PM -0500, Morris, Jason wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I was wondering if anyone had a benchmark test for Accumulo? I could
>>>> write a map/reduce job that creates a bunch of tables, maybe push some
>>>> data, then drop them but I was wondering if anyone had something better?
>>>>
>>>> Thanks,
>>>> Jason
>>>>
>>>


Re: Benchmarking Accumulo

Posted by David Medinets <da...@gmail.com>.
http://www.pdl.cmu.edu/PDL-FTP/Storage/socc2011.pdf - here is a white paper
about YCSB++. You can also buy a performance white paper from IEEE via
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6597155.


On Thu, Mar 6, 2014 at 6:35 PM, Josh Elser <jo...@gmail.com> wrote:

> There's also YCSB++
>
> http://www.cs.cmu.edu/~wtantisi/files/tablebenchmark-pdl11-talk.pdf
>
>
> On 3/6/14, 6:07 PM, Jeremy Kepner wrote:
>
>> There is one in D4M :-)
>>
>> On Thu, Mar 06, 2014 at 04:48:32PM -0500, Morris, Jason wrote:
>>
>>> Hey everyone,
>>>
>>> I was wondering if anyone had a benchmark test for Accumulo? I could
>>> write a map/reduce job that creates a bunch of tables, maybe push some
>>> data, then drop them but I was wondering if anyone had something better?
>>>
>>> Thanks,
>>> Jason
>>>
>>

Re: Benchmarking Accumulo

Posted by Josh Elser <jo...@gmail.com>.
There's also YCSB++

http://www.cs.cmu.edu/~wtantisi/files/tablebenchmark-pdl11-talk.pdf

On 3/6/14, 6:07 PM, Jeremy Kepner wrote:
> There is one in D4M :-)
>
> On Thu, Mar 06, 2014 at 04:48:32PM -0500, Morris, Jason wrote:
>> Hey everyone,
>>
>> I was wondering if anyone had a benchmark test for Accumulo? I could write a map/reduce job that creates a bunch of tables, maybe push some data, then drop them but I was wondering if anyone had something better?
>>
>> Thanks,
>> Jason

Re: Benchmarking Accumulo

Posted by Jeremy Kepner <ke...@ll.mit.edu>.
There is one in D4M :-)

On Thu, Mar 06, 2014 at 04:48:32PM -0500, Morris, Jason wrote:
> Hey everyone,
> 
> I was wondering if anyone had a benchmark test for Accumulo? I could write a map/reduce job that creates a bunch of tables, maybe push some data, then drop them but I was wondering if anyone had something better?
> 
> Thanks,
> Jason