Posted to dev@hbase.apache.org by Andrew Purtell <ap...@apache.org> on 2009/06/28 01:02:26 UTC

interesting informal test results

Test:

- Latest trunk.

- Config modified only with a store file split threshold of 1GB

- 4 node testbed:
    1) namenode, datanode, hmaster, heritrix, jobtracker
    2) datanode, regionserver, heritrix, tasktracker, mapper (2)
    3) datanode, regionserver, heritrix, tasktracker, mapper (2)
    4) datanode, regionserver, heritrix, tasktracker, mapper (2)

- 100 heritrix threads - 4 hosts, 25 threads each - feeding in ~5MB/sec average new edits

- 2 mappers x 3 hosts processing new edits and writing back serialized/compressed Documents

- 3K average transactions/sec reported by master

- 'hadoop balancer -threshold 0.1'

- 1 hour test run
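
For reference, a split threshold like the 1GB above is normally set in hbase-site.xml via hbase.hregion.max.filesize (a sketch; the exact property name and default depend on the HBase version in use):

```xml
<!-- hbase-site.xml: split a region once its largest store file exceeds 1GB -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value>
</property>
```

The concurrent 'hadoop balancer -threshold 0.1' run meanwhile moves HDFS blocks between datanodes while HBase keeps reading and writing on top of them.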

Result:

Passed with no incidents!

   - Andy

Re: interesting informal test results

Posted by stack <st...@duboce.net>.
That's good news, Andrew. (That's a great test too -- heavy upload with
concurrent reads/writes to the same table while HDFS blocks are being
moved around underneath it all.)
St.Ack

On Sat, Jun 27, 2009 at 4:02 PM, Andrew Purtell <ap...@apache.org> wrote:

> ...

Re: interesting informal test results

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Sorry for being late to the conversation; I was away from the tubes
for some time...

Currently the batching is done per region, not per region server. When
I implemented the write buffer, I saw that we save a lot in the first
few RPCs we cut, but after that the gain stays pretty much flat (see
the graph in the JIRA about that). So grouping by region server would
be a bit more complicated IMO and wouldn't save much, if anything at
all.
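
J-D's point can be illustrated with a toy model (hypothetical names, not the real HBase client API): if each flush sends one RPC per group, regrouping by region server merges some calls, but once batches are already large the marginal saving is small.

```python
from collections import defaultdict

def rpcs_needed(edits, group_by):
    """Count RPCs if buffered edits are flushed in one call per group key."""
    groups = defaultdict(list)
    for row, region, server in edits:
        groups[group_by(region, server)].append(row)
    return len(groups)

# Toy workload: 6 edits spread over 3 regions hosted on 2 region servers.
edits = [
    ("row1", "region-A", "rs1"),
    ("row2", "region-A", "rs1"),
    ("row3", "region-B", "rs1"),
    ("row4", "region-B", "rs1"),
    ("row5", "region-C", "rs2"),
    ("row6", "region-C", "rs2"),
]

by_region = rpcs_needed(edits, lambda region, server: region)  # 3 RPCs
by_server = rpcs_needed(edits, lambda region, server: server)  # 2 RPCs
```

Going from 3 region-level RPCs to 2 server-level ones saves little relative to the first win of batching many edits per call, which matches the plateau J-D describes.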

J-D

On Sun, Jun 28, 2009 at 4:30 AM, Ryan Rawson <ry...@gmail.com> wrote:
> Regionserver...could it be any other way?
>
> Group commit, aka commit buffer is to achieve maximal write performance.
>
> On Jun 28, 2009 12:54 AM, "Joey Echeverria" <jo...@gmail.com> wrote:
>
> When the client does group commits does it group by row key or region
> server?
>
> On Sun, Jun 28, 2009 at 12:08 AM, Ryan Rawson <ry...@gmail.com> wrote:
> > I imported 9b rows in 5 ...
>

Re: interesting informal test results

Posted by Ryan Rawson <ry...@gmail.com>.
Regionserver...could it be any other way?

Group commit, aka the commit buffer, exists to achieve maximal write performance.
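
A minimal sketch of the idea (a toy model, not the real HBase client API): edits accumulate in a local buffer and are sent in one batch once it fills, trading per-edit latency for bulk-upload throughput.

```python
class WriteBuffer:
    """Toy client-side commit buffer: batch edits, flush when full."""

    def __init__(self, flush_size=3):
        self.flush_size = flush_size
        self.pending = []
        self.flushes = []  # record of batches "sent", one RPC each

    def put(self, edit):
        self.pending.append(edit)
        if len(self.pending) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.flushes.append(list(self.pending))
            self.pending.clear()

buf = WriteBuffer(flush_size=3)
for i in range(7):
    buf.put(f"edit-{i}")
buf.flush()  # push the partial tail batch
# 7 edits go out in 3 RPCs instead of 7
```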

On Jun 28, 2009 12:54 AM, "Joey Echeverria" <jo...@gmail.com> wrote:

When the client does group commits does it group by row key or region
server?

On Sun, Jun 28, 2009 at 12:08 AM, Ryan Rawson <ry...@gmail.com> wrote:
> I imported 9b rows in 5 ...

Re: interesting informal test results

Posted by Joey Echeverria <jo...@gmail.com>.
When the client does group commits, does it group by row key or by region server?

On Sun, Jun 28, 2009 at 12:08 AM, Ryan Rawson <ry...@gmail.com> wrote:
> I imported 9b rows in 5 days or so, a few minor crashes, average speed
> between 50-200 k ops/sec.  The client needs some love to make it more
> efficient on grouping commits during bulk upload.
>
> On Jun 27, 2009 4:02 PM, "Andrew Purtell" <ap...@apache.org> wrote:
>
> ...

Re: interesting informal test results

Posted by Ryan Rawson <ry...@gmail.com>.
I imported 9b rows in 5 days or so, with a few minor crashes; average
speed was between 50-200k ops/sec. The client needs some love to make
it more efficient at grouping commits during bulk upload.
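
As a back-of-envelope check on those figures:

```python
rows = 9_000_000_000
seconds = 5 * 24 * 3600   # 432,000 seconds in 5 days
avg = rows / seconds      # ~20,833 rows/sec sustained average
```

The sustained average (~21k rows/sec) sits well below the reported 50-200k ops/sec peaks, which is consistent with crashes, restarts, and compaction pauses eating into wall-clock time during the import.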

On Jun 27, 2009 4:02 PM, "Andrew Purtell" <ap...@apache.org> wrote:

...