Posted to user@accumulo.apache.org by Andrew Hulbert <ah...@ccri.com> on 2015/10/14 18:56:42 UTC

Tweaking non-bulk Ingest Performance

Hi all,

I've been attempting to improve a streaming ingest client into Accumulo 
and have been playing with a few of the following settings:

tserver.memory.maps.max (and in tandem 
table.compaction.minor.logs.threshold and tserver.wal.blocksize)
tserver.mutation.queue.max

In one set of tests I stood up ~200 batch writers and wrote approx 250M 
tweets into a couple of different index schemas. What I've noticed is 
that increasing the tserver.memory.maps.max from 1G to 2G or 4G actually 
slows down my ingest rate. Cutting it to 512M forced lots of compactions 
and high server load but a faster ingest.

I attached a screen shot of the two ingests (the 
tserver.mutation.queue.max=4G in green) (33 nodes, -Xmx26G, 8 CPU, 4 SSDs)

My question is whether anyone has done any performance tweaking for 
non-bulk ingest on a cluster and understands why that'd be the case? 
I've read through all the docs/etc but haven't found a consistent 
methodology for tweaking params...so I was wondering if anyone else had 
attempted to tune a cluster like this.

Thanks for any ideas!

Andrew

Re: Tweaking non-bulk Ingest Performance

Posted by Eric Newton <er...@gmail.com>.
What version of Accumulo?

Make sure you don't have any hotspots. For example, if you have data
ordered by time, that may cause one tablet to be much busier than the
others.
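
One common way to avoid a time-ordered hotspot is to prefix each row key
with a small hash-derived bucket, so that consecutive timestamps spread
across many key ranges. The sketch below is only an illustration: the
bucket count, the key layout, and the class and method names are all
assumptions, not anything taken from the ingest client discussed here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.CRC32;

final class SaltedRowKey {
    // Hypothetical bucket count; in practice it would be sized relative to
    // the number of tablet servers and tablets being written to.
    static final int NUM_BUCKETS = 64;

    /**
     * Builds "<2-digit bucket>_<13-digit timestamp>_<id>". The bucket is
     * derived from the id, so rows with nearby timestamps land in different
     * parts of the key space instead of on one busy tablet.
     */
    static String rowKey(long timestampMillis, String id) {
        CRC32 crc = new CRC32();
        crc.update(id.getBytes(StandardCharsets.UTF_8));
        int bucket = (int) (crc.getValue() % NUM_BUCKETS); // getValue() is non-negative
        return String.format(Locale.ROOT, "%02d_%013d_%s", bucket, timestampMillis, id);
    }

    public static void main(String[] args) {
        long ts = 1444848000000L; // example timestamp in milliseconds
        for (String id : new String[] {"tweet-1", "tweet-2", "tweet-3"}) {
            System.out.println(rowKey(ts, id));
        }
    }
}
```

The trade-off is on the read side: a scan over a time range then needs
one range per bucket rather than a single contiguous range.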

Pre-split your table(s) so that you have 20-80 tablets per tserver that you
will be ingesting into.
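
As a rough illustration of the pre-splitting arithmetic, the sketch
below computes evenly spaced split points for a target number of
tablets per tserver. It assumes row keys start with uniformly
distributed lowercase hex characters, which may not match your schema;
the class and method names are made up for this example. The resulting
strings could then be converted to Hadoop Text and passed to
TableOperations.addSplits.

```java
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

final class SplitPoints {
    /**
     * Returns (tservers * tabletsPerTserver - 1) split points as 4-digit hex
     * strings, evenly spaced over the range 0000..ffff.
     */
    static SortedSet<String> hexSplits(int tservers, int tabletsPerTserver) {
        int tablets = tservers * tabletsPerTserver;
        if (tablets < 1 || tablets > 0x10000) {
            throw new IllegalArgumentException("tablet count out of range: " + tablets);
        }
        SortedSet<String> splits = new TreeSet<>();
        for (int i = 1; i < tablets; i++) {
            // Integer boundary of the i-th tablet within a 16-bit key prefix space.
            long boundary = (long) i * 0x10000 / tablets;
            splits.add(String.format(Locale.ROOT, "%04x", boundary));
        }
        return splits;
    }

    public static void main(String[] args) {
        // 33 tservers as in the original post, 20 tablets each (low end of 20-80).
        SortedSet<String> splits = hexSplits(33, 20);
        System.out.println(splits.size() + " split points, first=" + splits.first()
                + ", last=" + splits.last());
    }
}
```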

Use multiple writers per node (or increase the number of BatchWriter
threads).

If you are using 1.7, decrease the Durability setting on your table.  That
may depend on your needs, of course.
Likewise, you can decrease the WAL replication down to 2, if you are
comfortable with that.

If you have multiple updates for the same row, make sure the column updates
are in the same mutation.
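
A minimal sketch of that grouping step, using only the JDK: column
updates are collected per row first, so that each row ends up as one
mutation rather than several. The Update holder type and the method
names are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupByRow {
    /** Hypothetical holder for one column update. */
    static final class Update {
        final String row, family, qualifier, value;

        Update(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Groups updates by row, preserving first-seen row order. */
    static Map<String, List<Update>> groupByRow(List<Update> updates) {
        Map<String, List<Update>> byRow = new LinkedHashMap<>();
        for (Update u : updates) {
            byRow.computeIfAbsent(u.row, r -> new ArrayList<>()).add(u);
        }
        return byRow;
    }

    public static void main(String[] args) {
        List<Update> updates = Arrays.asList(
                new Update("row1", "meta", "user", "alice"),
                new Update("row2", "meta", "user", "bob"),
                new Update("row1", "meta", "lang", "en"));
        for (Map.Entry<String, List<Update>> e : groupByRow(updates).entrySet()) {
            // With the Accumulo client on the classpath, each entry would become
            // a single Mutation for e.getKey(), with one put(...) per update,
            // followed by one addMutation call on the BatchWriter.
            System.out.println(e.getKey() + ": " + e.getValue().size() + " column update(s)");
        }
    }
}
```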

You should see about 100K ingest (for small updates) per node, per second,
sustained.

-Eric

