Posted to user@accumulo.apache.org by Andrew Hulbert <ah...@ccri.com> on 2015/10/14 18:56:42 UTC
Tweaking non-bulk Ingest Performance
Hi all,
I've been attempting to improve a streaming ingest client into Accumulo
and have been playing with a few of the following settings:
tserver.memory.maps.max (and in tandem
table.compaction.minor.logs.threshold and tserver.wal.blocksize)
tserver.mutation.queue.max
In one set of tests I stood up ~200 batch writers and wrote approx 250M
tweets into a couple of different index schemas. What I've noticed is
that increasing the tserver.memory.maps.max from 1G to 2G or 4G actually
slows down my ingest rate. Cutting it to 512M forced lots of compactions
and high server load but a faster ingest.
I attached a screen shot of the two ingests (the
tserver.mutation.queue.max=4G in green) (33 nodes, -Xmx26G, 8 CPU, 4 SSDs)
My question is whether anyone has done any performance tweaking for
non-bulk ingest on a cluster and understands why that'd be the case?
I've read through all the docs/etc but haven't found a consistent
methodology for tweaking params...so I was wondering if anyone else had
attempted to tune a cluster like this.
Thanks for any ideas!
Andrew
Re: Tweaking non-bulk Ingest Performance
Posted by Eric Newton <er...@gmail.com>.
What version of Accumulo?
Make sure you don't have any hotspots. For example, if you have data
ordered by time, that may cause one tablet to be much busier than the
others.
Pre-split your table(s) so that you have 20-80 tablets per tserver that you
will be ingesting into.
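A minimal sketch of how the split points for such a pre-split might be
computed, assuming row keys begin with a uniformly distributed hex prefix
(for example a hash-based shard). The function name, the prefix scheme, and
the 4-digit prefix width are illustrative assumptions, not part of Accumulo:

```python
def split_points(num_tablets, hex_digits=4):
    """Return num_tablets - 1 evenly spaced, zero-padded hex split points.

    Assumes (hypothetically) that row keys start with a uniformly
    distributed hex prefix of `hex_digits` characters.
    """
    space = 16 ** hex_digits
    if not 1 <= num_tablets <= space:
        raise ValueError("num_tablets must be between 1 and 16**hex_digits")
    return [
        format(i * space // num_tablets, "0{}x".format(hex_digits))
        for i in range(1, num_tablets)
    ]


# 33 tservers (as in the original post) at the low end of 20 tablets each.
tservers = 33
tablets_per_tserver = 20
points = split_points(tservers * tablets_per_tserver)
print(len(points), "split points; first:", points[0], "last:", points[-1])
```

The resulting list could be written to a file, one split per line, and
applied to the table with the shell's addsplits command. If the row keys are
not uniformly distributed, splits sampled from real data would likely balance
better than evenly spaced ones.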
Use multiple writers per node (or increase the number of batchwriter
threads).
If you are using 1.7, decrease the Durability setting on your table. That
may depend on your needs, of course.
Likewise, you can decrease the WAL replication down to 2, if you are
comfortable with that.
If you have multiple updates for the same row, make sure the column updates
are in the same mutation.
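One way to do that on the client side is to group pending column updates by
row before building mutations. A minimal sketch, assuming updates arrive as
(row, family, qualifier, value) tuples; that tuple layout and the function
name are hypothetical and only illustrate the grouping step:

```python
from collections import OrderedDict


def group_by_row(updates):
    """Group (row, family, qualifier, value) tuples by row.

    Returns an OrderedDict mapping each row to its list of
    (family, qualifier, value) column updates, in arrival order.
    """
    grouped = OrderedDict()
    for row, family, qualifier, value in updates:
        grouped.setdefault(row, []).append((family, qualifier, value))
    return grouped


# Hypothetical sample data: two updates for one row, one for another.
updates = [
    ("tweet001", "meta", "user", "alice"),
    ("tweet002", "meta", "user", "bob"),
    ("tweet001", "meta", "lang", "en"),
]
for row, columns in group_by_row(updates).items():
    # Each row would become a single Mutation carrying all of its columns.
    print(row, columns)
```

Each row in the result would then map to one Mutation with one put per
column, rather than one Mutation per column update.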
You should see about 100K ingest (for small updates) per node, per second,
sustained.
-Eric