Posted to user@accumulo.apache.org by Dylan Hutchison <dh...@cs.washington.edu> on 2016/12/01 04:47:54 UTC

Accumulo Tip: Batch your Mutations

Hi folks,

I'd like to share a tip that ~doubled BatchWriter ingest performance in my
application.

When inserting multiple entries into the same Accumulo row, put them into
the same Mutation object. Add that one large Mutation to a BatchWriter
rather than a separate Mutation for each entry. This reduces the amount of
data transferred, since the row ID is sent once per Mutation rather than
once per entry.
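
As a rough illustration of the tip, the sketch below groups pending entries
by row so that each row ends up with one batch of updates. It uses only the
Java standard library so it runs without Accumulo; the class RowBatcher and
its Entry type are hypothetical names made up for this example. The closing
comment shows how each group might then be turned into a single Mutation,
assuming the usual Mutation(CharSequence) constructor, the
put(CharSequence, CharSequence, CharSequence) overload, and
BatchWriter.addMutation(Mutation).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: groups cell updates by row before they are written. */
public class RowBatcher {

    /** One cell to write (illustrative type, not part of Accumulo). */
    public static final class Entry {
        public final String row;
        public final String family;
        public final String qualifier;
        public final String value;

        public Entry(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Returns the entries grouped by row, preserving first-seen row order. */
    public static Map<String, List<Entry>> groupByRow(List<Entry> entries) {
        Map<String, List<Entry>> byRow = new LinkedHashMap<>();
        for (Entry e : entries) {
            byRow.computeIfAbsent(e.row, r -> new ArrayList<>()).add(e);
        }
        return byRow;
    }

    // With Accumulo on the classpath, each group could become one Mutation
    // (sketch only, 'writer' is an already-created BatchWriter):
    //
    //   for (Map.Entry<String, List<Entry>> group : groupByRow(entries).entrySet()) {
    //       Mutation m = new Mutation(group.getKey());
    //       for (Entry e : group.getValue()) {
    //           m.put(e.family, e.qualifier, e.value);
    //       }
    //       writer.addMutation(m);   // one Mutation per row, not one per entry
    //   }
}
```

Grouping in memory like this only makes sense when the entries for a row are
available together; if they arrive far apart, holding them back to build a
bigger Mutation trades latency and memory for the smaller transfer.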

The tip seems obvious enough, but hey, I used Accumulo for a couple years
without realizing it, so I thought y'all might benefit too.

Enjoy!
Dylan

Re: Accumulo Tip: Batch your Mutations

Posted by Keith Turner <ke...@deenlo.com>.
Also, the native map is a Map<row, Map<column, val>>. When applying the
updates for a mutation, it gets the Map<column, val> once and uses
that. This can be much faster than a Map<Key, Value>, because with a
Map<Key, Value> each insert may have to traverse a deeper tree than
an insert into a Map<column, val>.
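
To make the point above concrete, here is a small Java model of the nested
layout. It is not Accumulo's native map, only a toy built on TreeMap; the
class NestedMapSketch and its methods are hypothetical. It counts how often
the outer row map is consulted, to show that a mutation carrying several
column updates needs one row lookup, while the same updates split across
single-entry mutations need one lookup each.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a Map<row, Map<column, val>>; not Accumulo's implementation. */
public class NestedMapSketch {
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();
    private int rowLookups = 0;

    /** Applies all column updates of one mutation with a single row lookup. */
    public void applyMutation(String row, Map<String, String> columnUpdates) {
        rowLookups++; // the outer map is consulted once per mutation
        TreeMap<String, String> columns = rows.computeIfAbsent(row, r -> new TreeMap<>());
        for (Map.Entry<String, String> update : columnUpdates.entrySet()) {
            // each insert only walks the smaller per-row tree
            columns.put(update.getKey(), update.getValue());
        }
    }

    /** Number of times the outer row map has been consulted. */
    public int rowLookups() {
        return rowLookups;
    }

    /** Total number of cells stored across all rows. */
    public int cellCount() {
        int total = 0;
        for (TreeMap<String, String> columns : rows.values()) {
            total += columns.size();
        }
        return total;
    }
}
```

Applying one mutation with three columns leaves rowLookups() at 1, whereas
applying three one-column mutations to the same row raises it to 3, with the
same three cells stored in both cases.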