Posted to user@accumulo.apache.org by Dylan Hutchison <dh...@cs.washington.edu> on 2016/12/01 04:47:54 UTC
Accumulo Tip: Batch your Mutations
Hi folks,
I'd like to share a tip that ~doubled BatchWriter ingest performance in my
application.
When inserting multiple entries into the same Accumulo row, put them into the
same Mutation object. Add that one large Mutation to the BatchWriter rather
than an individual Mutation for each entry. This reduces the amount of data
transferred, since the row is sent once per Mutation instead of once per entry.
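For anyone who wants to try this, here is a minimal sketch of the grouping step in plain Java. The Cell holder and the RowBatcher class are hypothetical names made up for illustration, not Accumulo classes. The Accumulo calls appear only in comments, and they assume a BatchWriter named writer has already been created elsewhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowBatcher {
    /** One pending entry. Hypothetical holder for illustration, not an Accumulo class. */
    static final class Cell {
        final String row, family, qualifier, value;

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Groups cells by row, keeping rows in first-seen order. */
    static Map<String, List<Cell>> groupByRow(List<Cell> cells) {
        Map<String, List<Cell>> byRow = new LinkedHashMap<>();
        for (Cell c : cells) {
            byRow.computeIfAbsent(c.row, r -> new ArrayList<>()).add(c);
        }
        return byRow;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
            new Cell("row1", "fam", "a", "1"),
            new Cell("row2", "fam", "a", "2"),
            new Cell("row1", "fam", "b", "3"));

        for (Map.Entry<String, List<Cell>> group : groupByRow(cells).entrySet()) {
            // With Accumulo on the classpath, one Mutation per row would be built here:
            //   Mutation m = new Mutation(group.getKey());
            //   for (Cell c : group.getValue()) m.put(c.family, c.qualifier, c.value);
            //   writer.addMutation(m);   // writer: an existing BatchWriter (assumed)
            System.out.println(group.getKey() + ": " + group.getValue().size()
                + " updates in one mutation");
        }
    }
}
```

The grouping only helps if entries for the same row are available at the same time. If they arrive far apart, buffering them all in memory may not be worth it.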
The tip seems obvious enough, but hey, I used Accumulo for a couple years
without realizing it, so I thought y'all might benefit too.
Enjoy!
Dylan
Re: Accumulo Tip: Batch your Mutations
Posted by Keith Turner <ke...@deenlo.com>.
Also, the native map is a Map<row, Map<column, val>>. When applying the
updates for a mutation, it looks up the Map<column, val> once and reuses
it. This can be much faster than a Map<Key, Value>, because with a
Map<Key, Value> each insert may have to traverse a deeper tree than an
insert into a Map<column, val>.
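To illustrate the tree-depth point, here is a rough Java analogue that counts key comparisons. It uses java.util.TreeMap as a stand-in and is not the native map implementation. The class name, key formats, and sizes are made up for the sketch.

```java
import java.util.Comparator;
import java.util.TreeMap;

final class NestedVsFlatMap {
    /** Comparator that counts how many key comparisons the maps perform. */
    static final class CountingComparator implements Comparator<String> {
        long count = 0;

        @Override
        public int compare(String a, String b) {
            count++;
            return a.compareTo(b);
        }
    }

    static String rowKey(int r) { return String.format("row%06d", r); }
    static String colKey(int c) { return String.format("col%03d", c); }

    /** Flat layout: every insert walks one big tree keyed by row and column. */
    static long flatInsert(int rows, int cols) {
        CountingComparator cmp = new CountingComparator();
        TreeMap<String, String> flat = new TreeMap<>(cmp);
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                flat.put(rowKey(r) + "|" + colKey(c), "v");
            }
        }
        return cmp.count;
    }

    /** Nested layout: find the row's column map once, then insert into that small tree. */
    static long nestedInsert(int rows, int cols) {
        CountingComparator cmp = new CountingComparator();
        TreeMap<String, TreeMap<String, String>> nested = new TreeMap<>(cmp);
        for (int r = 0; r < rows; r++) {
            String row = rowKey(r);
            TreeMap<String, String> columns = nested.get(row); // one row lookup per "mutation"
            if (columns == null) {
                columns = new TreeMap<>(cmp);
                nested.put(row, columns);
            }
            for (int c = 0; c < cols; c++) {
                columns.put(colKey(c), "v");
            }
        }
        return cmp.count;
    }

    public static void main(String[] args) {
        int rows = 5000, cols = 10;
        System.out.println("flat comparisons:   " + flatInsert(rows, cols));
        System.out.println("nested comparisons: " + nestedInsert(rows, cols));
    }
}
```

The count ignores that the flat keys are also longer, so each flat comparison does more work than a nested one.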