Posted to user@phoenix.apache.org by "Riesland, Zack" <Za...@sensus.com> on 2016/08/03 19:45:38 UTC

Guidance to improve upsert performance

Hello,

I'm working on a POC to use HBase + Phoenix as a DB layer for a system that consumes several thousand (10,000 to 40,000) messages per second.

Our cluster is fairly small: 4 region servers supporting about a dozen tables. We are currently experimenting with salting - our first pass was 4 regions.

The ultimate data size is also pretty small. The data compacts very nicely and, after aggregation and de-duplication, it is only on the order of tens of GB.

Querying these tables is reasonably performant right now, but upserting the data is not optimal and I'm looking for some performance tips.

As I said, the incoming data is streamed (via Storm), at a rate of thousands of messages per second.

After some basic benchmarking, it appears that Storm is able to consume the data much more quickly than it can upsert it to Phoenix.

I understand that Phoenix is fundamentally designed for fast querying, and not necessarily fast writing. But can anyone suggest some Phoenix and/or HBase parameters we should consider tuning to improve performance? Any tips on designing something like this?

Also, we have 3 secondary indexes in addition to the primary key. I'm guessing that this creates a significant amount of overhead in terms of writing data. But the indexes are necessary for query performance. Is it possible to force the index maintenance to behave in more of a batch pattern? Maybe only update the index tables every X minutes? Even twice a day?

Thanks in advance for any tips!



Re: Guidance to improve upsert performance

Posted by James Taylor <ja...@apache.org>.
Hi Zack,
Here are some things to try:
- make sure you're batching your upserts by turning off auto commit. Maybe
start with a batch size of 1000 and commit when you reach this.
- to reduce RPC traffic, set the UPDATE_CACHE_FREQUENCY (4.7 or above) on
your table and indexes when you create them (or issue an ALTER TABLE/INDEX
call). See https://phoenix.apache.org/#Altering.
- if using 4.8, consider using local indexes to minimize the write time. In
this case, the writes for the secondary index will be to the same region
server as your base table. Otherwise, you're essentially tripling the cost
of a write (no wonder it can't keep up). You'll take a perf hit on the read
side, though, so make sure you quantify both write speed improvement and
read speed reduction.
- an alternate approach may be to disable the indexes and rebuild them
asynchronously at some interval. There's no partial index rebuild
currently, though, so when you enable them again, the entire index would
get built again.
- another idea might be to build HFiles in your Storm processing for tables
and indexes. Then you can hand these off to HBase en masse at some regular
interval.
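
To make the first suggestion concrete, here is a minimal sketch of batched
upserts over plain JDBC. It assumes an already-open Phoenix connection; the
class name, method name, and the table and columns in the usage line are
hypothetical and not taken from this thread. With auto commit turned off,
the Phoenix client should hold the upserts until commit() is called, so
committing once per batch is what cuts the number of round trips.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper, for illustration only.
final class BatchedUpsert {
    private BatchedUpsert() {}

    /**
     * Executes upsertSql once per row and commits every batchSize rows,
     * plus one final commit for any remainder.
     * Returns the number of commits issued.
     */
    static int upsertInBatches(Connection conn, String upsertSql,
                               List<Object[]> rows, int batchSize) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Turn off auto commit so rows are only flushed when we call commit().
        conn.setAutoCommit(false);
        int commits = 0;
        int pending = 0;
        try (PreparedStatement stmt = conn.prepareStatement(upsertSql)) {
            for (Object[] row : rows) {
                // JDBC parameters are 1-indexed.
                for (int i = 0; i < row.length; i++) {
                    stmt.setObject(i + 1, row[i]);
                }
                stmt.executeUpdate();
                if (++pending == batchSize) {
                    conn.commit();
                    commits++;
                    pending = 0;
                }
            }
            // Flush whatever is left over from the last partial batch.
            if (pending > 0) {
                conn.commit();
                commits++;
            }
        }
        return commits;
    }
}
```

A call might look like
BatchedUpsert.upsertInBatches(conn, "UPSERT INTO MESSAGES (ID, PAYLOAD) VALUES (?, ?)", rows, 1000),
where MESSAGES is a made-up table. The batch size of 1000 is only the
suggested starting point; measure a few sizes against your own cluster.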

HTH.

Thanks,
James
