Posted to user@phoenix.apache.org by Marcell Ortutay <mo...@23andme.com> on 2018/02/15 19:25:16 UTC

Changing number of salt buckets for a table

I have a Phoenix table that is about 7 TB (unreplicated) in size,
corresponding to about 500B rows. It was set up a couple years ago, and we
have determined that the number of salt buckets it has is not optimal for
the current query pattern we are seeing. I want to change the number of
salt buckets as I expect it will improve performance.

I have written a MapReduce job that does this using a subclass of
TableMapper. It scans the entire old table, and writes the re-salted data
to a new table. The MapReduce job works on small tables, but I'm having
trouble getting it to run on the larger table.

I have two questions for anyone who has experience with this:

(1) Are there any publicly available MapReduce jobs for re-salting a
Phoenix table?

(2) Generally, is there a better approach than MapReduce to re-salt a
Phoenix table?

Thanks,
Marcell Ortutay

Re: Changing number of salt buckets for a table

Posted by Sergey Soldatov <se...@gmail.com>.
Well, there is no easy way to re-salt a table. The main problem is that the
number of buckets is used when the salt byte is calculated, so changing the
number of buckets means every row key has to be rewritten. I think you can
still use an MR job for that, but I would recommend writing the data to
HFiles instead of using upserts. You can see how this can be implemented in
the sources of the CSV bulk load tool.
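
To illustrate why every row key has to be rewritten, here is a minimal
Python sketch. It assumes the salt byte is a hash of the row key taken
modulo the bucket count and prepended to the key. The hash function and
the names salt_byte, salted_key and count_moved are hypothetical and are
not Phoenix's actual implementation.

```python
def salt_byte(row_key: bytes, num_buckets: int) -> int:
    """Illustrative salt: a 31-based polynomial hash of the key, modulo the bucket count."""
    h = 1
    for b in row_key:
        h = (31 * h + b) & 0xFFFFFFFF  # keep the hash within 32 bits
    return h % num_buckets


def salted_key(row_key: bytes, num_buckets: int) -> bytes:
    """Prepend the salt byte to the row key (num_buckets must be at most 256)."""
    return bytes([salt_byte(row_key, num_buckets)]) + row_key


def count_moved(keys, old_buckets: int, new_buckets: int) -> int:
    """Count keys whose salt byte differs between the two bucket counts."""
    return sum(
        1 for k in keys
        if salt_byte(k, old_buckets) != salt_byte(k, new_buckets)
    )


if __name__ == "__main__":
    # Hypothetical row keys and bucket counts, chosen only for illustration.
    keys = [f"row-{i}".encode() for i in range(1000)]
    moved = count_moved(keys, 8, 20)
    print(f"{moved} of {len(keys)} keys get a different salt byte")
```

Because the salt is the leading byte of the stored key, a key whose salt
changes sorts into a different part of the table. That is why the data has
to be rewritten rather than altered in place.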

Thanks,
Sergey
