Posted to user@phoenix.apache.org by Flavio Pompermaier <po...@okkam.it> on 2017/12/05 08:49:42 UTC

Add automatic/default SALT

Hi to all,
as stated in the documentation[1], "for optimal performance, number of
salt buckets should match number of region servers".
So why not add an AUTO/DEFAULT option for salting that sets this
parameter to the number of region servers?
Otherwise I have to connect to HBase manually, retrieve that number, and
pass it to Phoenix...
What do you think?

[1] https://phoenix.apache.org/performance.html#Salting

Best,
Flavio

Re: Add automatic/default SALT

Posted by James Taylor <ja...@apache.org>.
Our Tuning Guide[1] has recommendations on when to use, and when not to
use, salted tables. We don't recommend salting unless your table has a
monotonically increasing primary key. The reason is best explained with
an example. Let's say you have a table with SALT_BUCKETS=20.
When you execute a simple query against that table that might return 10
contiguous rows, you'll be executing 20 scans instead of just one. Each
scan will open a block on the region server - that's 20 block fetches
versus what would otherwise be a single block fetch (assuming that the 10
rows being returned are in the same block since they're contiguous). The
only time you're not hit with this 20x block fetch cost is if you're doing
a point lookup (as the client can precompute the salt byte in that case).

[1] https://phoenix.apache.org/tuning_guide.html
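
The scan fan-out described above can be sketched with a toy model. This is
only an illustration: toy_salt_byte is a made-up stand-in hash, not
Phoenix's actual salting function, and scans_needed simply encodes the rule
stated in this message (one scan per bucket for a range scan, a single scan
for a point lookup or an unsalted table).

```python
def toy_salt_byte(row_key: bytes, buckets: int) -> int:
    """Toy hash for illustration only; NOT Phoenix's real salting function."""
    h = 0
    for b in row_key:
        h = (31 * h + b) & 0x7FFFFFFF
    return h % buckets


def scans_needed(buckets: int, point_lookup: bool) -> int:
    """Number of scans issued in this simplified model of a salted table."""
    if buckets <= 1 or point_lookup:
        # Unsalted table, or the client can precompute the salt byte.
        return 1
    # A range scan must visit every salt bucket.
    return buckets


# Ten contiguous row keys, as in the example above.
keys = [f"row{i:04d}".encode() for i in range(100, 110)]
buckets_hit = {toy_salt_byte(k, 20) for k in keys}

print("distinct buckets holding the 10 rows:", len(buckets_hit))
print("scans for a range query:", scans_needed(20, point_lookup=False))
print("scans for a point lookup:", scans_needed(20, point_lookup=True))
```

Because the salt byte is derived from a hash of the key, rows that were
contiguous in the unsalted key space can end up in different buckets, which
is why the range query cannot be limited to one bucket.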


Re: Add automatic/default SALT

Posted by Flavio Pompermaier <po...@okkam.it>.
Hi Josh,
Thanks for the feedback. Do you have a concrete example where salted
tables are 'evil'? That said, I really like the idea of enabling salting
via some predefined variable (such as the number of region servers or
something similar).
An example could be:

SALT_BUCKETS = $REGION_SERVERS_COUNT

Best,
Flavio


Re: Add automatic/default SALT

Posted by Josh Elser <el...@apache.org>.
I'm a little hesitant about this because of a few things I've noticed
across many installations:

* Salted tables are *not* always more efficient. In fact, I've found
myself advising against salted tables more often than I expected.
Certain kinds of queries require much more work on a salted table than
on an unsalted one.

* If you think of salt buckets as a measure of parallelism for a table,
it's impossible for the system to correctly judge what the parallelism of
the cluster should be. For example, with 10 region servers (RS) and 1
Phoenix table, you would want to start with 10 salt buckets. However, with
10 RS and 100 Phoenix tables, you'd *maybe* want 3 salt buckets. It's hard
to make system-wide decisions correctly without a global view of the
entire system.

I think James was trying to capture some of this with his "relatively
conservative defaults", but I'd go a bit further and say I consider it
harmful for Phoenix to do that out of the box.

However, I would flip the question around: what kind of suggestions can
Phoenix, as a database, make to _recommend_ that users enable salting on
a table, given its schema and important queries?
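
A rough way to see the second point above is to count regions. The sketch
below assumes that each salted table starts out with one region per salt
bucket and that regions are spread evenly over the region servers; the
function name and both assumptions are illustrative only, not a Phoenix
API.

```python
def initial_regions_per_server(tables: int, salt_buckets: int,
                               region_servers: int) -> float:
    """Average initial regions per region server.

    Assumes one region per salt bucket per table and an even spread
    across servers (both are simplifying assumptions).
    """
    return tables * salt_buckets / region_servers


for tables, buckets in [(1, 10), (100, 10), (100, 3)]:
    per_server = initial_regions_per_server(tables, buckets, region_servers=10)
    print(f"{tables} table(s) x {buckets} buckets on 10 RS: "
          f"{per_server} regions per server")
```

With one table, matching the bucket count to the server count gives one
region per server. With 100 tables the same rule gives 100 regions per
server before any data is written, while 3 buckets gives 30.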


Re: Add automatic/default SALT

Posted by James Taylor <ja...@apache.org>.
Hi Flavio,
I like the idea of “adaptable configuration” where you specify a config
value as a % of some cluster resource (with relatively conservative
defaults). Salting is somewhat of a gray area though, as it’s not
config-based but driven by your DDL. One solution you could implement on
top of Phoenix is a script that generates your DDL and fills in the salt
bucket parameter based on cluster size.
Thanks,
James
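
A minimal sketch of such a script, assuming the region server count has
already been obtained by whatever means you use to inspect the cluster (it
is just a function argument here). The salted_ddl helper and the metrics
table are hypothetical, the base DDL is assumed to carry no other table
options, and the cap of 256 reflects the documented upper bound for
SALT_BUCKETS.

```python
MAX_SALT_BUCKETS = 256  # documented upper bound for SALT_BUCKETS


def salted_ddl(base_ddl: str, region_servers: int) -> str:
    """Append a SALT_BUCKETS option sized from the region server count.

    Hypothetical helper: assumes base_ddl is a CREATE TABLE statement
    with no trailing table options.
    """
    if region_servers < 1:
        raise ValueError("region_servers must be at least 1")
    buckets = min(region_servers, MAX_SALT_BUCKETS)
    return f"{base_ddl.rstrip().rstrip(';')} SALT_BUCKETS = {buckets}"


base = "CREATE TABLE metrics (ts BIGINT NOT NULL PRIMARY KEY, val DOUBLE)"
print(salted_ddl(base, region_servers=10))
```

The generated statement can then be run through any Phoenix client. Note
that this only automates the "match the number of region servers" rule; it
does not decide whether the table should be salted at all.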
