Posted to user@phoenix.apache.org by Sumit Nigam <su...@yahoo.com> on 2015/10/07 09:11:57 UTC

Salting and pre-splitting

Hi,
I am somewhat confused by salting and pre-splitting. I would be grateful if any of you could clarify the following:
1. Do I need to use pre-splitting along with salting to get the performance benefit? Or can I still have single region server hot-spotting until I have enough regions to split into 2?
2. Is it true that SALT_BUCKETS should be set to (number of region servers) * (number of cores per region server)?
3. I cannot modify salt buckets after the table is created. If so, what happens when I add a new region server to the mix?
4. Is the number of buckets equal to the number of task splits that the Phoenix InputFormat uses?
5. Does salting create a hex rowkey, as is recommended?
6. With salting, can I still perform range scans with a LIMIT clause?
Thanks,
Sumit

Re: Salting and pre-splitting

Posted by Samarth Jain <sa...@apache.org>.

> Default value of phoenix.query.rowKeyOrderSaltedTable is true, and that
> ensures that a LIMIT clause returns data in rowkey order

This is no longer the case starting with Phoenix 4.4. You need to provide an
explicit ORDER BY on the row key columns if you need the rows to be returned
in row key order.
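
A toy Python model (not Phoenix code) makes this concrete. Each salt bucket keeps its rows in row key order, but an unordered parallel scan may hand back one bucket after another, so a LIMIT picks rows by bucket rather than by key; ordering on the row key requires a k-way merge across the buckets. The `bucket_of` hash and all function names are hypothetical, and draining buckets in index order is only one possible unordered result.

```python
import heapq
from itertools import islice


def bucket_of(key: str, salt_buckets: int) -> int:
    # Hypothetical stand-in for the real salt hash; any deterministic hash shows the effect.
    return sum(key.encode()) % salt_buckets


def build_buckets(keys, salt_buckets):
    buckets = [[] for _ in range(salt_buckets)]
    for k in keys:
        buckets[bucket_of(k, salt_buckets)].append(k)
    for b in buckets:
        b.sort()  # rows sharing a salt byte are stored in row key order
    return buckets


def scan_without_order_by(buckets, limit):
    # One possible unordered result: buckets drained one after another, so rows
    # come back grouped by salt byte rather than in global row key order.
    return list(islice((k for b in buckets for k in b), limit))


def scan_with_order_by(buckets, limit):
    # ORDER BY on the row key: a k-way merge of the already sorted buckets.
    return list(islice(heapq.merge(*buckets), limit))


buckets = build_buckets(["a", "b", "c", "d", "e", "f"], 3)
print(scan_without_order_by(buckets, 3))  # ['c', 'f', 'a']
print(scan_with_order_by(buckets, 3))     # ['a', 'b', 'c']
```

With LIMIT 3 and no ORDER BY, the toy scan returns `c, f, a` instead of the three smallest keys.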

On Wed, Oct 7, 2015 at 9:59 AM, Ravi Kiran <ma...@gmail.com> wrote:

> Hi Sumit,
>
> The PhoenixInputFormat gets the number of splits based on the region
> boundaries. However, if guideposts are configured
> (https://phoenix.apache.org/update_statistics.html), you might not see a
> one-to-one mapping. @James, please correct me if I am wrong here.
>
> You are right on the salting behavior.
>
> Regards
> Ravi
>
> On Wed, Oct 7, 2015 at 2:03 AM, Sumit Nigam <su...@yahoo.com> wrote:
>
>> I did some homework and got some answers. The open questions that remain
>> are:
>>
>> 1. Is the number of buckets equal to the number of task splits that the
>> Phoenix InputFormat uses?
>> 2. Salting uses the first byte of a stable hash of the rowkey, and it is
>> this byte that is prefixed. Is this correct?
>>
>> Answers I was able to find:
>>
>> 1. Pre-splitting is not needed with salting. Salting pre-splits at salt
>> byte boundaries anyway.
>> 2. SALT_BUCKETS can be set to a value higher than the number of region
>> servers, to allow for future growth.
>> 3. Adding a new region server does not matter to existing records, as the
>> mod is taken with SALT_BUCKETS and not with the number of region servers.
>> 4. The default value of phoenix.query.rowKeyOrderSaltedTable is true, and
>> that ensures that a LIMIT clause returns data in rowkey order.
>>
>> Thanks,
>> Sumit
>>
>> ------------------------------
>> *From:* Sumit Nigam <su...@yahoo.com>
>> *To:* Users Mail List Phoenix <us...@phoenix.apache.org>
>> *Sent:* Wednesday, October 7, 2015 12:41 PM
>> *Subject:* Salting and pre-splitting
>>
>> Hi,
>>
>> I am somewhat confused by salting and pre-splitting. I would be grateful if
>> any of you could clarify the following:
>>
>> 1. Do I need to use pre-splitting along with salting to get the performance
>> benefit? Or can I still have single region server hot-spotting until I
>> have enough regions to split into 2?
>> 2. Is it true that SALT_BUCKETS should be set to (number of region
>> servers) * (number of cores per region server)?
>> 3. I cannot modify salt buckets after the table is created. If so, what
>> happens when I add a new region server to the mix?
>> 4. Is the number of buckets equal to the number of task splits that the
>> Phoenix InputFormat uses?
>> 5. Does salting create a hex rowkey, as is recommended?
>> 6. With salting, can I still perform range scans with a LIMIT clause?
>>
>> Thanks,
>> Sumit
>>
>>
>>
>
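
The salting behavior discussed in this thread, a one-byte prefix computed from a hash of the row key modulo SALT_BUCKETS, with the table pre-split at salt-byte boundaries, can be sketched in Python as follows. This is a simplified model, not Phoenix's implementation: the hash is assumed to be a Java `Arrays.hashCode`-style hash over signed bytes, and all function names are hypothetical.

```python
def java_style_hash(key: bytes) -> int:
    """Hash in the style of Java's Arrays.hashCode(byte[]): signed bytes, 32-bit overflow."""
    result = 1
    for b in key:
        signed = b - 256 if b > 127 else b            # Java bytes are signed (-128..127)
        result = (31 * result + signed) & 0xFFFFFFFF  # keep 32 bits, as Java int arithmetic does
    return result - (1 << 32) if result >= (1 << 31) else result  # back to a signed int


def salt_byte(key: bytes, salt_buckets: int) -> int:
    """Bucket index in [0, salt_buckets): hash modulo SALT_BUCKETS, made non-negative."""
    if not 1 <= salt_buckets <= 256:
        raise ValueError("a single salt byte can hold at most 256 buckets")
    return abs(java_style_hash(key)) % salt_buckets


def salted_key(key: bytes, salt_buckets: int) -> bytes:
    """The stored row key: the one-byte salt followed by the original key."""
    return bytes([salt_byte(key, salt_buckets)]) + key


def salt_split_points(salt_buckets: int) -> list[bytes]:
    """Pre-split keys at every salt-byte boundary, giving one region per bucket."""
    return [bytes([i]) for i in range(1, salt_buckets)]


print(salted_key(b"ab", 8))   # b'\x02ab'
print(salt_split_points(4))   # [b'\x01', b'\x02', b'\x03']
```

`salt_byte` takes no region server count: in this model a row's bucket depends only on its key and SALT_BUCKETS, which matches the point above that adding a region server does not affect existing records. HBase moves whole regions to a new server rather than rehashing rows.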

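Ravi's description of how splits are derived can be modeled roughly like this: without statistics there is one scan range per region, and with guideposts each region is cut again at every guidepost that falls inside it, so buckets, regions, and splits no longer line up one-to-one. This is a simplified Python sketch with hypothetical names; Phoenix's real split logic may take other factors into account.

```python
from typing import Optional, Sequence


def scan_ranges(region_starts: Sequence[bytes],
                guideposts: Sequence[bytes] = ()) -> list[tuple[bytes, Optional[bytes]]]:
    """Toy model: split a table scan at region boundaries and guideposts.

    region_starts: start keys of the table's regions; the first region starts at b"".
    guideposts: statistics keys; each one falls inside some region and cuts it further.
    Returns (start, stop) pairs, where stop=None means "to the end of the table".
    """
    if b"" not in region_starts:
        raise ValueError("the first region must start at the empty key")
    # Union of both kinds of boundary; a guidepost equal to a region start adds nothing.
    boundaries = sorted(set(region_starts) | set(guideposts))
    stops = boundaries[1:] + [None]
    return list(zip(boundaries, stops))


# A table salted into 4 buckets, pre-split at salt-byte boundaries.
regions = [b"", b"\x01", b"\x02", b"\x03"]
print(len(scan_ranges(regions)))                        # 4: one range per region
print(len(scan_ranges(regions, [b"\x00m", b"\x02k"])))  # 6: guideposts add two cuts
```
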