Posted to user@phoenix.apache.org by Simon Wang <si...@airbnb.com> on 2016/07/08 18:59:27 UTC

Index tables at scale

Hi all,

I am writing to ask if there is a way to let Phoenix store all indexes on a single table in the same HBase table. If each index must be stored in a separate table, creating more than a few indexes on a table with a large number of regions will not scale well.

From what I have learned, when Phoenix builds indexes on a view, it stores all indexes in a table associated with the underlying table of the view. For example, if V1 is a view of T1, all indexes on V1 will be stored in _IDX_T1. It would be great if this behavior could optionally be turned on for indexes on tables.

Best,
Simon
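A minimal sketch of how several indexes could share one physical table, in the spirit of the view-index behavior described above: each index gets a small numeric id that is prepended to its row keys, so each index occupies its own contiguous key range. The function names and the 2-byte id encoding are hypothetical and chosen for illustration; they are not Phoenix's actual on-disk format.

```python
import struct
from typing import Tuple

# Hypothetical layout: <2-byte index id><indexed value><0x00><data table PK>.
# Assumes indexed values contain no 0x00 byte.
MAX_INDEX_ID = 0xFFFE  # keeps index_id + 1 representable in two bytes


def index_row_key(index_id: int, indexed_value: bytes, data_pk: bytes) -> bytes:
    """Build a row key for one index inside a shared index table."""
    if not 0 <= index_id <= MAX_INDEX_ID:
        raise ValueError("index_id out of range")
    # Big-endian unsigned shorts sort in numeric order, so each index id
    # maps to one contiguous block of rows.
    return struct.pack(">H", index_id) + indexed_value + b"\x00" + data_pk


def index_scan_range(index_id: int) -> Tuple[bytes, bytes]:
    """Return the [start, stop) key range covering every row of one index."""
    if not 0 <= index_id <= MAX_INDEX_ID:
        raise ValueError("index_id out of range")
    return struct.pack(">H", index_id), struct.pack(">H", index_id + 1)


rows = sorted([
    index_row_key(1, b"alice", b"pk-17"),
    index_row_key(2, b"2016-07-08", b"pk-17"),
    index_row_key(1, b"bob", b"pk-03"),
])
start, stop = index_scan_range(1)
# Only the two rows belonging to index 1 fall inside its key range.
print([r for r in rows if start <= r < stop])
```

A query against one index only scans that index's key range, so the indexes do not interfere with each other at read time; the tradeoff is that they all share the same region boundaries and split behavior.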

Re: Index tables at scale

Posted by James Taylor <ja...@apache.org>.
Makes sense now, Simon. I'd recommend you only use salting if your primary
key is monotonically increasing, to prevent write hot-spotting. Also, did
you know you can turn off salting for an index by adding SALT_BUCKETS=0
to your CREATE INDEX statement? If the PK on the data table is
monotonically increasing, then the index key usually won't be. In the same
way, if a data table is *not* salted but you need your index to be
salted, you can do so by adding a SALT_BUCKETS=<n> property to your
CREATE INDEX statement.

Please file a JIRA for combining global index tables into a single HBase
table.

Thanks,
James
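The salting advice above can be illustrated with a small model. Phoenix prepends a salt byte computed from a hash of the row key modulo SALT_BUCKETS; the sketch below uses MD5 purely as a stand-in hash (an assumption, not Phoenix's actual hash function) to show how a contiguous burst of monotonically increasing keys gets spread across buckets.

```python
import hashlib
from collections import Counter


def salt_bucket(row_key: bytes, salt_buckets: int) -> int:
    """Map a row key to a salt bucket. MD5 is only a stand-in hash here."""
    digest = hashlib.md5(row_key).digest()
    return int.from_bytes(digest[:4], "big") % salt_buckets


def bucket_counts(row_keys, salt_buckets: int) -> Counter:
    """Count how many writes land in each salt bucket (one pre-split region each)."""
    return Counter(salt_bucket(k, salt_buckets) for k in row_keys)


# Monotonically increasing primary keys, e.g. zero-padded sequence numbers.
keys = [f"{i:012d}".encode() for i in range(10_000)]

# The most recent 1,000 writes are contiguous in key order. Without salting they
# all land in whichever region holds the top of the key range; with 16 salt
# buckets the same burst is spread across 16 pre-split regions.
recent_burst = keys[-1000:]
counts = bucket_counts(recent_burst, 16)
print(sorted(counts.items()))
```

When the index's leading column is not monotonic, its keys already spread like this without salting, which is why SALT_BUCKETS=0 on such an index can make sense.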

On Tue, Jul 12, 2016 at 2:11 AM, Simon Wang <si...@airbnb.com> wrote:

> Thanks Mujtaba. This is good to know. It is possible to manipulate the key
> bits to avoid the hot-spotting, so we will probably try out an unsalted
> table.
>
> Still, it would be nice if combining indexes in a single table were possible.

Re: Index tables at scale

Posted by Simon Wang <si...@airbnb.com>.
Thanks Mujtaba. This is good to know. It is possible to manipulate the key bits to avoid the hot-spotting, so we will probably try out an unsalted table.

Still, it would be nice if combining indexes in a single table were possible.


> On Jul 11, 2016, at 2:41 PM, Mujtaba Chohan <mu...@apache.org> wrote:
> 
> FYI, if your keys are not written in order, i.e. you are not concerned about write hot-spotting/write throughput, then try writing your data to an un-salted table. Read performance for an un-salted table can be comparable to or better than a salted one with stats <https://phoenix.apache.org/update_statistics.html>.


Re: Index tables at scale

Posted by Mujtaba Chohan <mu...@apache.org>.
FYI, if your keys are not written in order, i.e. you are not concerned about
write hot-spotting/write throughput, then try writing your data to an
un-salted table. Read performance for an un-salted table can be comparable to
or better than a salted one with stats
<https://phoenix.apache.org/update_statistics.html>.
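A conceptual sketch of why statistics help reads on an un-salted table: the collected guideposts are keys spaced through each region, and a scan can be cut at every guidepost inside its range so the pieces can run in parallel. The function below models only that splitting step under this assumption; the name split_scan is hypothetical and this is not Phoenix's implementation.

```python
import bisect
from typing import List, Tuple


def split_scan(start: bytes, stop: bytes,
               guideposts: List[bytes]) -> List[Tuple[bytes, bytes]]:
    """Cut the scan range [start, stop) at every guidepost strictly inside it."""
    gps = sorted(guideposts)
    lo = bisect.bisect_right(gps, start)  # first guidepost > start
    hi = bisect.bisect_left(gps, stop)    # first guidepost >= stop
    cuts = [start] + gps[lo:hi] + [stop]
    # Consecutive cut points form non-empty chunks that can be scanned in parallel.
    return [(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]


print(split_scan(b"a", b"z", [b"c", b"m", b"t", b"zz"]))
# [(b'a', b'c'), (b'c', b'm'), (b'm', b't'), (b't', b'z')]
```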

On Mon, Jul 11, 2016 at 2:31 PM, Simon Wang <si...@airbnb.com> wrote:

> These indexes will indeed be salted (so is the data table). If all indexes
> reside in the same table, there will be only 512 regions in total (256 for
> the data table, 256 for the combined index table). Indeed, the combined
> index table will be 12x as large as a single index table. But it doesn’t
> cover all columns, so it should be fine.

Re: Index tables at scale

Posted by Simon Wang <si...@airbnb.com>.
These indexes will indeed be salted (so is the data table). If all indexes reside in the same table, there will be only 512 regions in total (256 for the data table, 256 for the combined index table). Indeed, the combined index table will be 12x as large as a single index table. But it doesn’t cover all columns, so it should be fine.
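A rough model of the point above, assuming a salted table starts pre-split into SALT_BUCKETS regions and only gains more regions once they exceed HBase's maximum region size (hbase.hregion.max.filesize; the 10 GB value and the table sizes below are assumptions). While each index is small relative to that limit, the bucket count dominates, so 12 separate salted index tables carry 12x the regions of one combined table; once the data is large enough, both layouts converge, which is the case James's question about equal data volume points at. Real split behavior can leave regions up to half empty, so treat this as a lower-bound estimate.

```python
import math

# Assumed maximum region size before HBase splits a region.
MAX_REGION_GB = 10.0


def region_count(data_gb: float, salt_buckets: int) -> int:
    """Estimated regions: at least one per salt bucket, more once regions fill up."""
    return max(salt_buckets, math.ceil(data_gb / MAX_REGION_GB))


INDEX_GB = 100.0  # hypothetical size of one index table
N_INDEXES = 12
BUCKETS = 256

separate = N_INDEXES * region_count(INDEX_GB, BUCKETS)
combined = region_count(N_INDEXES * INDEX_GB, BUCKETS)
print(separate, combined)  # 3072 256

# With much larger indexes, data volume dominates and the layouts converge.
print(N_INDEXES * region_count(5000.0, BUCKETS),
      region_count(N_INDEXES * 5000.0, BUCKETS))  # 6000 6000
```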

> On Jul 11, 2016, at 2:26 PM, James Taylor <ja...@apache.org> wrote:
> 
> Will the index be salted (and that's why it's 256 regions per table)? If not, how many regions would there be if all indexes are in the same table (assuming the table is 12x bigger than one index table)?


Re: Index tables at scale

Posted by James Taylor <ja...@apache.org>.
Will the index be salted (and that's why it's 256 regions per table)? If
not, how many regions would there be if all indexes are in the same table
(assuming the table is 12x bigger than one index table)?

On Monday, July 11, 2016, Simon Wang <si...@airbnb.com> wrote:

> Thanks, Mujtaba. What you wrote is exactly what I meant. While not all our
> tables need this many regions and indexes, the number of regions per region
> server can grow quickly.
>
> -Simon

Re: Index tables at scale

Posted by Simon Wang <si...@airbnb.com>.
Thanks, Mujtaba. What you wrote is exactly what I meant. While not all our tables need this many regions and indexes, the number of regions per region server can grow quickly.

-Simon

> On Jul 11, 2016, at 2:17 PM, Mujtaba Chohan <mu...@apache.org> wrote:
> 
> 12 index tables * 256 regions per table = ~3K regions for the index tables, assuming we are talking about covered indexes, which implies 200+ regions per region server on a 15-node cluster.


Re: Index tables at scale

Posted by Mujtaba Chohan <mu...@apache.org>.
12 index tables * 256 regions per table = ~3K regions for the index tables,
assuming we are talking about covered indexes, which implies 200+ regions per
region server on a 15-node cluster.

On Mon, Jul 11, 2016 at 1:58 PM, James Taylor <ja...@apache.org>
wrote:

> Hi Simon,
>
> I might be missing something, but with 12 separate index tables or 1 index
> table, the amount of data will be the same. Won't there be the same number
> of regions either way?
>
> Thanks,
> James

Re: Index tables at scale

Posted by James Taylor <ja...@apache.org>.
Hi Simon,

I might be missing something, but with 12 separate index tables or 1 index
table, the amount of data will be the same. Won't there be the same number
of regions either way?

Thanks,
James

On Sun, Jul 10, 2016 at 10:50 PM, Simon Wang <si...@airbnb.com> wrote:

> Hi James,
>
> Thanks for the response.
>
> In our use case, there is a 256-region table, and we want to build ~12
> indexes on it. We have 15 region servers. If each index is in its own
> table, that would be a total of 221 regions per region server for this
> single table. I think the extra write-time cost is okay, but the number of
> regions is too high for us.
>
> Best,
> Simon

Re: Index tables at scale

Posted by Simon Wang <si...@airbnb.com>.
Hi James,

Thanks for the response.

In our use case, there is a 256-region table, and we want to build ~12 indexes on it. We have 15 region servers. If each index is in its own table, that would be a total of 221 regions per region server for this single table. I think the extra write-time cost is okay, but the number of regions is too high for us.

Best,
Simon
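The region counts quoted in this thread can be reproduced with simple arithmetic, assuming regions are spread evenly across region servers: the data table plus 12 separate 256-region index tables, the index tables alone, and the data table plus a single combined 256-region index table. The helper name is hypothetical.

```python
def regions_per_server(data_regions: int, index_tables: int,
                       regions_per_index_table: int, servers: int) -> float:
    """Average regions per region server, assuming an even spread."""
    total = data_regions + index_tables * regions_per_index_table
    return total / servers


SERVERS = 15
# Data table plus 12 separate salted index tables of 256 regions each (3328 total).
print(round(regions_per_server(256, 12, 256, SERVERS), 1))  # 221.9
# The 12 index tables alone (12 * 256 = 3072 regions).
print(round(regions_per_server(0, 12, 256, SERVERS), 1))    # 204.8
# Data table plus one combined 256-region index table (512 regions total).
print(round(regions_per_server(256, 1, 256, SERVERS), 1))   # 34.1
```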


> On Jul 9, 2016, at 1:18 AM, James Taylor <ja...@apache.org> wrote:
> 
> Hi Simon,
> The reason we've taken this approach with views is that it's possible with multi-tenancy that the number of views would grow unbounded since you might end up with a view per tenant (100K or 1M views or more - clearly too many for HBase to handle as separate tables).
> 
> With secondary indexes directly on physical tables, you're somewhat bounded by the hit you're willing to take on the write side, as the cost of maintaining the index is similar to the cost of the write to the data table. So the extra number of physical tables for indexes seems within the bounds of what HBase could handle. 
> 
> How many secondary indexes are you creating and are you ok with the extra write-time cost?
> 
> From a code consistency standpoint, using the same approach across local, global, and view indexes might simplify things, though. Please file a JIRA with a bit more detail on your use case.
> 
> Thanks,
> James


Re: Index tables at scale

Posted by James Taylor <ja...@apache.org>.
Hi Simon,
The reason we've taken this approach with views is that it's possible with
multi-tenancy that the number of views would grow unbounded since you might
end up with a view per tenant (100K or 1M views or more - clearly too many
for HBase to handle as separate tables).

With secondary indexes directly on physical tables, you're somewhat bounded
by the hit you're willing to take on the write side, as the cost of
maintaining the index is similar to the cost of the write to the data
table. So the extra number of physical tables for indexes seems within the
bounds of what HBase could handle.

How many secondary indexes are you creating and are you ok with the extra
write-time cost?

From a code consistency standpoint, using the same approach across local,
global, and view indexes might simplify things, though. Please file a JIRA
with a bit more detail on your use case.

Thanks,
James
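James's estimate that maintaining each index costs about as much as the data-table write can be turned into a quick write-amplification figure. The sketch assumes fresh inserts into a table with global indexes only; updates that change an indexed value would typically also require deleting the old index row, which it ignores. The function and its index_cost_ratio parameter are hypothetical.

```python
def relative_write_cost(global_indexes: int, index_cost_ratio: float = 1.0) -> float:
    """Write cost of one upsert relative to the same table without indexes.

    index_cost_ratio is the assumed cost of one index write relative to the
    data-table write; 1.0 matches the "similar cost" estimate in the thread.
    """
    return 1.0 + global_indexes * index_cost_ratio


print(relative_write_cost(12))       # 13.0
print(relative_write_cost(12, 0.5))  # 7.0
```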



On Fri, Jul 8, 2016 at 8:59 PM, Simon Wang <si...@airbnb.com> wrote:

> Hi all,
>
> I am writing to ask if there is a way to let Phoenix store all indexes on
> a single table in the same HBase table. If each index must be stored in a
> separate table, creating more than a few indexes on a table with a large
> number of regions will not scale well.
>
> From what I have learned, when Phoenix builds indexes on a view, it stores
> all indexes in a table associated with the underlying table of the view.
> For example, if V1 is a view of T1, all indexes on V1 will be stored in
> _IDX_T1. It would be great if this behavior could optionally be turned on
> for indexes on tables.
>
> Best,
> Simon