You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by Nathan Davis <na...@salesforce.com> on 2016/08/12 00:51:57 UTC

monitoring status of CREATE INDEX operation

Hi All,
I executed a CREATE INDEX against a fairly large table. And I received a
timeout error after a minute or two, which I understand is expected.
`!tables` in sqlline shows the index is still in BUILDING state after 2
hours, which may be accurate since it is a pretty large table and my
cluster is just a smallish EMR throwaway.

My question is: Is there some way to verify that the index is in fact still
being built? Perhaps some HBase logs or the UI or some hbase shell command?
Unfortunately I am just as new to HBase as I am to Phoenix itself.

Thanks,
 -nathan

Re: monitoring status of CREATE INDEX operation

Posted by Nathan Davis <na...@salesforce.com>.
Thanks James, all CAPS did the trick!

Yes, the event table is already IMMUTABLE_ROWS=true.

Thanks again,
Nathan


On Fri, Aug 12, 2016 at 10:59 AM, James Taylor <ja...@apache.org>
wrote:

> In your IndexTool invocation, try use all caps for your table and index
> name. Phoenix normalizes names by upper casing them (unless they're in
> double quotes).
>
> One other unrelated question: did you declare your event table with
> IMMUTABLE_ROWS=true (assuming it's a write-once table)? If not, you can use
> the ALTER TABLE trans.event SET IMMUTABLE_ROWS=true command to change it.
> This will give you some performance benefit.
>
> Thanks,
> James
>
> On Fri, Aug 12, 2016 at 7:39 AM, Nathan Davis <nathan.davis@salesforce.com
> > wrote:
>
>> Thanks for the detailed info. I took the advice of using the ASYNC
>> method. The CREATE statement executes fine and I end up with an index table
>> showing in state BUILDING. When I kick off the MR job with `hbase
>> org.apache.phoenix.mapreduce.index.IndexTool --schema trans --data-table
>> event --index-table event_object_id_idx_b --output-path
>> EVENT_OBJECT_ID_IDX_B_HFILES` I get this odd error:
>>
>> 2016-08-12 14:29:40,073 ERROR [main] index.IndexTool:  An exception
>>> occured while performing the indexing job : java.lang.IllegalArgumentException:
>>>  trans.event_object_id_idx_b is not an index table for trans.event
>>> at org.apache.phoenix.mapreduce.index.IndexTool.run(IndexTool.java:187)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>> at org.apache.phoenix.mapreduce.index.IndexTool.main(IndexTool.java:378)
>>>
>>
>>
>>
>>
>> My CREATE INDEX was as follows:
>>
>> create index if not exists event_object_id_idx_b on trans.event (
>>>     object_id
>>> ) ASYNC UPDATE_CACHE_FREQUENCY=60000;
>>
>>
>>
>> On Thu, Aug 11, 2016 at 9:40 PM, James Taylor <ja...@apache.org>
>> wrote:
>>
>>> Hi Nathan,
>>> If your table is large, I'd recommend creating your index
>>> asynchronously. To do that, you'd add the ASYNC keyword to the end of your
>>> CREATE INDEX call. In this case, the index will be built through Map Reduce
>>> in a more resilient manner (i.e. the client going up or down won't impact
>>> it and you have the regular retries of a MR job). On AWS, you'll need to
>>> manually start the MR job, but at SFDC we have a CRON job that'll start it
>>> for you automatically (this is open source too, so it'd be good to get that
>>> up and running in AWS as well). See https://phoenix.apache.org
>>> /secondary_indexing.html#Index_Population for details.
>>>
>>> If you don't run it asynchronously, then you'll need to increase the
>>> query timeout (i.e. phoenix.query.timeoutMs config property) to be larger
>>> than the time it'll take to build the index. If the client goes down before
>>> the CREATE INDEX call finished (or the query times out), then the index
>>> build will stop (and unfortunately will need to be run again).
>>>
>>> To monitor the index build, there are a few ways - if running through
>>> MR, then you can monitor the MR job in the standard way. If running
>>> synchronously, you can monitor the index table - you'll see new regions
>>> created as splits occur, or you could query the SYSTEM.STATS table (which
>>> gets populated as splits and compactions happen), or you could run a
>>> count(*) query directly against the index table (though that'll put more
>>> load on your system because it'll require a full table scan).
>>>
>>> HTH. Thanks,
>>> James
>>>
>>> On Thu, Aug 11, 2016 at 5:51 PM, Nathan Davis <
>>> nathan.davis@salesforce.com> wrote:
>>>
>>>> Hi All,
>>>> I executed a CREATE INDEX against a fairly large table. And I received
>>>> a timeout error after a minute or two, which I understand is expected.
>>>> `!tables` in sqlline shows the index is still in BUILDING state after 2
>>>> hours, which may be accurate since it is a pretty large table and my
>>>> cluster is just a smallish EMR throwaway.
>>>>
>>>> My question is: Is there some way to verify that the index is in fact
>>>> still being built? Perhaps some HBase logs or the UI or some hbase shell
>>>> command? Unfortunately I am just as new to HBase as I am to Phoenix itself.
>>>>
>>>> Thanks,
>>>>  -nathan
>>>>
>>>
>>>
>>
>

Re: monitoring status of CREATE INDEX operation

Posted by James Taylor <ja...@apache.org>.
In your IndexTool invocation, try use all caps for your table and index
name. Phoenix normalizes names by upper casing them (unless they're in
double quotes).

One other unrelated question: did you declare your event table with
IMMUTABLE_ROWS=true (assuming it's a write-once table)? If not, you can use
the ALTER TABLE trans.event SET IMMUTABLE_ROWS=true command to change it.
This will give you some performance benefit.

Thanks,
James

On Fri, Aug 12, 2016 at 7:39 AM, Nathan Davis <na...@salesforce.com>
wrote:

> Thanks for the detailed info. I took the advice of using the ASYNC method.
> The CREATE statement executes fine and I end up with an index table showing
> in state BUILDING. When I kick off the MR job with `hbase
> org.apache.phoenix.mapreduce.index.IndexTool --schema trans --data-table
> event --index-table event_object_id_idx_b --output-path
> EVENT_OBJECT_ID_IDX_B_HFILES` I get this odd error:
>
> 2016-08-12 14:29:40,073 ERROR [main] index.IndexTool:  An exception
>> occured while performing the indexing job : java.lang.IllegalArgumentException:
>>  trans.event_object_id_idx_b is not an index table for trans.event
>> at org.apache.phoenix.mapreduce.index.IndexTool.run(IndexTool.java:187)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>> at org.apache.phoenix.mapreduce.index.IndexTool.main(IndexTool.java:378)
>
>
>
>
> My CREATE INDEX was as follows:
>
> create index if not exists event_object_id_idx_b on trans.event (
>>     object_id
>> ) ASYNC UPDATE_CACHE_FREQUENCY=60000;
>
>
>
> On Thu, Aug 11, 2016 at 9:40 PM, James Taylor <ja...@apache.org>
> wrote:
>
>> Hi Nathan,
>> If your table is large, I'd recommend creating your index asynchronously.
>> To do that, you'd add the ASYNC keyword to the end of your CREATE INDEX
>> call. In this case, the index will be built through Map Reduce in a more
>> resilient manner (i.e. the client going up or down won't impact it and you
>> have the regular retries of a MR job). On AWS, you'll need to manually
>> start the MR job, but at SFDC we have a CRON job that'll start it for you
>> automatically (this is open source too, so it'd be good to get that up and
>> running in AWS as well). See https://phoenix.apache.org
>> /secondary_indexing.html#Index_Population for details.
>>
>> If you don't run it asynchronously, then you'll need to increase the
>> query timeout (i.e. phoenix.query.timeoutMs config property) to be larger
>> than the time it'll take to build the index. If the client goes down before
>> the CREATE INDEX call finished (or the query times out), then the index
>> build will stop (and unfortunately will need to be run again).
>>
>> To monitor the index build, there are a few ways - if running through MR,
>> then you can monitor the MR job in the standard way. If running
>> synchronously, you can monitor the index table - you'll see new regions
>> created as splits occur, or you could query the SYSTEM.STATS table (which
>> gets populated as splits and compactions happen), or you could run a
>> count(*) query directly against the index table (though that'll put more
>> load on your system because it'll require a full table scan).
>>
>> HTH. Thanks,
>> James
>>
>> On Thu, Aug 11, 2016 at 5:51 PM, Nathan Davis <
>> nathan.davis@salesforce.com> wrote:
>>
>>> Hi All,
>>> I executed a CREATE INDEX against a fairly large table. And I received a
>>> timeout error after a minute or two, which I understand is expected.
>>> `!tables` in sqlline shows the index is still in BUILDING state after 2
>>> hours, which may be accurate since it is a pretty large table and my
>>> cluster is just a smallish EMR throwaway.
>>>
>>> My question is: Is there some way to verify that the index is in fact
>>> still being built? Perhaps some HBase logs or the UI or some hbase shell
>>> command? Unfortunately I am just as new to HBase as I am to Phoenix itself.
>>>
>>> Thanks,
>>>  -nathan
>>>
>>
>>
>

Re: monitoring status of CREATE INDEX operation

Posted by Nathan Davis <na...@salesforce.com>.
Thanks for the detailed info. I took the advice of using the ASYNC method.
The CREATE statement executes fine and I end up with an index table showing
in state BUILDING. When I kick off the MR job with `hbase
org.apache.phoenix.mapreduce.index.IndexTool --schema trans --data-table
event --index-table event_object_id_idx_b --output-path
EVENT_OBJECT_ID_IDX_B_HFILES` I get this odd error:

2016-08-12 14:29:40,073 ERROR [main] index.IndexTool:  An exception occured
> while performing the indexing job : java.lang.IllegalArgumentException:
>  trans.event_object_id_idx_b is not an index table for trans.event
> at org.apache.phoenix.mapreduce.index.IndexTool.run(IndexTool.java:187)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.phoenix.mapreduce.index.IndexTool.main(IndexTool.java:378)




My CREATE INDEX was as follows:

create index if not exists event_object_id_idx_b on trans.event (
>     object_id
> ) ASYNC UPDATE_CACHE_FREQUENCY=60000;



On Thu, Aug 11, 2016 at 9:40 PM, James Taylor <ja...@apache.org>
wrote:

> Hi Nathan,
> If your table is large, I'd recommend creating your index asynchronously.
> To do that, you'd add the ASYNC keyword to the end of your CREATE INDEX
> call. In this case, the index will be built through Map Reduce in a more
> resilient manner (i.e. the client going up or down won't impact it and you
> have the regular retries of a MR job). On AWS, you'll need to manually
> start the MR job, but at SFDC we have a CRON job that'll start it for you
> automatically (this is open source too, so it'd be good to get that up and
> running in AWS as well). See https://phoenix.apache.
> org/secondary_indexing.html#Index_Population for details.
>
> If you don't run it asynchronously, then you'll need to increase the query
> timeout (i.e. phoenix.query.timeoutMs config property) to be larger than
> the time it'll take to build the index. If the client goes down before the
> CREATE INDEX call finished (or the query times out), then the index build
> will stop (and unfortunately will need to be run again).
>
> To monitor the index build, there are a few ways - if running through MR,
> then you can monitor the MR job in the standard way. If running
> synchronously, you can monitor the index table - you'll see new regions
> created as splits occur, or you could query the SYSTEM.STATS table (which
> gets populated as splits and compactions happen), or you could run a
> count(*) query directly against the index table (though that'll put more
> load on your system because it'll require a full table scan).
>
> HTH. Thanks,
> James
>
> On Thu, Aug 11, 2016 at 5:51 PM, Nathan Davis <nathan.davis@salesforce.com
> > wrote:
>
>> Hi All,
>> I executed a CREATE INDEX against a fairly large table. And I received a
>> timeout error after a minute or two, which I understand is expected.
>> `!tables` in sqlline shows the index is still in BUILDING state after 2
>> hours, which may be accurate since it is a pretty large table and my
>> cluster is just a smallish EMR throwaway.
>>
>> My question is: Is there some way to verify that the index is in fact
>> still being built? Perhaps some HBase logs or the UI or some hbase shell
>> command? Unfortunately I am just as new to HBase as I am to Phoenix itself.
>>
>> Thanks,
>>  -nathan
>>
>
>

Re: monitoring status of CREATE INDEX operation

Posted by James Taylor <ja...@apache.org>.
Hi Nathan,
If your table is large, I'd recommend creating your index asynchronously.
To do that, you'd add the ASYNC keyword to the end of your CREATE INDEX
call. In this case, the index will be built through Map Reduce in a more
resilient manner (i.e. the client going up or down won't impact it and you
have the regular retries of a MR job). On AWS, you'll need to manually
start the MR job, but at SFDC we have a CRON job that'll start it for you
automatically (this is open source too, so it'd be good to get that up and
running in AWS as well). See
https://phoenix.apache.org/secondary_indexing.html#Index_Population for
details.

If you don't run it asynchronously, then you'll need to increase the query
timeout (i.e. phoenix.query.timeoutMs config property) to be larger than
the time it'll take to build the index. If the client goes down before the
CREATE INDEX call finished (or the query times out), then the index build
will stop (and unfortunately will need to be run again).

To monitor the index build, there are a few ways - if running through MR,
then you can monitor the MR job in the standard way. If running
synchronously, you can monitor the index table - you'll see new regions
created as splits occur, or you could query the SYSTEM.STATS table (which
gets populated as splits and compactions happen), or you could run a
count(*) query directly against the index table (though that'll put more
load on your system because it'll require a full table scan).

HTH. Thanks,
James

On Thu, Aug 11, 2016 at 5:51 PM, Nathan Davis <na...@salesforce.com>
wrote:

> Hi All,
> I executed a CREATE INDEX against a fairly large table. And I received a
> timeout error after a minute or two, which I understand is expected.
> `!tables` in sqlline shows the index is still in BUILDING state after 2
> hours, which may be accurate since it is a pretty large table and my
> cluster is just a smallish EMR throwaway.
>
> My question is: Is there some way to verify that the index is in fact
> still being built? Perhaps some HBase logs or the UI or some hbase shell
> command? Unfortunately I am just as new to HBase as I am to Phoenix itself.
>
> Thanks,
>  -nathan
>