Posted to user@phoenix.apache.org by "Perko, Ralph J" <Ra...@pnnl.gov> on 2014/11/06 21:31:29 UTC

RegionTooBusyException

Hi, I am using a combination of Pig, Phoenix and HBase to load data on a test cluster, and I continue to run into an issue with larger, longer-running jobs (smaller jobs succeed).  After the job has run for several hours, once the first set of mappers has finished and the second has begun, the job dies with each mapper failing with a RegionTooBusyException.  Could this be related to how I have my Phoenix tables configured, or is this an HBase configuration issue, or something else?  Do you have any suggestions?

Thanks for the help,
Ralph


2014-11-05 23:08:31,573 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 200 actions to finish
2014-11-05 23:08:33,729 WARN [phoenix-1-thread-34413] org.apache.hadoop.hbase.client.AsyncProcess: #1, table=T1_CSV_DATA, primary, attempt=36/35 failed 200 ops, last exception: null on server1,60020,1415229553858, tracking started Wed Nov 05 22:59:40 PST 2014; not retrying 200 - final failure
2014-11-05 23:08:33,736 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Exception while committing to database.
at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:79)
at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:41)
at org.apache.phoenix.pig.PhoenixHBaseStorage.putNext(PhoenixHBaseStorage.java:151)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.phoenix.execute.CommitException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 200 actions: RegionTooBusyException: 200 times,
at org.apache.phoenix.execute.MutationState.commit(MutationState.java:418)
at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:356)
at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:76)
... 19 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 200 actions: RegionTooBusyException: 200 times,
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1473)
at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:855)
at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:869)
at org.apache.phoenix.execute.MutationState.commit(MutationState.java:399)
... 21 more

2014-11-05 23:08:33,739 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2014-11-05 23:08:33,773 INFO [Thread-11] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x2497d0ab7e6007e

Data size:
75 CSV files compressed with bz2
17 GB compressed, 165 GB uncompressed

Time-series data, 6-node cluster, 5 region servers.  Hadoop 2.5 (HDP 2.1.5), Phoenix 4.0, HBase 0.98.

Phoenix Table def:

CREATE TABLE IF NOT EXISTS t1_csv_data (
    timestamp BIGINT NOT NULL,
    location VARCHAR NOT NULL,
    fileid VARCHAR NOT NULL,
    recnum INTEGER NOT NULL,
    field5 VARCHAR,
    ...
    field45 VARCHAR,
    CONSTRAINT pkey PRIMARY KEY (timestamp, location, fileid, recnum)
)
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS=10;

-- indexes
CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data(somefield1) COMPRESSION='SNAPPY';
CREATE INDEX t1_csv_data_f2_idx ON t1_csv_data(somefield2) COMPRESSION='SNAPPY';
CREATE INDEX t1_csv_data_f3_idx ON t1_csv_data(somefield3) COMPRESSION='SNAPPY';

Simple Pig script:

register $phoenix_jar;
register $udf_jar;
Z = load '$data' as (
file_id,
recnum,
dtm:chararray,
...
-- lots of other fields
);
D = foreach Z generate
gov.pnnl.pig.TimeStringToPeriod(dtm,'yyyyMMdd HH:mm:ss','yyyyMMddHHmmss'),
location,
fileid,
recnum,
...
-- lots of other fields
;
STORE D into
'hbase://$table_name' using
org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 1000');


Re: RegionTooBusyException

Posted by James Taylor <ja...@apache.org>.
Hi Ralph,
When an index is created, we estimate its MAX_FILE_SIZE relative to the
data table, in an attempt to have it split approximately the same number
of times as the data table (because the index is typically smaller). You
can override this, though, if it's not optimal for your use case.
Thanks,
James
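
For illustration, one way to override the estimate is to set the HBase attribute explicitly when the index is created. The statement below is only a sketch, not taken from the thread: it assumes Phoenix forwards the MAX_FILESIZE table attribute to HBase the same way it forwards COMPRESSION, and the 10 GB value is made up for the example.

-- sketch: pin the index table's region size instead of relying on Phoenix's estimate
CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data (somefield1)
    COMPRESSION='SNAPPY', MAX_FILESIZE=10737418240;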

On Fri, Nov 7, 2014 at 7:40 AM, Perko, Ralph J <Ra...@pnnl.gov> wrote:
> Salting the table (which gives me pre-splits) and using the
> ConstantSizeRegionSplitPolicy split policy as you suggested worked!
>
> A question on the index tables: unlike the main table, HBase shows that
> each index table has the MAX_FILESIZE attribute set to 344148020 bytes,
> which is well below what is set for the max HStoreFile size property in
> HBase, causing a large amount of splitting on all the index tables (which
> are pre-split as well) despite using the same split policy as the main
> table.  Why is this done for just the index tables?  Is it safe to override?
>
> Thanks,
> Ralph
> __________________________________________________
> Ralph Perko
> Pacific Northwest National Laboratory
> (509) 375-2272
> ralph.perko@pnnl.gov
>
>
> From: Vladimir Rodionov <vl...@gmail.com>
> Reply-To: "user@phoenix.apache.org" <us...@phoenix.apache.org>
> Date: Thursday, November 6, 2014 at 1:04 PM
> To: "user@phoenix.apache.org" <us...@phoenix.apache.org>
> Subject: Re: RegionTooBusyException
>
> You may want to try a different RegionSplitPolicy
> (ConstantSizeRegionSplitPolicy); the default one
> (IncreasingToUpperBoundRegionSplitPolicy) does not make sense when the
> table is pre-split in advance.
>
> -Vladimir Rodionov
>
> On Thu, Nov 6, 2014 at 1:01 PM, Vladimir Rodionov <vl...@gmail.com>
> wrote:
>>
>> Too many map tasks are trying to commit (save) data to HBase concurrently;
>> I bet you have compaction hell in your cluster during data loading.
>>
>> In a few words, your cluster is not able to keep up with the data ingestion
>> rate. HBase does not do smart update/insert rate throttling for you. You may
>> try some compaction-related configuration options:
>>    hbase.hstore.blockingWaitTime - default: 90000
>>    hbase.hstore.compaction.min - default: 3
>>    hbase.hstore.compaction.max - default: 10
>>    hbase.hstore.compaction.min.size - default: 128 MB, expressed in bytes
>>
>> But I suggest you pre-split your tables first, then limit the number of map
>> tasks (if the former does not help), then play with the compaction config
>> values above.
>>
>>
>> -Vladimir Rodionov
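
As an illustration of the split-policy suggestion in this thread (not taken from the original messages): the policy can be declared per table in the Phoenix DDL, assuming Phoenix forwards the SPLIT_POLICY HBase table attribute the same way it forwards COMPRESSION. A minimal sketch, with the column list trimmed:

-- sketch: set the split policy on the salted, pre-split data table
CREATE TABLE IF NOT EXISTS t1_csv_data (
    timestamp BIGINT NOT NULL,
    location VARCHAR NOT NULL,
    fileid VARCHAR NOT NULL,
    recnum INTEGER NOT NULL,
    CONSTRAINT pkey PRIMARY KEY (timestamp, location, fileid, recnum)
)
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS=10,
SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

If a given Phoenix version does not pass the attribute through, the same policy can be set on the HBase side instead.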

Re: RegionTooBusyException

Posted by James Taylor <ja...@apache.org>.
It mainly targets querying, but index creation uses an UPSERT SELECT
statement. The data will be broken into smaller chunks, which should
help with timeout issues.

It won't have an impact on bulk loading through our CSV tool, though.

Thanks,
James
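
For context, collecting statistics is a single statement once the cluster is on a stats-capable release (3.2/4.2 or later; the original poster is on Phoenix 4.0, so this assumes an upgrade). A minimal sketch using the table from the first message:

-- sketch: refresh guidepost statistics for the data table (Phoenix 4.2+)
UPDATE STATISTICS t1_csv_data;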

On Fri, Nov 7, 2014 at 10:54 AM, Vladimir Rodionov
<vl...@gmail.com> wrote:
> Thanks,
>
> It is for queries only.  I do not see how this can help during data loading
> and index creation.
>
> -Vladimir Rodionov

Re: RegionTooBusyException

Posted by Vladimir Rodionov <vl...@gmail.com>.
Thanks,

It is for queries only.  I do not see how this can help during data loading
and index creation.

-Vladimir Rodionov

On Fri, Nov 7, 2014 at 10:39 AM, James Taylor <ja...@apache.org>
wrote:

> http://phoenix.apache.org/update_statistics.html

Re: RegionTooBusyException

Posted by James Taylor <ja...@apache.org>.
http://phoenix.apache.org/update_statistics.html

On Fri, Nov 7, 2014 at 10:36 AM, Vladimir Rodionov
<vl...@gmail.com> wrote:
>>
>> With the new stats feature in 3.2/4.2, salting tables is less
>> necessary and will likely decrease your overall cluster throughput.
>
> Interesting, where can I get details, James? Is it fast region reassignment
> based on load statistics?
>
> -Vladimir Rodionov

Re: RegionTooBusyException

Posted by Vladimir Rodionov <vl...@gmail.com>.
>
> With the new stats feature in 3.2/4.2, salting tables is less
> necessary and will likely decrease your overall cluster throughput.

Interesting, where can I get details, James? Is it fast region reassignment
based on load statistics?

-Vladimir Rodionov


Re: RegionTooBusyException

Posted by James Taylor <ja...@apache.org>.
If you salt your table (which pre-splits the table into SALT_BUCKETS
regions), by default your index will be salted and pre-split the same
way.

FWIW, you can also pre-split your table and index using the SPLIT ON
(...) syntax: http://phoenix.apache.org/language/index.html#create_table
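
As a rough sketch of that option (the split points below are invented and
would need to come from your own key distribution; since the leading PK
column is a timestamp in yyyyMMddHHmmss form, date boundaries are a natural
choice):

CREATE TABLE IF NOT EXISTS t1_csv_data
(
timestamp BIGINT NOT NULL,
location VARCHAR NOT NULL,
fileid VARCHAR NOT NULL,
recnum INTEGER NOT NULL,
...
CONSTRAINT pkey PRIMARY KEY (timestamp, location, fileid, recnum)
)
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY'
SPLIT ON (20141101000000, 20141103000000, 20141105000000);

(If I recall correctly, SPLIT ON and SALT_BUCKETS are mutually exclusive, so
this is an alternative to salting rather than an addition to it.)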

With the new stats feature in 3.2/4.2, salting tables is less necessary,
and salting will likely decrease your overall cluster throughput.
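
As a one-line sketch of using the stats feature once you are on 3.2/4.2
(stats are also collected automatically during major compactions; verify
the exact syntax against your version's docs):

UPDATE STATISTICS t1_csv_data;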

Thanks,
James



On Fri, Nov 7, 2014 at 10:21 AM, Vladimir Rodionov
<vl...@gmail.com> wrote:
> If you see split activity on your index tables, they are either not
> pre-split, their region sizes exceed the max limit (you load a lot of data
> into the indexes), or the index tables are still on the default split policy.
>
> How do you pre-split your index tables?
>
> -Vladimir Rodionov
>

Re: RegionTooBusyException

Posted by Vladimir Rodionov <vl...@gmail.com>.
If you see split activity on your index tables, they are either not
pre-split, their region sizes exceed the max limit (you load a lot of data
into the indexes), or the index tables are still on the default split policy.

How do you pre-split your index tables?
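
For reference, a minimal sketch of pre-splitting an index at creation time
(assuming your Phoenix version accepts SPLIT ON in CREATE INDEX; the split
points are invented and would have to match the distribution of the indexed
column):

CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data (somefield1)
COMPRESSION='SNAPPY'
SPLIT ON ('c', 'h', 'p');

If the base table is salted, the simpler route is to let the index inherit
the salting, which pre-splits it into SALT_BUCKETS regions as well.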

-Vladimir Rodionov

On Fri, Nov 7, 2014 at 7:40 AM, Perko, Ralph J <Ra...@pnnl.gov> wrote:

> Salting the table (which gives me pre-splits) and using the
> ConstantSizeRegionSplitPolicy split policy as you suggested worked!
>
> A question on the index tables: unlike the main table, HBase shows that
> each index table has the MAX_FILESIZE attribute set to 344148020 bytes,
> which is well below the max HStoreFile size configured in HBase. This
> causes a large amount of splitting on all the index tables (which are
> pre-split as well) despite them using the same split policy as the main
> table. Why is this done for just the index tables? Is it safe to override?
>
>  Thanks,
> Ralph
>    __________________________________________________
> Ralph Perko
> Pacific Northwest National Laboratory
>   (509) 375-2272
> ralph.perko@pnnl.gov

Re: RegionTooBusyException

Posted by "Perko, Ralph J" <Ra...@pnnl.gov>.
Salting the table (which gives me pre-splits) and using the ConstantSizeRegionSplitPolicy split policy as you suggested worked!

A question on the index tables: unlike the main table, HBase shows that each index table has the MAX_FILESIZE attribute set to 344148020 bytes, which is well below the max HStoreFile size configured in HBase. This causes a large amount of splitting on all the index tables (which are pre-split as well) despite them using the same split policy as the main table. Why is this done for just the index tables? Is it safe to override?

Thanks,
Ralph
__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory
(509) 375-2272
ralph.perko@pnnl.gov


From: Vladimir Rodionov <vl...@gmail.com>
Reply-To: "user@phoenix.apache.org" <us...@phoenix.apache.org>
Date: Thursday, November 6, 2014 at 1:04 PM
To: "user@phoenix.apache.org" <us...@phoenix.apache.org>
Subject: Re: RegionTooBusyException

You may want to try a different RegionSplitPolicy (ConstantSizeRegionSplitPolicy);
the default one (IncreasingToUpperBoundRegionSplitPolicy) does not make sense
when the table is pre-split in advance.

-Vladimir Rodionov


Re: RegionTooBusyException

Posted by Vladimir Rodionov <vl...@gmail.com>.
You may want to try a different RegionSplitPolicy (ConstantSizeRegionSplitPolicy);
the default one (IncreasingToUpperBoundRegionSplitPolicy) does not make sense
when the table is pre-split in advance.
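
As a sketch of what that could look like at table-creation time (assuming
your Phoenix version passes HBase table-descriptor attributes such as
SPLIT_POLICY through from the DDL options; otherwise the same attribute can
be set on the underlying HBase table directly):

CREATE TABLE IF NOT EXISTS t1_csv_data
(
...
)
IMMUTABLE_ROWS=true,
COMPRESSION='SNAPPY',
SALT_BUCKETS=10,
SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';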

-Vladimir Rodionov

On Thu, Nov 6, 2014 at 1:01 PM, Vladimir Rodionov <vl...@gmail.com>
wrote:

> Too many map tasks are trying to commit (save) data to HBase concurrently;
> I bet you have compaction hell in your cluster during data loading.
>
> In a few words, your cluster is not able to keep up with the data ingestion
> rate. HBase does not do smart update/insert rate throttling for you. You may
> try some compaction-related configuration options:
>    hbase.hstore.blockingWaitTime - Default: 90000
>    hbase.hstore.compaction.min - Default: 3
>    hbase.hstore.compaction.max - Default: 10
>    hbase.hstore.compaction.min.size - Default: 128 MB, expressed in bytes.
>
> But I suggest you pre-split your tables first, then limit the number of map
> tasks (if the former does not help), then play with the compaction config
> values (above).
>
>
> -Vladimir Rodionov

Re: RegionTooBusyException

Posted by Vladimir Rodionov <vl...@gmail.com>.
Too many map tasks are trying to commit (save) data to HBase concurrently; I
bet you have compaction hell in your cluster during data loading.

In a few words, your cluster is not able to keep up with the data ingestion
rate. HBase does not do smart update/insert rate throttling for you. You may
try some compaction-related configuration options:
   hbase.hstore.blockingWaitTime - Default: 90000
   hbase.hstore.compaction.min - Default: 3
   hbase.hstore.compaction.max - Default: 10
   hbase.hstore.compaction.min.size - Default: 128 MB, expressed in bytes.

But I suggest you pre-split your tables first, then limit the number of map
tasks (if the former does not help), then play with the compaction config
values (above).


-Vladimir Rodionov

On Thu, Nov 6, 2014 at 12:31 PM, Perko, Ralph J <Ra...@pnnl.gov>
wrote:

>  Hi, I am using a combination of Pig, Phoenix and HBase to load data on a
> test cluster and I continue to run into an issue with larger, longer
> running jobs (smaller jobs succeed).  After the job has run for several
> hours, the first set of mappers have finished and the second begin, the job
> dies with each mapper failing with the error RegionTooBusyException.  Could
> this be related to how I have my Phoenix tables configured or is this an
> Hbase configuration issue or something else?  Do you have any suggestions?
>
>  Thanks for the help,
> Ralph