You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "Guizhou Feng (JIRA)" <ji...@apache.org> on 2018/02/07 07:03:00 UTC
[jira] [Comment Edited] (PHOENIX-2883) Region close during automatic disabling of index for rebuilding can lead to RS abort

    [ https://issues.apache.org/jira/browse/PHOENIX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355030#comment-16355030 ] 

Guizhou Feng edited comment on PHOENIX-2883 at 2/7/18 7:02 AM:
---------------------------------------------------------------

I encounter similar case while build index async via IndexTool

HBase Version: 1.2.0-cdh5.10.1
Phoenix Version: phoenix-4.8.0-cdh5.8.0-server.jar

Behavior Description:

    1. Create Index: CREATE INDEX "prod:log_my_phx_3_idx"
    ON "prod:log_my_phx" ("id", "version", "event_time" )
        INCLUDE(
         "name",
         "code",
         "type",
         "decision",
         "monitoring") ASYNC;
    2. Run IndexTool mapreduce job

MapReduce job run succeed, index is activated, although alter index statement throw NullPointerException as below

ALTER INDEX IF EXISTS "prod:log_my_phx_3_idx" ON "prod:log_my_phx" ACTIVE

18/02/06 16:26:32 INFO index.IndexToolUtil: alterQuery: ALTER INDEX IF EXISTS "prod:log_my_phx_3_idx" ON "prod:log_my_phx" ACTIVE
18/02/06 16:26:32 ERROR index.IndexTool: An exception occurred while performing the indexing job: NullPointerException:  at:
java.lang.NullPointerException
    at org.apache.phoenix.schema.PMetaDataImpl.addTable(PMetaDataImpl.java:108)
    at org.apache.phoenix.jdbc.PhoenixConnection.addTable(PhoenixConnection.java:903)
    at org.apache.phoenix.schema.MetaDataClient.addTableToCache(MetaDataClient.java:3539)
    at org.apache.phoenix.schema.MetaDataClient.alterIndex(MetaDataClient.java:3504)
    at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableAlterIndexStatement$1.execute(PhoenixStatement.java:993)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:344)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:332)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:331)
    at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1442)
    at org.apache.phoenix.mapreduce.index.IndexToolUtil.updateIndexState(IndexToolUtil.java:75)
    at org.apache.phoenix.mapreduce.index.IndexTool.run(IndexTool.java:245)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.phoenix.mapreduce.index.IndexTool.main(IndexTool.java:384)


FATAL Errors in RegionServer:
2018-02-07 13:18:14,229 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~358.52 MB/375939816, currentsize=-101.26 MB/-106175240 for region prod:log_my_phx,11_1022430660_502_V5,1517881075389.3504c1c9e68e7b0c9c1c99ea396ccb57. in 1849ms, sequenceid=15325637, compaction requested=true
2018-02-07 13:18:14,263 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region prod:log_my_phx,11_1022430660_502_V5,1517881075389.3504c1c9e68e7b0c9c1c99ea396ccb57.
2018-02-07 13:18:14,384 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server my-stage-hadoop-prod05-bp,60020,1516757389774: Assertion failed while closing store prod:log_my_phx,11_1022430660_502_V5,1517881075389.3504c1c9e68e7b0c9c1c99ea396ccb57. 0. flushableSize expected=0, actual= 1212904. Current memstoreSize=-106175240. Maybe a coprocessor operation failed and left the memstore in a partially updated state.
2018-02-07 13:18:14,384 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.SequenceRegionObserver, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint, org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]

By the way, only one of the region server abort, the abort of region server bring a lot of inconsistency due to region in transition and hard to recover with hbase hbck -repair, it took whole day to run the repair bunch of times










 


was (Author: guizhou):
I encounter similar case while build index async via IndexTool

HBase Version: 1.2.0-cdh5.10.1
Phoenix Version: phoenix-4.8.0-cdh5.8.0-server.jar

Behavior Description:

    1. Create Index: CREATE INDEX "prod:log_my_phx_3_idx"
    ON "prod:log_my_phx" ("id", "version", "event_time" )
        INCLUDE(
         "name",
         "code",
         "type",
         "decision",
         "monitoring") ASYNC;
    2. Run IndexTool mapreduce job

MapReduce job run succeed, index is activated, although alter index statement throw NullPointerException as below

ALTER INDEX IF EXISTS "prod:log_my_phx_3_idx" ON "prod:log_my_phx" ACTIVE

18/02/06 16:26:32 INFO index.IndexToolUtil: alterQuery: ALTER INDEX IF EXISTS "prod:log_my_phx_3_idx" ON "prod:log_my_phx" ACTIVE
18/02/06 16:26:32 ERROR index.IndexTool: An exception occurred while performing the indexing job: NullPointerException:  at:
java.lang.NullPointerException
    at org.apache.phoenix.schema.PMetaDataImpl.addTable(PMetaDataImpl.java:108)
    at org.apache.phoenix.jdbc.PhoenixConnection.addTable(PhoenixConnection.java:903)
    at org.apache.phoenix.schema.MetaDataClient.addTableToCache(MetaDataClient.java:3539)
    at org.apache.phoenix.schema.MetaDataClient.alterIndex(MetaDataClient.java:3504)
    at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableAlterIndexStatement$1.execute(PhoenixStatement.java:993)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:344)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:332)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:331)
    at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1442)
    at org.apache.phoenix.mapreduce.index.IndexToolUtil.updateIndexState(IndexToolUtil.java:75)
    at org.apache.phoenix.mapreduce.index.IndexTool.run(IndexTool.java:245)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.phoenix.mapreduce.index.IndexTool.main(IndexTool.java:384)


FATAL Errors in RegionServer:
ABORTING region server my-stage-hadoop-prod08-bp,60020,1504851662123: Assertion failed while closing store prod:log_my_phx,12_1022360799_801_V19,1517885909466.4a9e01d05c167dc6bdcab962763d7096. 0. flushableSize expected=0, actual= 207088. Current memstoreSize=-114100080. Maybe a coprocessor operation failed and left the memstore in a partially updated state.

RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.MetaDataEndpointImpl, org.apache.phoenix.coprocessor.SequenceRegionObserver, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.MetaDataRegionObserver, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint, org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 


By the way, only one of the region server abort, the abort of region server bring a lot of inconsistency due to region in transition and hard to recover with hbase hbck -repair, it took whole day to run the repair bunch of times










 

> Region close during automatic disabling of index for rebuilding can lead to RS abort
> ------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2883
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2883
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Major
>
> (disclaimer: still performing due-diligence on this one)
> I've been helping a user this week with what is thought to be a race condition in secondary index updates. This user has a relatively heavy write-based workload with a few tables that each have at least one index.
> What we have seen is that when the region distribution is changing (concretely, we were doing a rolling restart of the cluster without the load balancer disabled in the hopes of retaining as much availability as possible), I've seen the following general outline in the logs:
> * An index update fails (due to {{ERROR 2008 (INT10)}} the index metadata cache expired or is just missing)
> * The index is taken offline to be asynchronously rebuilt
> * A flush on the data table's region is queue for quite some time
> * RS is asked to close a region (due to a move, commonly)
> * RS aborts because the memstore for the data table's region is in an inconsistent state (e.g. {{Assertion failed while closing store <region> <colfam> flushableSize expected=0, actual= 193392. Current memstoreSize=-552208. Maybe a coprocessor operation failed and left the memstore in a partially updated state.}}
> Some relevant HBase issues include HBASE-10514 and HBASE-10844.
> Have been talking to [~ayingshu] and [~devaraj] about it, but haven't found anything definitively conclusive yet. Will dump findings here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)