Posted to issues@phoenix.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/11/20 06:09:00 UTC

[jira] [Commented] (PHOENIX-6153) Table Map Reduce job after a Snapshot based job fails with CorruptedSnapshotException

    [ https://issues.apache.org/jira/browse/PHOENIX-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235916#comment-17235916 ] 

ASF GitHub Bot commented on PHOENIX-6153:
-----------------------------------------

sakshamgangwar closed pull request #902:
URL: https://github.com/apache/phoenix/pull/902


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Table Map Reduce job after a Snapshot based job fails with CorruptedSnapshotException
> -------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-6153
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6153
>             Project: Phoenix
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 4.15.0, 4.14.3, master
>            Reporter: Saksham Gangwar
>            Assignee: Saksham Gangwar
>            Priority: Major
>             Fix For: 5.1.0, 4.16.0
>
>         Attachments: PHOENIX-6153.4.x.v1.patch, PHOENIX-6153.master.v1.patch, PHOENIX-6153.master.v2.patch, PHOENIX-6153.master.v3.patch, PHOENIX-6153.master.v4.patch, PHOENIX-6153.master.v5.patch, Screen Shot 2020-09-30 at 4.00.58 AM.png, Screen Shot 2020-09-30 at 4.01.10 AM.png, Screen Shot 2020-09-30 at 4.01.10 AM.png, Screen Shot 2020-09-30 at 4.01.19 AM.png, Screen Shot 2020-09-30 at 4.01.19 AM.png, Screen Shot 2020-09-30 at 4.01.19 AM.png, Screen Shot 2020-09-30 at 4.01.34 AM.png, Screen Shot 2020-09-30 at 4.01.52 AM.png, Screen Shot 2020-09-30 at 4.01.52 AM.png, Screen Shot 2020-09-30 at 9.30.06 AM.png
>
>
> Different MR job requests that reach [MapReduceParallelScanGrouper getRegionBoundaries|https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java#L65] currently rely on a configuration shared among jobs to figure out snapshot names. 
> Example job sequence: the first two jobs run over snapshots and the third job runs over a regular table.
> Printing the hash codes of objects on entering: [https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java#L65]
> *Job 1:* (over snapshot of *ABC_TABLE_1* and is successful)
> context.getConnection(): 521093916
>  ConnectionQueryServices: 1772519705
>  *Configuration conf: 813285994*
>      conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_1*
>  
> *Job 2:* (over snapshot of *ABC_TABLE_2* and is successful)
> context.getConnection(): 1928017473
>  ConnectionQueryServices: 961279422
>  *Configuration conf: 813285994*
>      conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2*
>  
> *Job 3:* (over the table *ABC_TABLE_3*, but fails with CorruptedSnapshotException although it has nothing to do with snapshots)
> context.getConnection(): 28889670
>  ConnectionQueryServices: 424389847
>  *Configuration conf: 813285994*
>      conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2*
>  
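
The job sequence above can be reproduced in miniature. The following is a minimal sketch, not Phoenix code: a plain `HashMap` stands in for the shared Hadoop `Configuration`, and the key strings and the helpers `configureSnapshotJob`, `configureTableJob` and `readsFromSnapshot` are hypothetical names chosen for illustration. It shows how a snapshot name set by an earlier job stays visible to a later table job when all jobs reuse one configuration object.

```java
import java.util.HashMap;
import java.util.Map;

class SharedConfLeak {
    // Hypothetical key string; stands in for PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY.
    static final String SNAPSHOT_NAME_KEY = "snapshot.name.key";

    // Snapshot job setup: records the snapshot name in the shared configuration.
    static void configureSnapshotJob(Map<String, String> conf, String snapshotName) {
        conf.put(SNAPSHOT_NAME_KEY, snapshotName);
    }

    // Table job setup as in the failing scenario: it never touches the snapshot key.
    static void configureTableJob(Map<String, String> conf, String tableName) {
        conf.put("input.table.name", tableName); // hypothetical key
    }

    // Assumed decision point: a non-null snapshot name selects the snapshot code path.
    static boolean readsFromSnapshot(Map<String, String> conf) {
        return conf.get(SNAPSHOT_NAME_KEY) != null;
    }

    public static void main(String[] args) {
        Map<String, String> sharedConf = new HashMap<>(); // one object reused by all jobs

        configureSnapshotJob(sharedConf, "ABC_TABLE_1"); // Job 1
        configureSnapshotJob(sharedConf, "ABC_TABLE_2"); // Job 2
        configureTableJob(sharedConf, "ABC_TABLE_3");    // Job 3, a regular table job

        // Job 3 still sees the snapshot name left behind by Job 2.
        System.out.println(readsFromSnapshot(sharedConf));     // prints "true"
        System.out.println(sharedConf.get(SNAPSHOT_NAME_KEY)); // prints "ABC_TABLE_2"
    }
}
```
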
> The exception we get:
>  [2020:08:18 20:56:17.409] [MigrationRetryPoller-Executor-1] [ERROR] [c.s.hgrate.mapreduce.MapReduceImpl] - Error submitting M/R job for Job 3
>  java.lang.RuntimeException: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:hdfs://.../hbase/.hbase-snapshot/ABC_TABLE_2_1597687413477/.snapshotinfo
>  at org.apache.phoenix.iterate.MapReduceParallelScanGrouper.getRegionBoundaries(MapReduceParallelScanGrouper.java:81) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:541) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:893) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:641) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:511) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.phoenix.iterate.ParallelIterators.<init>(ParallelIterators.java:62) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:367) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:218) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:213) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.phoenix.mapreduce.PhoenixInputFormat.setupParallelScansWithScanGrouper(PhoenixInputFormat.java:252) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.phoenix.mapreduce.PhoenixInputFormat.setupParallelScansFromQueryPlan(PhoenixInputFormat.java:235) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.phoenix.mapreduce.PhoenixInputFormat.generateSplits(PhoenixInputFormat.java:94) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:89) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
>  at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301) ~[hadoop-mapreduce-client-core-2.7.7-sfdc-1.0.18.jar:2.7.7-sfdc-1.0.18]
>  at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318) ~[hadoop-mapreduce-client-core-2.7.7-sfdc-1.0.18.jar:2.7.7-sfdc-1.0.18]
>  at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196) ~[hadoop-mapreduce-client-core-2.7.7-sfdc-1.0.18.jar:2.7.7-sfdc-1.0.18]
>  at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290) ~[hadoop-mapreduce-client-core-2.7.7-sfdc-1.0.18.jar:2.7.7-sfdc-1.0.18]
>  at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287) ~[hadoop-mapreduce-client-core-2.7.7-sfdc-1.0.18.jar:2.7.7-sfdc-1.0.18]
>  at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_172]
>  at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_172]
>   
>  
>  Change Required:
> 1. When setting the snapshot name in a shared configuration, we also need a mechanism to remove it when a job is not snapshot related:
> [https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/PhoenixInputFormat.java#L210]
>  
>  
>  
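
One way to read the change described above is sketched below. It is an illustrative stand-in rather than the actual patch: a `HashMap` again plays the role of the shared `Configuration`, and `setInput` and the key strings are hypothetical. The idea is that every job setup path either sets the snapshot name or explicitly removes it, so a stale value from an earlier job cannot survive. With a real Hadoop `Configuration`, `unset(String)` would take the place of `Map.remove`.

```java
import java.util.HashMap;
import java.util.Map;

class SnapshotKeyReset {
    // Hypothetical key string; stands in for PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY.
    static final String SNAPSHOT_NAME_KEY = "snapshot.name.key";

    // Hypothetical setup helper. A null snapshotName means a regular table job.
    static void setInput(Map<String, String> conf, String tableName, String snapshotName) {
        conf.put("input.table.name", tableName); // hypothetical key
        if (snapshotName != null) {
            conf.put(SNAPSHOT_NAME_KEY, snapshotName);
        } else {
            // Clear any value left behind by an earlier snapshot job.
            conf.remove(SNAPSHOT_NAME_KEY);
        }
    }

    public static void main(String[] args) {
        Map<String, String> sharedConf = new HashMap<>(); // one object reused by all jobs

        setInput(sharedConf, "ABC_TABLE_1", "ABC_TABLE_1"); // Job 1, snapshot based
        setInput(sharedConf, "ABC_TABLE_2", "ABC_TABLE_2"); // Job 2, snapshot based
        setInput(sharedConf, "ABC_TABLE_3", null);          // Job 3, regular table

        // The stale snapshot name is gone, so Job 3 takes the table path.
        System.out.println(sharedConf.containsKey(SNAPSHOT_NAME_KEY)); // prints "false"
    }
}
```

Clearing the key at setup time keeps the fix local to the place where the value is written. Giving each job its own copy of the configuration would be an alternative, at the cost of changing how callers share state.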



--
This message was sent by Atlassian Jira
(v8.3.4#803005)