Posted to issues@hbase.apache.org by "ShivaKumar SS (JIRA)" <ji...@apache.org> on 2018/07/11 05:59:00 UTC
[jira] [Comment Edited] (HBASE-20844) Duplicate rows returned while hbase snapshot reads
[ https://issues.apache.org/jira/browse/HBASE-20844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539597#comment-16539597 ]
ShivaKumar SS edited comment on HBASE-20844 at 7/11/18 5:58 AM:
----------------------------------------------------------------
This behaviour is not seen in HBase 1.4.5. It turns out that HBase 1.3.1 is missing the fix below, which makes the snapshot input format skip offline regions that have been split (split parents).
Class: {{org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl}}
Method: {{getRegionInfosFromManifest}}
{code:java}
// Imports this method relies on (HBase 1.x package layout)
import java.util.List;
import com.google.common.collect.Lists;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos.SnapshotRegionManifest;
import org.apache.hadoop.hbase.snapshot.SnapshotManifest;

public static List<HRegionInfo> getRegionInfosFromManifest(SnapshotManifest manifest) {
  List<SnapshotRegionManifest> regionManifests = manifest.getRegionManifests();
  if (regionManifests == null) {
    throw new IllegalArgumentException("Snapshot seems empty");
  }
  List<HRegionInfo> regionInfos = Lists.newArrayListWithCapacity(regionManifests.size());
  for (SnapshotRegionManifest regionManifest : regionManifests) {
    HRegionInfo hri = HRegionInfo.convert(regionManifest.getRegionInfo());
    // The fix: skip offline split parent regions; their daughters cover the same key range.
    if (hri.isOffline() && (hri.isSplit() || hri.isSplitParent())) {
      continue;
    }
    regionInfos.add(hri);
  }
  return regionInfos;
}
{code}
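For anyone without an HBase build at hand, the effect of this check can be sketched in plain Java. This is a minimal sketch, not HBase code: {{SplitParentFilter}}, {{RegionStub}} and {{regionsToRead}} are hypothetical names, and {{RegionStub}} only models the three flags the real check reads ({{isOffline()}}, {{isSplit()}}, {{isSplitParent()}}).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitParentFilter {
    // Hypothetical stand-in for HRegionInfo: only the state the filter inspects.
    static final class RegionStub {
        final String name;
        final boolean offline;
        final boolean split;
        final boolean splitParent;

        RegionStub(String name, boolean offline, boolean split, boolean splitParent) {
            this.name = name;
            this.offline = offline;
            this.split = split;
            this.splitParent = splitParent;
        }
    }

    // Same condition as getRegionInfosFromManifest: drop regions that are offline
    // AND flagged as split / split parent; keep every other region in order.
    static List<RegionStub> regionsToRead(List<RegionStub> manifestRegions) {
        if (manifestRegions == null) {
            throw new IllegalArgumentException("Snapshot seems empty");
        }
        List<RegionStub> result = new ArrayList<RegionStub>(manifestRegions.size());
        for (RegionStub r : manifestRegions) {
            if (r.offline && (r.split || r.splitParent)) {
                continue; // split parent: its key range is covered by the daughters
            }
            result.add(r);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical snapshot taken mid-split: offline parent plus two online daughters.
        List<RegionStub> manifest = Arrays.asList(
                new RegionStub("parent", true, true, true),
                new RegionStub("daughterA", false, false, false),
                new RegionStub("daughterB", false, false, false));
        for (RegionStub r : regionsToRead(manifest)) {
            System.out.println(r.name); // prints daughterA, then daughterB
        }
    }
}
```

Note that a region is only dropped when it is offline; an online region is kept even if its split flags are set.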
> Duplicate rows returned while hbase snapshot reads
> --------------------------------------------------
>
> Key: HBASE-20844
> URL: https://issues.apache.org/jira/browse/HBASE-20844
> Project: HBase
> Issue Type: Bug
> Components: mapreduce, snapshots, spark
> Affects Versions: 1.3.1
> Environment: Cluster Details
> Java 1.7
> HBase 1.3.1
> Spark 1.6.1
> Reporter: ShivaKumar SS
> Priority: Major
>
> We take a snapshot from code and read the data with both MapReduce and Spark; both approaches return duplicate records.
> On the API side, {{org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat}} is used.
> The snapshot was taken while a region of the table was being split.
> We suspect this is because data is returned for both the parent region and its daughter regions.
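The suspected duplication can be reproduced in a toy model. This is a sketch under the assumption that, right after a split, each daughter serves the part of the parent's rows that falls in its key range (in HBase this happens through reference files pointing at the parent's store files). {{SplitDuplicateDemo}}, {{PARENT_ROWS}}, {{SPLIT_KEY}}, {{scan}} and {{countRows}} are hypothetical names, and the rows are made-up sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical names) of reading a snapshot taken mid-split.
final class SplitDuplicateDemo {
    // Hypothetical rows stored in the parent region's files.
    static final List<String> PARENT_ROWS = Arrays.asList("apple", "kiwi", "mango", "pear");
    // Hypothetical split point: daughter A gets [start, "m"), daughter B gets ["m", end).
    static final String SPLIT_KEY = "m";

    // Rows returned by scanning a region over [startKey, endKey); "" means unbounded.
    // Daughters are modelled as reading the parent's rows that fall in their range.
    static List<String> scan(String startKey, String endKey) {
        List<String> out = new ArrayList<String>();
        for (String row : PARENT_ROWS) {
            boolean afterStart = startKey.isEmpty() || row.compareTo(startKey) >= 0;
            boolean beforeEnd = endKey.isEmpty() || row.compareTo(endKey) < 0;
            if (afterStart && beforeEnd) {
                out.add(row);
            }
        }
        return out;
    }

    // How many times each row key is returned across all regions that are read.
    static Map<String, Integer> countRows(boolean includeParent) {
        List<String> rows = new ArrayList<String>();
        if (includeParent) {
            rows.addAll(scan("", ""));        // parent: whole key range
        }
        rows.addAll(scan("", SPLIT_KEY));     // daughter A
        rows.addAll(scan(SPLIT_KEY, ""));     // daughter B
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String row : rows) {
            Integer seen = counts.get(row);
            counts.put(row, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("parent + daughters: " + countRows(true));
        // parent + daughters: {apple=2, kiwi=2, mango=2, pear=2}
        System.out.println("daughters only: " + countRows(false));
        // daughters only: {apple=1, kiwi=1, mango=1, pear=1}
    }
}
```

Leaving the offline parent out of the regions to read brings every count back to 1.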
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)