Posted to issues@hbase.apache.org by "ShivaKumar SS (JIRA)" <ji...@apache.org> on 2018/07/11 05:59:00 UTC

[jira] [Commented] (HBASE-20844) Duplicate rows returned while hbase snapshot reads

    [ https://issues.apache.org/jira/browse/HBASE-20844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539597#comment-16539597 ] 

ShivaKumar SS commented on HBASE-20844:
---------------------------------------

This behaviour is not seen in HBase 1.4.5. It turns out that HBase 1.3.1 is missing the fix below, which skips offline regions that have been split (split parents) when listing the regions of a snapshot.


{{Class : org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl}}

 

Method :

{{  public static List<HRegionInfo> getRegionInfosFromManifest(SnapshotManifest manifest) {}}
{{      List<SnapshotRegionManifest> regionManifests = manifest.getRegionManifests();}}
{{      if (regionManifests == null) {}}
{{         throw new IllegalArgumentException("Snapshot seems empty");}}
{{      }}}

{{      List<HRegionInfo> regionInfos = Lists.newArrayListWithCapacity(regionManifests.size());}}

{{      for (SnapshotRegionManifest regionManifest : regionManifests) {}}
{{         HRegionInfo hri = HRegionInfo.convert(regionManifest.getRegionInfo());}}
{{         if (hri.isOffline() && (hri.isSplit() || hri.isSplitParent())) { // This one.}}
{{           continue;}}
{{         }}}
{{         regionInfos.add(hri);}}
{{      }}}
{{      return regionInfos;}}
{{  }}}
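
To illustrate why this check matters, here is a minimal, self-contained sketch of the same filtering idea. It does not use the HBase API: the {{Region}} class, its fields, and the region names are hypothetical stand-ins for HRegionInfo, and the sketch collapses isSplit() and isSplitParent() into a single flag as a simplification. It only shows that, once an offline split parent is skipped, the rows it shares with its daughter regions are no longer read twice.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitParentFilterSketch {

    // Hypothetical stand-in for HRegionInfo; it carries only the flags the filter reads.
    static final class Region {
        final String name;
        final boolean offline;
        final boolean split;

        Region(String name, boolean offline, boolean split) {
            this.name = name;
            this.offline = offline;
            this.split = split;
        }
    }

    // Same idea as the check in the quoted method: skip regions that are
    // both offline and split, keep everything else.
    static List<String> readableRegions(List<Region> manifest) {
        List<String> result = new ArrayList<>();
        for (Region r : manifest) {
            if (r.offline && r.split) {
                continue; // split parent: its data is covered by the daughters
            }
            result.add(r.name);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed snapshot contents: one split parent plus its two daughters.
        List<Region> manifest = new ArrayList<>();
        manifest.add(new Region("parent", true, true));
        manifest.add(new Region("daughterA", false, false));
        manifest.add(new Region("daughterB", false, false));

        System.out.println(readableRegions(manifest)); // prints [daughterA, daughterB]
    }
}
```

Without the {{continue}}, all three regions would be handed to the input format, and every row of the parent would be returned again by one of its daughters, which matches the duplicate records described in the issue.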

 

> Duplicate rows returned while hbase snapshot reads
> --------------------------------------------------
>
>                 Key: HBASE-20844
>                 URL: https://issues.apache.org/jira/browse/HBASE-20844
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce, snapshots, spark
>    Affects Versions: 1.3.1
>         Environment: Cluster Details 
> Java 1.7
> HBase 1.3.1
> Spark 1.6.1
>            Reporter: ShivaKumar SS
>            Priority: Major
>
> We are taking a snapshot from code and reading the data using MR and Spark; both approaches return duplicate records.
> On the API side, {{org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat}} is used.
> The snapshot was taken while the table was in a region split state.
> We suspect this is because data is being returned for both parent and daughter regions.


