Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2018/06/21 19:51:00 UTC
[jira] [Commented] (HBASE-20769) getSplits() has a out of bounds problem in TableSnapshotInputFormatImpl

    [ https://issues.apache.org/jira/browse/HBASE-20769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519720#comment-16519720 ] 

Ted Yu commented on HBASE-20769:
--------------------------------

When I ran the new test without fix:
{code}
testWithMockedMapReduceWithSplitsPerRegion(org.apache.hadoop.hbase.mapreduce.TestTableSnapshotInputFormat)  Time elapsed: 21.222 sec  <<< FAILURE!
java.lang.AssertionError: yya >= yyy?
	at org.apache.hadoop.hbase.mapreduce.TestTableSnapshotInputFormat.verifyWithMockedMapReduce(TestTableSnapshotInputFormat.java:357)
	at org.apache.hadoop.hbase.mapreduce.TestTableSnapshotInputFormat.testWithMockedMapReduceWithSplitsPerRegion(TestTableSnapshotInputFormat.java:269)
{code}
{code}
      Assert.assertTrue(Bytes.toStringBinary(startRow) + " <= "+ Bytes.toStringBinary(scan.getStartRow()) + "?", Bytes.compareTo(startRow, scan.getStartRow()) <= 0);
      Assert.assertTrue(Bytes.toStringBinary(stopRow) + " >= "+ Bytes.toStringBinary(scan.getStopRow()) + "?", Bytes.compareTo(stopRow, scan.getStopRow()) >= 0);
{code}
First, a '?' does not belong in an assertion message: if the test fails, the output should be a definitive statement.
Second, please wrap the long lines.
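As an illustration of both points, here is a minimal, self-contained sketch of a bounds check with declarative failure messages and wrapped lines. The class `SplitBoundsCheck` and its methods are hypothetical names, not part of the patch; `compare` and `str` are plain-Java stand-ins for `Bytes.compareTo` and `Bytes.toStringBinary`, and the row keys are assumed to be printable UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class SplitBoundsCheck {
  // Unsigned lexicographic comparison; a stand-in for HBase's Bytes.compareTo.
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  // Stand-in for Bytes.toStringBinary; assumes printable UTF-8 row keys.
  static String str(byte[] row) {
    return new String(row, StandardCharsets.UTF_8);
  }

  // Hypothetical helper: throws with a declarative message when the split
  // range leaves the [scanStart, scanStop) range requested by the user.
  static void checkWithinScan(byte[] scanStart, byte[] scanStop,
      byte[] splitStart, byte[] splitStop) {
    if (compare(scanStart, splitStart) > 0) {
      throw new AssertionError("split start row " + str(splitStart)
          + " is smaller than scan start row " + str(scanStart));
    }
    if (compare(scanStop, splitStop) < 0) {
      throw new AssertionError("split stop row " + str(splitStop)
          + " is larger than scan stop row " + str(scanStop));
    }
  }

  public static void main(String[] args) {
    byte[] scanStart = "bbb".getBytes(StandardCharsets.UTF_8);
    byte[] scanStop = "yya".getBytes(StandardCharsets.UTF_8);
    byte[] splitStop = "yyy".getBytes(StandardCharsets.UTF_8);
    try {
      // The split stops at yyy, past the user's stop row yya.
      checkWithinScan(scanStart, scanStop, scanStart, splitStop);
    } catch (AssertionError e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In the actual test the same wording could be passed as the message argument of `Assert.assertTrue`, so a failure states what went wrong instead of asking a question.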
{code}
+    public TableSnapshotInputFormatImpl.InputSplit getDelegate() {
{code}
The above method can be package-private.

> getSplits() has a out of bounds problem in TableSnapshotInputFormatImpl
> -----------------------------------------------------------------------
>
>                 Key: HBASE-20769
>                 URL: https://issues.apache.org/jira/browse/HBASE-20769
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0, 2.0.0
>            Reporter: Jingyun Tian
>            Assignee: Jingyun Tian
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: HBASE-20769.master.001.patch
>
>
> When numSplits > 1, getSplits() may create a split whose start row is smaller than the user-specified scan's start row, or whose stop row is larger than the user-specified scan's stop row.
> {code}
>         byte[][] sp = sa.split(hri.getStartKey(), hri.getEndKey(), numSplits, true);
>         for (int i = 0; i < sp.length - 1; i++) {
>           if (PrivateCellUtil.overlappingKeys(scan.getStartRow(), scan.getStopRow(), sp[i],
>                   sp[i + 1])) {
>             List<String> hosts =
>                 calculateLocationsForInputSplit(conf, htd, hri, tableDir, localityEnabled);
>             Scan boundedScan = new Scan(scan);
>             boundedScan.setStartRow(sp[i]);
>             boundedScan.setStopRow(sp[i + 1]);
>             splits.add(new InputSplit(htd, hri, hosts, boundedScan, restoreDir));
>           }
>         }
> {code}
> Since we split keys by the range of each region, when sp[i] < scan.getStartRow() or sp[i + 1] > scan.getStopRow(), the bounded scan that is created may cover a range outside the user-defined scan.
> The fix should be simple:
> {code}
> boundedScan.setStartRow(
>     Bytes.compareTo(scan.getStartRow(), sp[i]) > 0 ? scan.getStartRow() : sp[i]);
> boundedScan.setStopRow(
>     Bytes.compareTo(scan.getStopRow(), sp[i + 1]) < 0 ? scan.getStopRow() : sp[i + 1]);
> {code}
> I will also try to add unit tests to help catch this problem.
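
The clamping described in the issue can be tried outside HBase with the following minimal sketch. It is not the attached patch: `SplitClampSketch`, `clampStart`, and `clampStop` are hypothetical names, and `compare` is a plain-Java stand-in for `Bytes.compareTo`. It also assumes both rows are non-empty; HBase treats an empty start or stop row as unbounded, and this sketch does not handle that case.

```java
import java.nio.charset.StandardCharsets;

final class SplitClampSketch {
  // Unsigned lexicographic comparison; a stand-in for HBase's Bytes.compareTo.
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  // Start of the bounded scan: the larger of the scan start and split start.
  static byte[] clampStart(byte[] scanStart, byte[] splitStart) {
    return compare(scanStart, splitStart) > 0 ? scanStart : splitStart;
  }

  // Stop of the bounded scan: the smaller of the scan stop and split stop.
  static byte[] clampStop(byte[] scanStop, byte[] splitStop) {
    return compare(scanStop, splitStop) < 0 ? scanStop : splitStop;
  }

  public static void main(String[] args) {
    byte[] scanStart = "bbb".getBytes(StandardCharsets.UTF_8);
    byte[] scanStop = "yya".getBytes(StandardCharsets.UTF_8);
    // Illustrative split boundaries derived from a region's key range.
    byte[] splitStart = "aaa".getBytes(StandardCharsets.UTF_8);
    byte[] splitStop = "yyy".getBytes(StandardCharsets.UTF_8);

    byte[] start = clampStart(scanStart, splitStart);
    byte[] stop = clampStop(scanStop, splitStop);
    System.out.println(new String(start, StandardCharsets.UTF_8)
        + " .. " + new String(stop, StandardCharsets.UTF_8));
  }
}
```

With these inputs the bounded range is the intersection of the split and the user's scan, so it never extends past either of the user's rows.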



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)