Posted to issues@phoenix.apache.org by GitBox <gi...@apache.org> on 2020/06/09 19:11:46 UTC
[GitHub] [phoenix] gjacoby126 commented on a change in pull request #789: PHOENIX-5783: Implement starttime in IndexTool for rebuild and verifi…

gjacoby126 commented on a change in pull request #789:
URL: https://github.com/apache/phoenix/pull/789#discussion_r437648684



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
##########
@@ -118,6 +118,9 @@
     private IndexVerificationOutputRepository verificationOutputRepository;
     private boolean skipped = false;
     private boolean shouldVerifyCheckDone = false;
+    private byte[] nextStartKey;
+    private boolean hasMoreIncr = false;
+    private long minTimestamp = 0 ;

Review comment:
       It looks like minTimestamp is set only once, from the Scan (or to 0 if there's no verify type), so can this be final?
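A minimal sketch of the suggestion. It assumes the scanner's constructor already knows whether a verify type is set and what the Scan's lower time bound is. `IncrementalScanState`, `hasVerifyType` and `scanMinTimestamp` are hypothetical stand-ins, not Phoenix code.

```java
final class IncrementalScanState {
    // Assigned exactly once, in the constructor, so the compiler enforces that it never changes.
    private final long minTimestamp;

    IncrementalScanState(boolean hasVerifyType, long scanMinTimestamp) {
        // Take the lower bound from the Scan, or 0 when no verify type is set.
        this.minTimestamp = hasVerifyType ? scanMinTimestamp : 0L;
    }

    long getMinTimestamp() {
        return minTimestamp;
    }

    // 0 means "no incremental lower bound", mirroring the minTimestamp != 0 check in getLocalScanner().
    boolean isIncremental() {
        return minTimestamp != 0;
    }
}
```

Fields that really do change between next() calls (nextStartKey, hasMoreIncr) would stay non-final.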

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
##########
@@ -1398,7 +1410,64 @@ public boolean next(List<Cell> results) throws IOException {
                     SINGLE_COLUMN, AGG_TIMESTAMP, rowCountBytes, 0, rowCountBytes.length);
         }
         results.add(aggKeyValue);
-        return hasMore;
+        return hasMore || hasMoreIncr;
+    }
+
+    private RegionScanner getLocalScanner() throws IOException {
+        // override the filter to skip scan and open new scanner
+        // when lower bound of timerange is passed or newStartKey was populated
+        // from previous call to next()
+        if(minTimestamp!= 0) {
+            Scan incrScan = new Scan(scan);
+            incrScan.setTimeRange(minTimestamp, scan.getTimeRange().getMax());
+            incrScan.setRaw(true);
+            incrScan.setMaxVersions();
+            incrScan.getFamilyMap().clear();
+            incrScan.setCacheBlocks(false);
+            for (byte[] family : scan.getFamilyMap().keySet()) {

Review comment:
       Curious: why are we copying the original Scan (which carries its families over), then clearing the families and adding them back?
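One possible reading, assuming HBase's `Scan.addFamily` semantics (it maps the family to a null qualifier set, meaning "all columns"): clearing and re-adding widens any column-level selection copied from the original Scan to whole families. The sketch below models only that effect with a plain map. `FamilyMapModel` is hypothetical and uses Strings in place of byte[].

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class FamilyMapModel {
    // Stand-in for a Scan's family map: family -> selected qualifiers (null means "whole family").
    public final Map<String, Set<String>> familyMap = new TreeMap<>();

    public void addColumn(String family, String qualifier) {
        familyMap.computeIfAbsent(family, f -> new TreeSet<>()).add(qualifier);
    }

    public void addFamily(String family) {
        // Modeled on Scan.addFamily: drop any qualifier set and select the whole family.
        familyMap.remove(family);
        familyMap.put(family, null);
    }

    /** Mirrors: copy the Scan, clear the copied families, re-add each original family. */
    public static FamilyMapModel widenToFamilies(FamilyMapModel original) {
        FamilyMapModel copy = new FamilyMapModel();
        copy.familyMap.putAll(original.familyMap); // the copy starts with the original's qualifiers
        copy.familyMap.clear();
        for (String family : original.familyMap.keySet()) {
            copy.addFamily(family);
        }
        return copy;
    }
}
```

If the original Scan only ever selects whole families, the clear-and-re-add would be a no-op, which is the case worth confirming.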

##########
File path: phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexToolForNonTxGlobalIndexIT.java
##########
@@ -587,6 +589,113 @@ public void testIndexToolForIncrementalRebuild() throws Exception {
         }
     }
 
+    @Test
+    public void testIndexToolForIncrementalVerify() throws Exception {

Review comment:
       Let's also add a test where we incrementally rebuild a view index. Since the Scan copy constructor also copies over filters, I _expect_ they're OK, but it's better to test than to assume. :-)
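A plain-Java model of the property such a test would pin down. It assumes, as the comment states, that the copy carries the filter over: the copied scan should keep the view's filter even after its time range is overridden. `ScanModel` and its fields are hypothetical, not HBase's `Scan`. The time range is treated as half-open [min, max), as with HBase's `TimeRange`.

```java
import java.util.function.Predicate;

final class ScanModel {
    Predicate<String> filter;          // stand-in for a view's WHERE-clause filter
    long minTs = 0;
    long maxTs = Long.MAX_VALUE;

    ScanModel() {
    }

    // Copy constructor: like new Scan(scan), the copy starts out with the original's filter.
    ScanModel(ScanModel other) {
        this.filter = other.filter;
        this.minTs = other.minTs;
        this.maxTs = other.maxTs;
    }

    ScanModel setTimeRange(long min, long max) {
        this.minTs = min;
        this.maxTs = max;
        return this;
    }

    /** True if a cell with this row value and timestamp would be returned. */
    boolean matches(String rowValue, long ts) {
        boolean inRange = ts >= minTs && ts < maxTs;
        return inRange && (filter == null || filter.test(rowValue));
    }
}
```

A real integration test would make the same three checks against an actual view index: rows in the view and inside the time range are rebuilt, rows outside the view are not, and rows older than the lower bound are not.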

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
##########
@@ -1398,7 +1410,64 @@ public boolean next(List<Cell> results) throws IOException {
                     SINGLE_COLUMN, AGG_TIMESTAMP, rowCountBytes, 0, rowCountBytes.length);
         }
         results.add(aggKeyValue);
-        return hasMore;
+        return hasMore || hasMoreIncr;
+    }
+
+    private RegionScanner getLocalScanner() throws IOException {
+        // override the filter to skip scan and open new scanner
+        // when lower bound of timerange is passed or newStartKey was populated
+        // from previous call to next()
+        if(minTimestamp!= 0) {
+            Scan incrScan = new Scan(scan);
+            incrScan.setTimeRange(minTimestamp, scan.getTimeRange().getMax());
+            incrScan.setRaw(true);
+            incrScan.setMaxVersions();
+            incrScan.getFamilyMap().clear();
+            incrScan.setCacheBlocks(false);
+            for (byte[] family : scan.getFamilyMap().keySet()) {
+                incrScan.addFamily(family);
+            }
+            if(nextStartKey != null) {
+                incrScan.setStartRow(nextStartKey);

Review comment:
       Even if a table is sorted DESC, it's OK to go from the start of the table to the end, as long as we're consistent about the order and get every row we need, right? (I think so, but I want to ask because we've had a lot of bugs over the years from hidden assumptions about sorting.)
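A small model of why the logical sort order should not matter for paging. It assumes row keys are compared as unsigned bytes, as HBase does, and approximates DESC encoding by inverting every byte (Phoenix's separator handling for variable-length columns is ignored). Each page resumes just past the last key it returned, always in stored byte order, so every key is visited exactly once. `DescOrderPaging` is hypothetical. `closestRowAfter` appends 0x00, which gives the smallest strictly greater key. It is a stand-in and makes no claim about what `ByteUtil.calculateTheClosestNextRowKeyForPrefix` returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;

final class DescOrderPaging {
    // HBase orders row keys by unsigned lexicographic byte comparison (Java 9+ API).
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    /** Rough DESC encoding: invert every byte so larger values sort first. */
    static byte[] invert(byte[] b) {
        byte[] out = new byte[b.length];
        for (int i = 0; i < b.length; i++) {
            out[i] = (byte) ~b[i];
        }
        return out;
    }

    /** Smallest key strictly greater than k: k followed by a single 0x00 byte. */
    static byte[] closestRowAfter(byte[] k) {
        return Arrays.copyOf(k, k.length + 1); // copyOf pads the new slot with 0
    }

    /** Pages through the keys in stored order and returns every key visited. */
    static List<byte[]> pageThrough(NavigableSet<byte[]> region, int pageSize) {
        List<byte[]> visited = new ArrayList<>();
        byte[] start = new byte[0];
        while (true) {
            List<byte[]> page = new ArrayList<>();
            for (byte[] key : region.tailSet(start, true)) {
                if (page.size() == pageSize) {
                    break;
                }
                page.add(key);
            }
            if (page.isEmpty()) {
                return visited;
            }
            visited.addAll(page);
            start = closestRowAfter(page.get(page.size() - 1));
        }
    }
}
```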

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
##########
@@ -1398,7 +1410,64 @@ public boolean next(List<Cell> results) throws IOException {
                     SINGLE_COLUMN, AGG_TIMESTAMP, rowCountBytes, 0, rowCountBytes.length);
         }
         results.add(aggKeyValue);
-        return hasMore;
+        return hasMore || hasMoreIncr;
+    }
+
+    private RegionScanner getLocalScanner() throws IOException {
+        // override the filter to skip scan and open new scanner
+        // when lower bound of timerange is passed or newStartKey was populated
+        // from previous call to next()
+        if(minTimestamp!= 0) {
+            Scan incrScan = new Scan(scan);
+            incrScan.setTimeRange(minTimestamp, scan.getTimeRange().getMax());
+            incrScan.setRaw(true);
+            incrScan.setMaxVersions();
+            incrScan.getFamilyMap().clear();
+            incrScan.setCacheBlocks(false);
+            for (byte[] family : scan.getFamilyMap().keySet()) {
+                incrScan.addFamily(family);
+            }
+            if(nextStartKey != null) {
+                incrScan.setStartRow(nextStartKey);
+            }
+            List<KeyRange> keys = new ArrayList<>();
+            try(RegionScanner scanner = region.getScanner(incrScan)) {
+                List<Cell> row = new ArrayList<>();
+                int rowCount = 0;
+                // collect row keys that have been modified in the given time-range
+                // up to the size of page to build skip scan filter
+                do {
+                    hasMoreIncr = scanner.nextRaw(row);
+                    if (!row.isEmpty()) {
+                        keys.add(PVarbinary.INSTANCE.getKeyRange(CellUtil.cloneRow(row.get(0))));
+                        rowCount++;
+                    }
+                    row.clear();
+                } while (hasMoreIncr && rowCount < pageSizeInRows);
+            }
+            if (!hasMoreIncr && keys.isEmpty()) {
+                return null;
+            }
+            if (!keys.isEmpty()) {
+                nextStartKey = ByteUtil.calculateTheClosestNextRowKeyForPrefix(keys.get(keys.size() - 1).getLowerRange());
+            }
+            try {
+                ScanRanges scanRanges = ScanRanges.createPointLookup(keys);
+                scanRanges.initializeScan(incrScan);
+                SkipScanFilter skipScanFilter = scanRanges.getSkipScanFilter();
+                incrScan.setFilter(new SkipScanFilter(skipScanFilter, true));

Review comment:
       What if there's already a Filter on that scan (for example, when we're rebuilding a view index)? I _think_ that's OK, because we generated the key list for the SkipScanFilter using the pre-existing view filter on incrScan, which we got from the original rebuild Scan. But let's add a test and make sure.
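A toy model of the reasoning in this comment. It assumes the key-collection scan applied the copied view filter. Because `setFilter` replaces whatever filter was on the scan, the second pass sees only the point-lookup keys. That still yields the view's rows only because every collected key already passed the view filter. `lookupAndingFilters` models keeping both filters (in HBase terms, something like a `FilterList` with `MUST_PASS_ALL`) for comparison. All names are hypothetical, and rows are a String key mapped to a single column value.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.function.Predicate;

final class SkipScanOverViewFilter {
    /** Pass 1: walk rows in key order, keep rows the view filter accepts, stop after pageSize keys. */
    static List<String> collectKeys(SortedMap<String, String> rows,
                                    Predicate<String> viewFilter, int pageSize) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            if (keys.size() == pageSize) {
                break;
            }
            if (viewFilter.test(row.getValue())) {
                keys.add(row.getKey());
            }
        }
        return keys;
    }

    /** Pass 2 with the filter replaced: only the point-lookup keys decide what is returned. */
    static List<String> lookupReplacingFilter(SortedMap<String, String> rows, Collection<String> keys) {
        List<String> out = new ArrayList<>();
        for (String key : rows.keySet()) {
            if (keys.contains(key)) {
                out.add(key);
            }
        }
        return out;
    }

    /** Pass 2 with both filters: a row must be in the key list and pass the view filter. */
    static List<String> lookupAndingFilters(SortedMap<String, String> rows, Collection<String> keys,
                                            Predicate<String> viewFilter) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            if (keys.contains(row.getKey()) && viewFilter.test(row.getValue())) {
                out.add(row.getKey());
            }
        }
        return out;
    }
}
```

The two lookups agree only when pass 1 actually applied the view filter. That precondition is what a view-index test would guard.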




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org