Posted to dev@phoenix.apache.org by "Rajeshbabu Chintaguntla (JIRA)" <ji...@apache.org> on 2016/05/22 23:49:12 UTC

[jira] [Commented] (PHOENIX-2903) Handle split during scan for row key ordered aggregations

    [ https://issues.apache.org/jira/browse/PHOENIX-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15295777#comment-15295777 ] 

Rajeshbabu Chintaguntla commented on PHOENIX-2903:
--------------------------------------------------

[~jamestaylor] The approach is very nice. 
bq. Performs a second check for the scan start/stop being within the region while we have the region lock. This covers the case of a split occurring after the preScannerOpen, but before the postScannerOpen. That's theoretically possible, right?
Yes, it can happen. But in the split case we won't reach the check you have added here, because region.startRegionOperation() itself detects the split and throws NotServingRegionException.
{noformat}
                 region.startRegionOperation();
                 acquiredLock = true;
                 synchronized (scanner) {
+                    // Check again while we have the region lock as it's possible
+                    // we've split after the check we did in preScannerOpen()
+                    throwIfScanOutOfRegion(scan, c.getEnvironment().getRegion());
{noformat}
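
To make the kind of range check being discussed concrete, here is a minimal sketch. It is not the code from the patch: the class and method names are hypothetical, and it assumes the usual HBase convention that an empty key means "unbounded" and that a region covers [start, end). It needs Java 9+ for Arrays.compareUnsigned.

```java
import java.util.Arrays;

// Hypothetical helper, not the throwIfScanOutOfRegion() from the patch.
final class ScanRangeCheck {
    private ScanRangeCheck() {}

    // Returns true when the scan range [scanStart, scanStop) lies inside
    // the region range [regionStart, regionEnd).
    // Assumption: an empty byte[] means "no bound" on that side.
    static boolean isScanWithinRegion(byte[] scanStart, byte[] scanStop,
                                      byte[] regionStart, byte[] regionEnd) {
        // An empty scanStart sorts before any non-empty regionStart, so an
        // unbounded scan start only passes when the region is unbounded too.
        boolean startOk = Arrays.compareUnsigned(scanStart, regionStart) >= 0;

        boolean stopOk;
        if (regionEnd.length == 0) {
            // Last region of the table: any stop row is acceptable.
            stopOk = true;
        } else {
            // A bounded region cannot contain an unbounded scan stop.
            stopOk = scanStop.length > 0
                    && Arrays.compareUnsigned(scanStop, regionEnd) <= 0;
        }
        return startOk && stopOk;
    }
}
```

After a split the daughter region's boundaries shrink, so a scan that was built for the parent region would fail a check of this shape.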

bq. Sets a SCAN_START_AFTER_ROW attribute on the scan based on the previous tuple and ensures in BaseRegionScannerObserver that we ignore rows at or before that row key. We do this instead of trying to increment the row key because that won't work for the aggregation key as we don't have a complete row key.
I like this idea, but we might need to scan the full region if a split happens just before the scanner finishes (or when only a few records are left).
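
A rough sketch of the "ignore rows at or before that row key" idea, to show where the cost comes from. This is a simplification and not the BaseRegionScannerObserver code: the names are hypothetical, it compares whole row keys rather than aggregation keys, and it works on an in-memory list instead of a region scanner. It needs Java 9+ for Arrays.compareUnsigned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of skipping rows up to a "start after" key.
final class StartAfterRowFilter {
    private StartAfterRowFilter() {}

    // Keeps only the rows that sort strictly after startAfterRow.
    // Every row is still visited, which is why a split near the end of
    // a scan can mean reading most of the region just to discard it.
    static List<byte[]> rowsAfter(List<byte[]> rows, byte[] startAfterRow) {
        List<byte[]> kept = new ArrayList<>();
        for (byte[] row : rows) {
            if (Arrays.compareUnsigned(row, startAfterRow) > 0) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

With only a few records left after the marker row, almost all of the work in the loop above is spent on rows that are thrown away.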

bq. I believe there's still one issue with this technique, though, when the scan is over a local index. Before the split occurs, there will be a merge sort occurring among all scanners across all regions. After the split occurs, the original merge sort will continue and there'll be a new merge sort, again across all regions. We really need the original merge sort to continue only for the result iterator in which the split was detected. Otherwise, we'll get duplicate rows across the new and old iterators.
After PHOENIX-2628 we return a merge sort iterator (where required) over just the split regions together, so it should return the results in the same order as before the split. Hope this works.
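
To illustrate what merging just the split regions buys us, here is a small sketch of a merge sort over the sorted outputs of two daughter regions. It is only an illustration: the class is hypothetical, it is not the iterator from PHOENIX-2628, and it merges in-memory lists of row keys. It needs Java 9+ for Arrays.compareUnsigned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical k-way merge over already-sorted inputs.
final class DaughterMergeSort {
    private DaughterMergeSort() {}

    // Current head row of one input plus the iterator for the rest of it.
    private static final class Head {
        final byte[] row;
        final Iterator<byte[]> rest;

        Head(byte[] row, Iterator<byte[]> rest) {
            this.row = row;
            this.rest = rest;
        }
    }

    // Merges inputs that are each sorted in unsigned byte order into one
    // list in the same order.
    static List<byte[]> merge(List<List<byte[]>> sortedInputs) {
        PriorityQueue<Head> heap = new PriorityQueue<Head>(
                (a, b) -> Arrays.compareUnsigned(a.row, b.row));
        for (List<byte[]> input : sortedInputs) {
            Iterator<byte[]> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<byte[]> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Emit the smallest head, then refill from the same input.
            Head smallest = heap.poll();
            out.add(smallest.row);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }
}
```

If the two daughters together hold exactly the rows the parent region held, merging only their outputs gives the order the parent would have produced, which is the property relied on above.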

> Handle split during scan for row key ordered aggregations
> ---------------------------------------------------------
>
>                 Key: PHOENIX-2903
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2903
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: James Taylor
>         Attachments: PHOENIX-2903_v1.patch, PHOENIX-2903_wip.patch
>
>
> Currently a hole in our split detection code



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)