Posted to issues@phoenix.apache.org by GitBox <gi...@apache.org> on 2020/10/23 00:31:51 UTC

[GitHub] [phoenix] kadirozde opened a new pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

kadirozde opened a new pull request #936:
URL: https://github.com/apache/phoenix/pull/936


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] stoty commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
stoty commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-724996268


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   5m 15s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 1 new or modified test files.  |
   ||| _ 4.x Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  10m 44s |  4.x passed  |
   | +1 :green_heart: |  compile  |   0m 54s |  4.x passed  |
   | +1 :green_heart: |  checkstyle  |   1m 19s |  4.x passed  |
   | +1 :green_heart: |  javadoc  |   0m 44s |  4.x passed  |
   | +0 :ok: |  spotbugs  |   2m 50s |  phoenix-core in 4.x has 946 extant spotbugs warnings.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   5m 11s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 55s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 55s |  the patch passed  |
   | -1 :x: |  checkstyle  |   1m 22s |  phoenix-core: The patch generated 311 new + 1799 unchanged - 216 fixed = 2110 total (was 2015)  |
   | +1 :green_heart: |  whitespace  |   0m  1s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  the patch passed  |
   | -1 :x: |  spotbugs  |   3m  6s |  phoenix-core generated 1 new + 945 unchanged - 1 fixed = 946 total (was 946)  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 190m 26s |  phoenix-core in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 37s |  The patch does not generate ASF License warnings.  |
   |  |   | 226m 53s |   |
   
   
   | Reason | Tests |
   |-------:|:------|
   | FindBugs | module:phoenix-core |
   |  |  Switch statement found in org.apache.phoenix.coprocessor.UngroupedAggregateRegionScanner.descRowKeyOrderUpgrade(List, ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:[lines 382-403] |
   | Failed junit tests | TEST-[InQueryIT_0] |
   |   | phoenix.end2end.OrderByWithSpillingIT |
   |   | phoenix.end2end.index.PartialIndexRebuilderIT |
   |   | phoenix.end2end.index.MutableIndexExtendedIT |
   |   | phoenix.end2end.AlterTableIT |
   |   | phoenix.end2end.index.GlobalIndexOptimizationIT |
   |   | phoenix.end2end.OrphanViewToolIT |
   |   | phoenix.end2end.IndexToolForNonTxGlobalIndexIT |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/7/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/phoenix/pull/936 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs hbaseanti checkstyle compile |
   | uname | Linux ad601483d01b 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev/phoenix-personality.sh |
   | git revision | 4.x / 21e729f |
   | Default Java | Private Build-1.8.0_242-8u242-b08-0ubuntu3~16.04-b08 |
   | checkstyle | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/7/artifact/yetus-general-check/output/diff-checkstyle-phoenix-core.txt |
   | spotbugs | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/7/artifact/yetus-general-check/output/new-spotbugs-phoenix-core.html |
   | unit | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/7/artifact/yetus-general-check/output/patch-unit-phoenix-core.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/7/testReport/ |
   | Max. process+thread count | 6225 (vs. ulimit of 30000) |
   | modules | C: phoenix-core U: phoenix-core |
   | Console output | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/7/console |
   | versions | git=2.7.4 maven=3.3.9 spotbugs=4.1.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r520725153



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
##########
@@ -268,7 +244,7 @@ public void doMutation() throws IOException {
         }
     }
 
-    private void commitBatch(Region region, List<Mutation> mutations, long blockingMemstoreSize) throws IOException {
+    public void commitBatch(Region region, List<Mutation> mutations, long blockingMemstoreSize) throws IOException {

Review comment:
       Good suggestion, I will convert them.







[GitHub] [phoenix] stoty commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
stoty commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-716821761


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   6m 31s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 1 new or modified test files.  |
   ||| _ 4.x Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  11m 19s |  4.x passed  |
   | +1 :green_heart: |  compile  |   0m 57s |  4.x passed  |
   | +1 :green_heart: |  checkstyle  |   1m 20s |  4.x passed  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  4.x passed  |
   | +0 :ok: |  spotbugs  |   2m 54s |  phoenix-core in 4.x has 954 extant spotbugs warnings.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   5m 18s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 56s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 56s |  the patch passed  |
   | -1 :x: |  checkstyle  |   1m 22s |  phoenix-core: The patch generated 314 new + 1796 unchanged - 219 fixed = 2110 total (was 2015)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  the patch passed  |
   | -1 :x: |  spotbugs  |   3m  7s |  phoenix-core generated 1 new + 953 unchanged - 1 fixed = 954 total (was 954)  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 188m 39s |  phoenix-core in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 35s |  The patch does not generate ASF License warnings.  |
   |  |   | 227m  9s |   |
   
   
   | Reason | Tests |
   |-------:|:------|
   | FindBugs | module:phoenix-core |
   |  |  Switch statement found in org.apache.phoenix.coprocessor.UngroupedAggregateRegionScanner.descRowKeyOrderUpgrade(List, ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:[lines 382-403] |
   | Failed junit tests | phoenix.end2end.SequencePointInTimeIT |
   |   | phoenix.end2end.IndexToolForNonTxGlobalIndexIT |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/5/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/phoenix/pull/936 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs hbaseanti checkstyle compile |
   | uname | Linux 3e4465cd4a6b 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev/phoenix-personality.sh |
   | git revision | 4.x / 135692c |
   | Default Java | Private Build-1.8.0_242-8u242-b08-0ubuntu3~16.04-b08 |
   | checkstyle | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/5/artifact/yetus-general-check/output/diff-checkstyle-phoenix-core.txt |
   | spotbugs | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/5/artifact/yetus-general-check/output/new-spotbugs-phoenix-core.html |
   | unit | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/5/artifact/yetus-general-check/output/patch-unit-phoenix-core.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/5/testReport/ |
   | Max. process+thread count | 6239 (vs. ulimit of 30000) |
   | modules | C: phoenix-core U: phoenix-core |
   | Console output | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/5/console |
   | versions | git=2.7.4 maven=3.3.9 spotbugs=4.1.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [phoenix] ChinmaySKulkarni commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
ChinmaySKulkarni commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r520833354



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/iterate/TableResultIterator.java
##########
@@ -134,6 +137,7 @@ public TableResultIterator(MutationState mutationState, Scan scan, ScanMetricsHo
         this.retry=plan.getContext().getConnection().getQueryServices().getProps()
                 .getInt(QueryConstants.HASH_JOIN_CACHE_RETRIES, QueryConstants.DEFAULT_HASH_JOIN_CACHE_RETRIES);
         IndexUtil.setScanAttributesForIndexReadRepair(scan, table, plan.getContext().getConnection());
+        scan.setAttribute(BaseScannerRegionObserver.SERVER_PAGING, TRUE_BYTES);

Review comment:
       Makes sense. For old clients this attribute won't be set, so they will keep the current behavior. For new clients, we can set the page size to Long.MAX_VALUE to disable it. 👍
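
   As a rough illustration (not part of this patch), a new client could opt out along these lines, assuming the `AGGREGATE_PAGE_ROWS` attribute is read on the server as a long page value (as `UngroupedAggregateRegionScanner` does in this patch):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.phoenix.coprocessor.BaseScannerRegionObserver;
import org.apache.phoenix.schema.types.PDataType;

// Hedged sketch, not part of the patch: keep SERVER_PAGING set (so the server
// sees a paging-aware client) but pass an effectively unbounded page value via
// AGGREGATE_PAGE_ROWS, which the region scanner reads with Bytes.toLong().
class DisableServerPagingExample {
    static Scan withPagingDisabled(Scan scan) {
        scan.setAttribute(BaseScannerRegionObserver.SERVER_PAGING, PDataType.TRUE_BYTES);
        scan.setAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS,
                Bytes.toBytes(Long.MAX_VALUE));
        return scan;
    }
}
```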







[GitHub] [phoenix] kadirozde commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-724872173


   > @kadirozde overall looks like a great improvement. I have added a few comments. Some questions:
   > 
   > 1. Is it more beneficial to have paging based on row size rather than number of rows, since each row can be arbitrarily large?
   > 2. Server-side pagination will help _reduce_ the chance of the race conditions mentioned in the Jira description, but does not aim at eliminating them, correct?
   > 3. Though this is aimed at such race conditions related to mutations (server-side UPSERT SELECT/DELETE), it seems like it will also affect the normal read path for non-Group_By aggregate queries. Is there any negative effect/extra slowness during reads due to this pagination, and if yes, do we want to make sure that changes only affect the write paths?
   > 
   > Let's also please add some tests for this.
   
   1. I'm not sure, but we can introduce additional constraints, such as the total size of scanned bytes as you suggested, to further improve this feature later.
   2. This is correct. By itself, it does not eliminate them. However, as an additional improvement, the client can wait for all of the page operations to complete or fail before returning to the application. This will further reduce the race conditions. I think we have to enforce the client-side timestamp to make the race almost impossible.
   3. I expect this feature to improve overall performance and availability, since paging limits memory usage and the time server resources are held. My experience with paging on a real cluster is very positive. I have not seen any negative impact yet, as long as the page size is not very small (e.g., less than 1000).





[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r521623760



##########
File path: phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java
##########
@@ -629,6 +626,11 @@ public static Configuration setUpConfigForMiniCluster(Configuration conf, ReadOn
         conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 2);
         conf.setInt(NUM_CONCURRENT_INDEX_WRITER_THREADS_CONF_KEY, 1);
         conf.setInt(GLOBAL_INDEX_ROW_AGE_THRESHOLD_TO_DELETE_MS_ATTRIB, 0);
+        if (conf.getLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS, 0) == 0) {
+            conf.setLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS, 0);
+            // This results in processing one row at a time in each next operation of the aggregate region
+            // scanner, i.e.,  one row pages

Review comment:
       This means that a 0 ms page size is equivalent to a one-row page. I will update the comment to make that clear.
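
   For reference, a trimmed sketch of the paging check at the end of `UngroupedAggregateRegionScanner.next()` in this patch, which is why a 0 ms page degenerates to one row per `next()` call:

```java
// Trimmed from UngroupedAggregateRegionScanner.next() in this patch: the page
// ends when either the scanner is exhausted or pageSizeInMs has elapsed.
// With pageSizeInMs == 0 the elapsed-time check fails after the first row,
// so every next() call returns a one-row page.
long startTime = EnvironmentEdgeManager.currentTimeMillis();
boolean hasMore;
do {
    hasMore = innerScanner.nextRaw(results);
    // ... per-row aggregation/mutation work elided ...
} while (hasMore && (EnvironmentEdgeManager.currentTimeMillis() - startTime) < pageSizeInMs);
```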







[GitHub] [phoenix] stoty commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
stoty commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-716100721


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  7s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 1 new or modified test files.  |
   ||| _ 4.x Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  11m 51s |  4.x passed  |
   | +1 :green_heart: |  compile  |   0m 59s |  4.x passed  |
   | +1 :green_heart: |  checkstyle  |   1m  3s |  4.x passed  |
   | +1 :green_heart: |  javadoc  |   0m 47s |  4.x passed  |
   | +0 :ok: |  spotbugs  |   3m  8s |  phoenix-core in 4.x has 954 extant spotbugs warnings.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   6m 19s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 59s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 59s |  the patch passed  |
   | -1 :x: |  checkstyle  |   1m  5s |  phoenix-core: The patch generated 301 new + 1807 unchanged - 208 fixed = 2108 total (was 2015)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 46s |  the patch passed  |
   | -1 :x: |  spotbugs  |   3m 21s |  phoenix-core generated 1 new + 953 unchanged - 1 fixed = 954 total (was 954)  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 186m 34s |  phoenix-core in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 30s |  The patch does not generate ASF License warnings.  |
   |  |   | 221m  9s |   |
   
   
   | Reason | Tests |
   |-------:|:------|
   | FindBugs | module:phoenix-core |
   |  |  Switch statement found in org.apache.phoenix.coprocessor.UngroupedAggregateRegionScanner.descRowKeyOrderUpgrade(List, ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:[lines 380-401] |
   | Failed junit tests | phoenix.end2end.SpillableGroupByIT |
   |   | phoenix.execute.UpsertSelectOverlappingBatchesIT |
   |   | phoenix.util.IndexScrutinyIT |
   |   | phoenix.end2end.UpsertSelectIT |
   |   | phoenix.end2end.StoreNullsIT |
   |   | phoenix.end2end.join.HashJoinPersistentCacheIT |
   |   | TEST-[RangeScanIT_2] |
   |   | phoenix.end2end.ArithmeticQueryIT |
   |   | phoenix.end2end.index.GlobalMutableTxIndexIT |
   |   | TEST-[QueryIT_2] |
   |   | phoenix.end2end.PropertiesInSyncIT |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/3/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/phoenix/pull/936 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs hbaseanti checkstyle compile |
   | uname | Linux 11a56c546ccb 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev/phoenix-personality.sh |
   | git revision | 4.x / 605656c |
   | Default Java | Private Build-1.8.0_242-8u242-b08-0ubuntu3~16.04-b08 |
   | checkstyle | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/3/artifact/yetus-general-check/output/diff-checkstyle-phoenix-core.txt |
   | spotbugs | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/3/artifact/yetus-general-check/output/new-spotbugs-phoenix-core.html |
   | unit | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/3/artifact/yetus-general-check/output/patch-unit-phoenix-core.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/3/testReport/ |
   | Max. process+thread count | 6167 (vs. ulimit of 30000) |
   | modules | C: phoenix-core U: phoenix-core |
   | Console output | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/3/console |
   | versions | git=2.7.4 maven=3.3.9 spotbugs=4.1.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [phoenix] ChinmaySKulkarni commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
ChinmaySKulkarni commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r521649318



##########
File path: phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java
##########
@@ -629,6 +626,11 @@ public static Configuration setUpConfigForMiniCluster(Configuration conf, ReadOn
         conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 2);
         conf.setInt(NUM_CONCURRENT_INDEX_WRITER_THREADS_CONF_KEY, 1);
         conf.setInt(GLOBAL_INDEX_ROW_AGE_THRESHOLD_TO_DELETE_MS_ATTRIB, 0);
+        if (conf.getLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS, 0) == 0) {
+            conf.setLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS, 0);

Review comment:
       I meant that the `if` condition checks that the config is 0 and then sets it to 0 itself. Perhaps you meant to invert the condition.







[GitHub] [phoenix] ChinmaySKulkarni commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
ChinmaySKulkarni commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r521609388



##########
File path: phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java
##########
@@ -629,6 +626,11 @@ public static Configuration setUpConfigForMiniCluster(Configuration conf, ReadOn
         conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 2);
         conf.setInt(NUM_CONCURRENT_INDEX_WRITER_THREADS_CONF_KEY, 1);
         conf.setInt(GLOBAL_INDEX_ROW_AGE_THRESHOLD_TO_DELETE_MS_ATTRIB, 0);
+        if (conf.getLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS, 0) == 0) {
+            conf.setLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS, 0);

Review comment:
       Did you mean to set this to some non-zero value?

##########
File path: phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java
##########
@@ -629,6 +626,11 @@ public static Configuration setUpConfigForMiniCluster(Configuration conf, ReadOn
         conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 2);
         conf.setInt(NUM_CONCURRENT_INDEX_WRITER_THREADS_CONF_KEY, 1);
         conf.setInt(GLOBAL_INDEX_ROW_AGE_THRESHOLD_TO_DELETE_MS_ATTRIB, 0);
+        if (conf.getLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS, 0) == 0) {
+            conf.setLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS, 0);
+            // This results in processing one row at a time in each next operation of the aggregate region
+            // scanner, i.e.,  one row pages

Review comment:
       This comment needs to be changed to reflect the "time" aspect of paging rather than the number of rows, right?

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,646 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInMs = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInMs = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInMs =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids flush storm to hdfs for cases like index building where reads and
+         * write happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf) ;
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatiblity fall back to look by the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);;
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock
+        // args was introduced in 0.94.4, thus if we try to use it here
+        // we can no longer use the 0.94.2 version of the client.
+        Cell firstKV = results.get(0);
+        Delete delete = new Delete(firstKV.getRowArray(),
+                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
+        if (replayMutations != null) {
+            delete.setAttribute(REPLAY_WRITES, replayMutations);
+        }
+        mutations.add(delete);
+        // force tephra to ignore this deletes
+        delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+    }
+
+    void deleteCForQ(Tuple result, List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // No need to search for delete column, since we project only it
+        // if no empty key value is being set
+        if (emptyCF == null ||
+                result.getValue(deleteCF, deleteCQ) != null) {
+            Delete delete = new Delete(results.get(0).getRowArray(),
+                    results.get(0).getRowOffset(),
+                    results.get(0).getRowLength());
+            delete.deleteColumns(deleteCF,  deleteCQ, ts);
+            // force tephra to ignore this deletes
+            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+            mutations.add(delete);
+        }
+    }
+    void upsert(Tuple result, ImmutableBytesWritable ptr, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Arrays.fill(values, null);
+        int bucketNumOffset = 0;
+        if (projectedTable.getBucketNum() != null) {
+            values[0] = new byte[] { 0 };
+            bucketNumOffset = 1;
+        }
+        int i = bucketNumOffset;
+        List<PColumn> projectedColumns = projectedTable.getColumns();
+        for (; i < projectedTable.getPKColumns().size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                values[i] = ptr.copyBytes();
+                // If SortOrder from expression in SELECT doesn't match the
+                // column being projected into then invert the bits.
+                if (expression.getSortOrder() !=
+                        projectedColumns.get(i).getSortOrder()) {
+                    SortOrder.invert(values[i], 0, values[i], 0,
+                            values[i].length);
+                }
+            }else{
+                values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
+            }
+        }
+        projectedTable.newKey(ptr, values);
+        PRow row = projectedTable.newRow(GenericKeyValueBuilder.INSTANCE, ts, ptr, false);
+        for (; i < projectedColumns.size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                PColumn column = projectedColumns.get(i);
+                if (!column.getDataType().isSizeCompatible(ptr, null,
+                        expression.getDataType(), expression.getSortOrder(),
+                        expression.getMaxLength(), expression.getScale(),
+                        column.getMaxLength(), column.getScale())) {
+                    throw new DataExceedsCapacityException(
+                            column.getDataType(), column.getMaxLength(),
+                            column.getScale(), column.getName().getString(), ptr);
+                }
+                column.getDataType().coerceBytes(ptr, null,
+                        expression.getDataType(), expression.getMaxLength(),
+                        expression.getScale(), expression.getSortOrder(),
+                        column.getMaxLength(), column.getScale(),
+                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
+                byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                row.setValue(column, bytes);
+            }
+        }
+        for (Mutation mutation : row.toRowMutations()) {
+            if (replayMutations != null) {
+                mutation.setAttribute(REPLAY_WRITES, replayMutations);
+            } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
+                mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
+            }
+            mutations.add(mutation);
+        }
+        for (i = 0; i < selectExpressions.size(); i++) {
+            selectExpressions.get(i).reset();
+        }
+    }
+
+    void insertEmptyKeyValue(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Set<Long> timeStamps =
+                Sets.newHashSetWithExpectedSize(results.size());
+        for (Cell kv : results) {
+            long kvts = kv.getTimestamp();
+            if (!timeStamps.contains(kvts)) {
+                Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
+                        kv.getRowLength());
+                put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
+                        ByteUtil.EMPTY_BYTE_ARRAY);
+                mutations.add(put);
+            }
+        }
+    }
+    @Override
+    public boolean next(List<Cell> resultsToReturn) throws IOException {
+        boolean hasMore;
+        long startTime = EnvironmentEdgeManager.currentTimeMillis();
+        Configuration conf = env.getConfiguration();
+        final TenantCache tenantCache = GlobalCache.getTenantCache(env, ScanUtil.getTenantId(scan));
+        try (MemoryManager.MemoryChunk em = tenantCache.getMemoryManager().allocate(0)) {
+            Aggregators aggregators = ServerAggregators.deserialize(
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATORS), conf, em);
+            Aggregator[] rowAggregators = aggregators.getAggregators();
+            aggregators.reset(rowAggregators);
+            Cell lastCell = null;
+            boolean hasAny = false;
+            ImmutableBytesWritable ptr = new ImmutableBytesWritable();
+            Tuple result = useQualifierAsIndex ? new PositionBasedMultiKeyValueTuple() : new MultiKeyValueTuple();
+            UngroupedAggregateRegionObserver.MutationList mutations = new UngroupedAggregateRegionObserver.MutationList();
+            if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                    || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+                mutations = new UngroupedAggregateRegionObserver.MutationList(Ints.saturatedCast(maxBatchSize + maxBatchSize / 10));
+            }
+            region.startRegionOperation();
+            try {
+                synchronized (innerScanner) {
+                    do {
+                        ungroupedAggregateRegionObserver.checkForRegionClosing();
+                        List<Cell> results = useQualifierAsIndex ? new EncodedColumnQualiferCellsList(minMaxQualifiers.getFirst(), minMaxQualifiers.getSecond(), encodingScheme) : new ArrayList<Cell>();
+                        // Results are potentially returned even when the return value of s.next is false
+                        // since this is an indication of whether or not there are more values after the
+                        // ones returned
+                        hasMore = innerScanner.nextRaw(results);
+                        if (!results.isEmpty()) {
+                            lastCell = results.get(0);
+                            result.setKeyValues(results);
+                            if (isDescRowKeyOrderUpgrade) {
+                                if (!descRowKeyOrderUpgrade(results, ptr, mutations)) {
+                                    continue;
+                                }
+                            } else if (buildLocalIndex) {
+                                buildLocalIndex(result, results, ptr, mutations);
+                            } else if (isDelete) {
+                                deleteRow(results, mutations);
+                            } else if (isUpsert) {
+                                upsert(result, ptr, mutations);
+                            } else if (deleteCF != null && deleteCQ != null) {
+                                deleteCForQ(result, results, mutations);
+                            }
+                            if (emptyCF != null) {
+                                /*
+                                 * If we've specified an emptyCF, then we need to insert an empty
+                                 * key value "retroactively" for any key value that is visible at
+                                 * the timestamp that the DDL was issued. Key values that are not
+                                 * visible at this timestamp will not ever be projected up to
+                                 * scans past this timestamp, so don't need to be considered.
+                                 * We insert one empty key value per row per timestamp.
+                                 */
+                                insertEmptyKeyValue(results, mutations);
+                            }
+                            if (ServerUtil.readyToCommit(mutations.size(), mutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                ungroupedAggregateRegionObserver.commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr,
+                                        txState, targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
+                                mutations.clear();
+                            }
+                            // Commit in batches based on UPSERT_BATCH_SIZE_BYTES_ATTRIB in config
+
+                            if (ServerUtil.readyToCommit(indexMutations.size(), indexMutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                setIndexAndTransactionProperties(indexMutations, indexUUID, indexMaintainersPtr, txState, clientVersionBytes, useIndexProto);
+                                ungroupedAggregateRegionObserver.commitBatch(region, indexMutations, blockingMemStoreSize);
+                                indexMutations.clear();
+                            }
+                            aggregators.aggregate(rowAggregators, result);
+                            hasAny = true;
+                        }
+                    } while (hasMore && (EnvironmentEdgeManager.currentTimeMillis() - startTime) < pageSizeInMs);

Review comment:
       This logic is not sufficient to ensure that only `pageSizeInMs` is spent on the server-side operation. If, say, `pageSizeInMs=1000` and the last iteration starts at 990 ms but takes much longer than 10 ms, we can end up spending much more time than the configured page size duration. Is that fine?
   
   If not, we can use something like an `ExecutorService` with a timeout and run the entire `do while` logic inside a separate thread, which ensures that at most `pageSizeInMs` is ever spent on it. However, this complicates things if work done inside the loop can be left in an incomplete state when the thread is interrupted. Also, with this approach we no longer guarantee even one iteration of the loop, despite it being a `do while`.
   
   Depends on how strict we want the paging duration limit to be. What do you think?
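   
   For illustration, a rough sketch of the `ExecutorService` approach (hypothetical, not part of this patch; `scanOnePage()` and `pageSizeInMs` are placeholders for the real loop body and the configured page duration) could look like this:
   
   ```java
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.Future;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.TimeoutException;
   
   public class PagedScanTimeoutSketch {
       // Placeholder for the body of the do/while loop above; returns hasMore.
       static boolean scanOnePage() { return false; }
   
       public static void runWithTimeout(long pageSizeInMs) throws Exception {
           ExecutorService executor = Executors.newSingleThreadExecutor();
           Future<?> future = executor.submit(() -> {
               boolean hasMore;
               do {
                   hasMore = scanOnePage();
               } while (hasMore);
           });
           try {
               // Wait at most pageSizeInMs for the whole loop to finish.
               future.get(pageSizeInMs, TimeUnit.MILLISECONDS);
           } catch (TimeoutException e) {
               // The loop may be cut off mid-iteration here, so any partially
               // built mutation batch would need to be dealt with before cancelling.
               future.cancel(true);
           } finally {
               executor.shutdownNow();
           }
       }
   }
   ```
   
   The `TimeoutException` branch is exactly where the incomplete-iteration concern above would have to be handled.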




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r520726571



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
##########
@@ -516,402 +437,19 @@ public RegionScanner run() throws Exception {
             }
             ImmutableBytesWritable tempPtr = new ImmutableBytesWritable();
             theScanner =
-                    getWrappedScanner(c, theScanner, offset, scan, dataColumns, tupleProjector, 
-                        region, indexMaintainers == null ? null : indexMaintainers.get(0), viewConstants, p, tempPtr, useQualifierAsIndex);
-        } 
-        
-        if (j != null)  {
-            theScanner = new HashJoinRegionScanner(theScanner, p, j, ScanUtil.getTenantId(scan), env, useQualifierAsIndex, useNewValueColumnQualifier);
-        }
-        
-        int maxBatchSize = 0;
-        long maxBatchSizeBytes = 0L;
-        MutationList mutations = new MutationList();
-        boolean needToWrite = false;
-        Configuration conf = env.getConfiguration();
-
-        /**
-         * Slow down the writes if the memstore size more than
-         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
-         * bytes. This avoids flush storm to hdfs for cases like index building where reads and
-         * write happen to all the table regions in the server.
-         */
-        final long blockingMemStoreSize = getBlockingMemstoreSize(region, conf) ;
-
-        boolean buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
-        if(buildLocalIndex) {
-            checkForLocalIndexColumnFamilies(region, indexMaintainers);
-        }
-        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
-                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
-            needToWrite = true;
-            if((isUpsert && (targetHTable == null ||
-                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
-                needToWrite = false;
-            }
-            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
-            mutations = new MutationList(Ints.saturatedCast(maxBatchSize + maxBatchSize / 10));
-            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
-                QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
-        }
-        boolean hasMore;
-        int rowCount = 0;
-        boolean hasAny = false;
-        boolean acquiredLock = false;
-        boolean incrScanRefCount = false;
-        Aggregators aggregators = null;
-        Aggregator[] rowAggregators = null;
-        final RegionScanner innerScanner = theScanner;
-        final TenantCache tenantCache = GlobalCache.getTenantCache(env, ScanUtil.getTenantId(scan));
-        try (MemoryChunk em = tenantCache.getMemoryManager().allocate(0)) {
-            aggregators = ServerAggregators.deserialize(
-                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATORS), conf, em);
-            rowAggregators = aggregators.getAggregators();
-            Pair<Integer, Integer> minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
-            Tuple result = useQualifierAsIndex ? new PositionBasedMultiKeyValueTuple() : new MultiKeyValueTuple();
-            if (LOGGER.isDebugEnabled()) {
-                LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " "+region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
-            }
-            boolean useIndexProto = true;
-            byte[] indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
-            // for backward compatiblity fall back to look by the old attribute
-            if (indexMaintainersPtr == null) {
-                indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
-                useIndexProto = false;
-            }
-    
-            if(needToWrite) {
-                synchronized (lock) {
-                    if (isRegionClosingOrSplitting) {
-                        throw new IOException("Temporarily unable to write from scan because region is closing or splitting");
-                    }
-                    scansReferenceCount++;
-                    incrScanRefCount = true;
-                    lock.notifyAll();
-                }
-            }
-            region.startRegionOperation();
-            acquiredLock = true;
-            synchronized (innerScanner) {
-                do {
-                    List<Cell> results = useQualifierAsIndex ? new EncodedColumnQualiferCellsList(minMaxQualifiers.getFirst(), minMaxQualifiers.getSecond(), encodingScheme) : new ArrayList<Cell>();
-                    // Results are potentially returned even when the return value of s.next is false
-                    // since this is an indication of whether or not there are more values after the
-                    // ones returned
-                    hasMore = innerScanner.nextRaw(results);
-                    if (!results.isEmpty()) {
-                        rowCount++;
-                        result.setKeyValues(results);
-                        if (isDescRowKeyOrderUpgrade) {
-                            Arrays.fill(values, null);
-                            Cell firstKV = results.get(0);
-                            RowKeySchema schema = projectedTable.getRowKeySchema();
-                            int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
-                            for (int i = 0; i < schema.getFieldCount(); i++) {
-                                Boolean hasValue = schema.next(ptr, i, maxOffset);
-                                if (hasValue == null) {
-                                    break;
-                                }
-                                Field field = schema.getField(i);
-                                if (field.getSortOrder() == SortOrder.DESC) {
-                                    // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
-                                    if (field.getDataType().isArrayType()) {
-                                        field.getDataType().coerceBytes(ptr, null, field.getDataType(),
-                                            field.getMaxLength(), field.getScale(), field.getSortOrder(), 
-                                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
-                                    }
-                                    // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
-                                    else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
-                                        int len = ptr.getLength();
-                                        while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
-                                            len--;
-                                        }
-                                        ptr.set(ptr.get(), ptr.getOffset(), len);
-                                        // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
-                                    } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
-                                        byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
-                                        ptr.set(invertedBytes);
-                                    }
-                                } else if (field.getDataType() == PBinary.INSTANCE) {
-                                    // Remove trailing space characters so that the setValues call below will replace them
-                                    // with the correct zero byte character. Note this is somewhat dangerous as these
-                                    // could be legit, but I don't know what the alternative is.
-                                    int len = ptr.getLength();
-                                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
-                                        len--;
-                                    }
-                                    ptr.set(ptr.get(), ptr.getOffset(), len);                                        
-                                }
-                                values[i] = ptr.copyBytes();
-                            }
-                            writeToTable.newKey(ptr, values);
-                            if (Bytes.compareTo(
-                                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), 
-                                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
-                                continue;
-                            }
-                            byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
-                            if (offset > 0) { // for local indexes (prepend region start key)
-                                byte[] newRowWithOffset = new byte[offset + newRow.length];
-                                System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);;
-                                System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
-                                newRow = newRowWithOffset;
-                            }
-                            byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
-                            for (Cell cell : results) {
-                                // Copy existing cell but with new row key
-                                Cell newCell = new KeyValue(newRow, 0, newRow.length,
-                                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
-                                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
-                                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
-                                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
-                                switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
-                                case Put:
-                                    // If Put, point delete old Put
-                                    Delete del = new Delete(oldRow);
-                                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
-                                        cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
-                                        cell.getQualifierArray(), cell.getQualifierOffset(),
-                                        cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
-                                        ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
-                                    mutations.add(del);
-
-                                    Put put = new Put(newRow);
-                                    put.add(newCell);
-                                    mutations.add(put);
-                                    break;
-                                case Delete:
-                                case DeleteColumn:
-                                case DeleteFamily:
-                                case DeleteFamilyVersion:
-                                    Delete delete = new Delete(newRow);
-                                    delete.addDeleteMarker(newCell);
-                                    mutations.add(delete);
-                                    break;
-                                }
-                            }
-                        } else if (buildLocalIndex) {
-                            for (IndexMaintainer maintainer : indexMaintainers) {
-                                if (!results.isEmpty()) {
-                                    result.getKey(ptr);
-                                    ValueGetter valueGetter =
-                                            maintainer.createGetterFromKeyValues(
-                                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
-                                                results);
-                                    Put put = maintainer.buildUpdateMutation(kvBuilder,
-                                        valueGetter, ptr, results.get(0).getTimestamp(),
-                                        env.getRegion().getRegionInfo().getStartKey(),
-                                        env.getRegion().getRegionInfo().getEndKey());
-
-                                    if (txnProvider != null) {
-                                        put = txnProvider.markPutAsCommitted(put, ts, ts);
-                                    }
-                                    indexMutations.add(put);
-                                }
-                            }
-                            result.setKeyValues(results);
-                        } else if (isDelete) {
-                            // FIXME: the version of the Delete constructor without the lock
-                            // args was introduced in 0.94.4, thus if we try to use it here
-                            // we can no longer use the 0.94.2 version of the client.
-                            Cell firstKV = results.get(0);
-                            Delete delete = new Delete(firstKV.getRowArray(),
-                                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
-                            if (replayMutations != null) {
-                                delete.setAttribute(REPLAY_WRITES, replayMutations);
-                            }
-                            mutations.add(delete);
-                            // force tephra to ignore this deletes
-                            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
-                        } else if (isUpsert) {
-                            Arrays.fill(values, null);
-                            int bucketNumOffset = 0;
-                            if (projectedTable.getBucketNum() != null) {
-                                values[0] = new byte[] { 0 };
-                                bucketNumOffset = 1;
-                            }
-                            int i = bucketNumOffset;
-                            List<PColumn> projectedColumns = projectedTable.getColumns();
-                            for (; i < projectedTable.getPKColumns().size(); i++) {
-                                Expression expression = selectExpressions.get(i - bucketNumOffset);
-                                if (expression.evaluate(result, ptr)) {
-                                    values[i] = ptr.copyBytes();
-                                    // If SortOrder from expression in SELECT doesn't match the
-                                    // column being projected into then invert the bits.
-                                    if (expression.getSortOrder() !=
-                                            projectedColumns.get(i).getSortOrder()) {
-                                        SortOrder.invert(values[i], 0, values[i], 0,
-                                            values[i].length);
-                                    }
-                                }else{
-                                    values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
-                                }
-                            }
-                            projectedTable.newKey(ptr, values);
-                            PRow row = projectedTable.newRow(kvBuilder, ts, ptr, false);
-                            for (; i < projectedColumns.size(); i++) {
-                                Expression expression = selectExpressions.get(i - bucketNumOffset);
-                                if (expression.evaluate(result, ptr)) {
-                                    PColumn column = projectedColumns.get(i);
-                                    if (!column.getDataType().isSizeCompatible(ptr, null,
-                                        expression.getDataType(), expression.getSortOrder(),
-                                        expression.getMaxLength(), expression.getScale(),
-                                        column.getMaxLength(), column.getScale())) {
-                                        throw new DataExceedsCapacityException(
-                                                column.getDataType(),
-                                                column.getMaxLength(),
-                                                column.getScale(),
-                                                column.getName().getString());
-                                    }
-                                    column.getDataType().coerceBytes(ptr, null,
-                                        expression.getDataType(), expression.getMaxLength(),
-                                        expression.getScale(), expression.getSortOrder(), 
-                                        column.getMaxLength(), column.getScale(),
-                                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
-                                    byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
-                                    row.setValue(column, bytes);
-                                }
-                            }
-                            for (Mutation mutation : row.toRowMutations()) {
-                                if (replayMutations != null) {
-                                    mutation.setAttribute(REPLAY_WRITES, replayMutations);
-                                } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
-                                    mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
-                                }
-                                mutations.add(mutation);
-                            }
-                            for (i = 0; i < selectExpressions.size(); i++) {
-                                selectExpressions.get(i).reset();
-                            }
-                        } else if (deleteCF != null && deleteCQ != null) {
-                            // No need to search for delete column, since we project only it
-                            // if no empty key value is being set
-                            if (emptyCF == null ||
-                                    result.getValue(deleteCF, deleteCQ) != null) {
-                                Delete delete = new Delete(results.get(0).getRowArray(),
-                                    results.get(0).getRowOffset(),
-                                    results.get(0).getRowLength());
-                                delete.deleteColumns(deleteCF,  deleteCQ, ts);
-                                // force tephra to ignore this deletes
-                                delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
-                                mutations.add(delete);
-                            }
-                        }
-                        if (emptyCF != null) {
-                            /*
-                             * If we've specified an emptyCF, then we need to insert an empty
-                             * key value "retroactively" for any key value that is visible at
-                             * the timestamp that the DDL was issued. Key values that are not
-                             * visible at this timestamp will not ever be projected up to
-                             * scans past this timestamp, so don't need to be considered.
-                             * We insert one empty key value per row per timestamp.
-                             */
-                            Set<Long> timeStamps =
-                                    Sets.newHashSetWithExpectedSize(results.size());
-                            for (Cell kv : results) {
-                                long kvts = kv.getTimestamp();
-                                if (!timeStamps.contains(kvts)) {
-                                    Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
-                                        kv.getRowLength());
-                                    put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
-                                        ByteUtil.EMPTY_BYTE_ARRAY);
-                                    mutations.add(put);
-                                }
-                            }
-                        }
-                        if (ServerUtil.readyToCommit(mutations.size(), mutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
-                            commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr,
-                                txState, targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
-                            mutations.clear();
-                        }
-                        // Commit in batches based on UPSERT_BATCH_SIZE_BYTES_ATTRIB in config
-
-                        if (ServerUtil.readyToCommit(indexMutations.size(), indexMutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
-                            setIndexAndTransactionProperties(indexMutations, indexUUID, indexMaintainersPtr, txState, clientVersionBytes, useIndexProto);
-                            commitBatch(region, indexMutations, blockingMemStoreSize);
-                            indexMutations.clear();
-                        }
-                        aggregators.aggregate(rowAggregators, result);
-                        hasAny = true;
-                    }
-                } while (hasMore);
-                if (!mutations.isEmpty()) {
-                    commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr, txState,
-                        targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
-                    mutations.clear();
-                }
-
-                if (!indexMutations.isEmpty()) {
-                    commitBatch(region, indexMutations, blockingMemStoreSize);
-                    indexMutations.clear();
-                }
-            }
-        } finally {
-            if (needToWrite && incrScanRefCount) {
-                synchronized (lock) {
-                    scansReferenceCount--;
-                    if (scansReferenceCount < 0) {
-                        LOGGER.warn(
-                            "Scan reference count went below zero. Something isn't correct. Resetting it back to zero");
-                        scansReferenceCount = 0;
-                    }
-                    lock.notifyAll();
-                }
-            }
-            try {
-                if (targetHTable != null) {
-                    try {
-                        targetHTable.close();
-                    } catch (IOException e) {
-                        LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
-                    }
-                }
-            } finally {
-                try {
-                    innerScanner.close();
-                } finally {
-                    if (acquiredLock) region.closeRegionOperation();
-                }
-            }
-        }
-        if (LOGGER.isDebugEnabled()) {
-            LOGGER.debug(LogUtil.addCustomAnnotations("Finished scanning " + rowCount + " rows for ungrouped coprocessor scan " + scan, ScanUtil.getCustomAnnotations(scan)));
+                    getWrappedScanner(c, theScanner, offset, scan, dataColumns, tupleProjector,
+                            region, indexMaintainers == null ? null : indexMaintainers.get(0), viewConstants, p, tempPtr, useQualifierAsIndex);
         }
 
-        final boolean hadAny = hasAny;
-        KeyValue keyValue = null;
-        if (hadAny) {
-            byte[] value = aggregators.toBytes(rowAggregators);
-            keyValue = KeyValueUtil.newKeyValue(UNGROUPED_AGG_ROW_KEY, SINGLE_COLUMN_FAMILY, SINGLE_COLUMN, AGG_TIMESTAMP, value, 0, value.length);
+        if (j != null)  {
+            theScanner = new HashJoinRegionScanner(theScanner, p, j, ScanUtil.getTenantId(scan), env, useQualifierAsIndex, useNewValueColumnQualifier);
         }
-        final KeyValue aggKeyValue = keyValue;
-
-        RegionScanner scanner = new BaseRegionScanner(innerScanner) {
-            private boolean done = !hadAny;
-
-            @Override
-            public boolean isFilterDone() {
-                return done;
-            }
-
-            @Override
-            public boolean next(List<Cell> results) throws IOException {
-                if (done) return false;
-                done = true;
-                results.add(aggKeyValue);
-                return false;
-            }
-
-            @Override
-            public long getMaxResultSize() {
-                return scan.getMaxResultSize();
-            }
-        };
+        RegionScanner scanner = new UngroupedAggregateRegionScanner(c, theScanner,region, scan, env, this);

Review comment:
       Yes, you are right.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r512309864



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/iterate/UngroupedAggregatingResultIterator.java
##########
@@ -36,19 +37,33 @@ public UngroupedAggregatingResultIterator( PeekingResultIterator resultIterator,
     
     @Override
     public Tuple next() throws SQLException {
-        Tuple result = super.next();
+        byte[] value;
+        Tuple result = resultIterator.next();
+        // We should reset ClientAggregators here in case they are being reused in a new ResultIterator.
+        aggregators.reset(aggregators.getAggregators());
         // Ensure ungrouped aggregregation always returns a row, even if the underlying iterator doesn't.
-        if (result == null && !hasRows) {
-            // We should reset ClientAggregators here in case they are being reused in a new ResultIterator.
-            aggregators.reset(aggregators.getAggregators());
-            byte[] value = aggregators.toBytes(aggregators.getAggregators());
-            result = new SingleKeyValueTuple(
-                    KeyValueUtil.newKeyValue(UNGROUPED_AGG_ROW_KEY, 
-                            SINGLE_COLUMN_FAMILY, 
-                            SINGLE_COLUMN, 
-                            AGG_TIMESTAMP, 
-                            value));
+        if (result == null) {
+            if (hasRows) {
+                return null;

Review comment:
       I did not really change the behavior here, so I did not change the comment either. If there is no row in the table, we should still get a result object with a row count of zero. That is all the comment tries to communicate.
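   
   For illustration, this is the client-visible behavior being described (a hypothetical sketch; the JDBC URL and table name are placeholders): an ungrouped aggregate such as `COUNT(*)` always returns exactly one row, even over an empty table.
   
   ```java
   import java.sql.Connection;
   import java.sql.DriverManager;
   import java.sql.ResultSet;
   import java.sql.SQLException;
   import java.sql.Statement;
   
   public class UngroupedAggregateExample {
       public static void main(String[] args) throws SQLException {
           try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                Statement stmt = conn.createStatement();
                ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM EMPTY_TABLE")) {
               // Expected: rs.next() returns true exactly once, with a count of 0.
               while (rs.next()) {
                   System.out.println("count = " + rs.getLong(1));
               }
           }
       }
   }
   ```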




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] ChinmaySKulkarni commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
ChinmaySKulkarni commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r520211914



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,645 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInRows = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInRows = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInRows =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids flush storm to hdfs for cases like index building where reads and
+         * write happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf) ;
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatiblity fall back to look by the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);;
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock
+        // args was introduced in 0.94.4, thus if we try to use it here
+        // we can no longer use the 0.94.2 version of the client.
+        Cell firstKV = results.get(0);
+        Delete delete = new Delete(firstKV.getRowArray(),
+                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
+        if (replayMutations != null) {
+            delete.setAttribute(REPLAY_WRITES, replayMutations);
+        }
+        mutations.add(delete);
+        // force tephra to ignore this deletes
+        delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+    }
+
+    void deleteCForQ(Tuple result, List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // No need to search for delete column, since we project only it
+        // if no empty key value is being set
+        if (emptyCF == null ||
+                result.getValue(deleteCF, deleteCQ) != null) {
+            Delete delete = new Delete(results.get(0).getRowArray(),
+                    results.get(0).getRowOffset(),
+                    results.get(0).getRowLength());
+            delete.deleteColumns(deleteCF,  deleteCQ, ts);
+            // force tephra to ignore this deletes
+            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+            mutations.add(delete);
+        }
+    }
+    void upsert(Tuple result, ImmutableBytesWritable ptr, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Arrays.fill(values, null);
+        int bucketNumOffset = 0;
+        if (projectedTable.getBucketNum() != null) {
+            values[0] = new byte[] { 0 };
+            bucketNumOffset = 1;
+        }
+        int i = bucketNumOffset;
+        List<PColumn> projectedColumns = projectedTable.getColumns();
+        for (; i < projectedTable.getPKColumns().size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                values[i] = ptr.copyBytes();
+                // If SortOrder from expression in SELECT doesn't match the
+                // column being projected into then invert the bits.
+                if (expression.getSortOrder() !=
+                        projectedColumns.get(i).getSortOrder()) {
+                    SortOrder.invert(values[i], 0, values[i], 0,
+                            values[i].length);
+                }
+            } else {
+                values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
+            }
+        }
+        projectedTable.newKey(ptr, values);
+        PRow row = projectedTable.newRow(GenericKeyValueBuilder.INSTANCE, ts, ptr, false);
+        for (; i < projectedColumns.size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                PColumn column = projectedColumns.get(i);
+                if (!column.getDataType().isSizeCompatible(ptr, null,
+                        expression.getDataType(), expression.getSortOrder(),
+                        expression.getMaxLength(), expression.getScale(),
+                        column.getMaxLength(), column.getScale())) {
+                    throw new DataExceedsCapacityException(
+                            column.getDataType(), column.getMaxLength(),
+                            column.getScale(), column.getName().getString(), ptr);
+                }
+                column.getDataType().coerceBytes(ptr, null,
+                        expression.getDataType(), expression.getMaxLength(),
+                        expression.getScale(), expression.getSortOrder(),
+                        column.getMaxLength(), column.getScale(),
+                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
+                byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                row.setValue(column, bytes);
+            }
+        }
+        for (Mutation mutation : row.toRowMutations()) {
+            if (replayMutations != null) {
+                mutation.setAttribute(REPLAY_WRITES, replayMutations);
+            } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
+                mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
+            }
+            mutations.add(mutation);
+        }
+        for (i = 0; i < selectExpressions.size(); i++) {
+            selectExpressions.get(i).reset();
+        }
+    }
+
+    void insertEmptyKeyValue(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Set<Long> timeStamps =
+                Sets.newHashSetWithExpectedSize(results.size());
+        for (Cell kv : results) {
+            long kvts = kv.getTimestamp();
+            if (!timeStamps.contains(kvts)) {
+                Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
+                        kv.getRowLength());
+                put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
+                        ByteUtil.EMPTY_BYTE_ARRAY);
+                mutations.add(put);
+            }
+        }
+    }
+    @Override
+    public boolean next(List<Cell> resultsToReturn) throws IOException {

Review comment:
       Just a generic question: if the RS dies and the server-side scanner dies with it, a new scanner will start its work again from the first row of the region. However, can it happen that the previous scanner had already returned rows to the client, for which the client then issued the DELETE/UPSERT operations corresponding to the UPSERT SELECT/DELETE? Does something inside UARO prevent this?
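
       To make the concern concrete, a small self-contained toy model may help (this is not the actual patch, and every name in it is made up): a paged worker commits one page of mutations per call, so a replacement worker that restarts from the first row of the region would re-emit mutations for pages that were already committed, unless something makes them idempotent.

       import java.util.ArrayList;
       import java.util.Iterator;
       import java.util.List;

       // Toy model only: each nextPage() call processes at most pageSize rows and
       // "commits" their mutations; it returns true while more rows remain.
       class PagedWorker {
           private final Iterator<String> rows;
           private final int pageSize;
           private final List<String> committed = new ArrayList<>();

           PagedWorker(Iterator<String> rows, int pageSize) {
               this.rows = rows;
               this.pageSize = pageSize;
           }

           boolean nextPage() {
               int count = 0;
               while (rows.hasNext() && count < pageSize) {
                   committed.add("DELETE " + rows.next()); // stand-in for building and committing a mutation
                   count++;
               }
               return rows.hasNext(); // true => the caller should ask for another page
           }

           List<String> committedMutations() {
               return committed;
           }
       }

       public class PagingDemo {
           public static void main(String[] args) {
               List<String> regionRows = List.of("r1", "r2", "r3", "r4", "r5");
               PagedWorker worker = new PagedWorker(regionRows.iterator(), 2);
               while (worker.nextPage()) {
                   // the real coprocessor hands control back to the client between pages;
                   // a crash here, followed by a restart from row 0, reprocesses r1 and r2
               }
               System.out.println(worker.committedMutations());
           }
       }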




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] stoty commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
stoty commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-725732587


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 31s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 1 new or modified test files.  |
   ||| _ 4.x Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  10m 41s |  4.x passed  |
   | +1 :green_heart: |  compile  |   0m 54s |  4.x passed  |
   | +1 :green_heart: |  checkstyle  |   1m 20s |  4.x passed  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  4.x passed  |
   | +0 :ok: |  spotbugs  |   2m 58s |  phoenix-core in 4.x has 946 extant spotbugs warnings.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   5m 17s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 55s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 55s |  the patch passed  |
   | -1 :x: |  checkstyle  |   1m 22s |  phoenix-core: The patch generated 323 new + 1785 unchanged - 230 fixed = 2108 total (was 2015)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  the patch passed  |
   | -1 :x: |  spotbugs  |   3m  9s |  phoenix-core generated 1 new + 945 unchanged - 1 fixed = 946 total (was 946)  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 124m 11s |  phoenix-core in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 38s |  The patch does not generate ASF License warnings.  |
   |  |   | 156m  5s |   |
   
   
   | Reason | Tests |
   |-------:|:------|
   | FindBugs | module:phoenix-core |
   |  |  Switch statement found in org.apache.phoenix.coprocessor.UngroupedAggregateRegionScanner.descRowKeyOrderUpgrade(List, ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:[lines 383-404] |
   | Failed junit tests | phoenix.rpc.PhoenixServerRpcIT |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/10/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/phoenix/pull/936 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs hbaseanti checkstyle compile |
   | uname | Linux c158ee7b0562 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev/phoenix-personality.sh |
   | git revision | 4.x / c9c80b2 |
   | Default Java | Private Build-1.8.0_242-8u242-b08-0ubuntu3~16.04-b08 |
   | checkstyle | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/10/artifact/yetus-general-check/output/diff-checkstyle-phoenix-core.txt |
   | spotbugs | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/10/artifact/yetus-general-check/output/new-spotbugs-phoenix-core.html |
   | unit | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/10/artifact/yetus-general-check/output/patch-unit-phoenix-core.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/10/testReport/ |
   | Max. process+thread count | 3780 (vs. ulimit of 30000) |
   | modules | C: phoenix-core U: phoenix-core |
   | Console output | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/10/console |
   | versions | git=2.7.4 maven=3.3.9 spotbugs=4.1.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r522662146



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,645 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInRows = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;

Review comment:
       I will make all of them private as you suggested.
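
       As a minimal sketch of what that change could look like (illustrative only; the field names mirror the diff above and only the visibility changes):

       public class UngroupedAggregateRegionScannerSketch {
           // Hypothetical: the same scanner state, no longer visible outside the class.
           private long pageSizeInRows = Long.MAX_VALUE;
           private int maxBatchSize = 0;
           private boolean needToWrite = false;
           private byte[][] values = null;
           // ... the remaining fields are declared private in the same way, with
           // package-private accessors added only where other classes need them.
       }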

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
##########
@@ -250,7 +202,31 @@ public void start(CoprocessorEnvironment e) throws IOException {
         indexWriteProps = new ReadOnlyProps(indexWriteConfig.iterator());
     }
 
-    public void commitBatchWithRetries(final Region region, final List<Mutation> localRegionMutations, final long blockingMemstoreSize) throws IOException {
+    Configuration getUpsertSelectConfig() {

Review comment:
       Yes. Initially these were class private. I made them public unnecessarily in one of my previous Jiras when I introduced IndexRebuildRegionScanner. Based on Chinmay's feedback, I made them package private as they are only accessed within the package.
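
       A tiny self-contained illustration of that visibility pattern, using made-up stand-in classes rather than the real ones (the actual getter and its Configuration return type are in the diff):

       package com.example.coproc;

       // Stand-in for the observer: the getter has no access modifier, so it is
       // package-private and reachable only from classes in the same package.
       public class Observer {
           private final String upsertSelectConfig = "upsert-select-config";

           String getUpsertSelectConfig() {
               return upsertSelectConfig;
           }
       }

       // Stand-in for the scanner that was split out of the observer; it lives in the
       // same package, so the package-private getter is visible to it.
       class Scanner {
           private final Observer observer;

           Scanner(Observer observer) {
               this.observer = observer;
           }

           String configName() {
               return observer.getUpsertSelectConfig();
           }
       }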

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,641 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInRows = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    Aggregators aggregators = null;
+    Aggregator[] rowAggregators = null;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+    MemoryManager.MemoryChunk em = null;
+    boolean hasMore = false;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        byte[] pageSizeFromScan =
+                scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+        if (pageSizeFromScan != null) {
+            pageSizeInRows = Bytes.toLong(pageSizeFromScan);
+        } else {
+            pageSizeInRows =
+                    conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS,
+                            QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS);
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf) ;
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        final TenantCache tenantCache = GlobalCache.getTenantCache(env, ScanUtil.getTenantId(scan));
+
+        em = tenantCache.getMemoryManager().allocate(0);
+        aggregators = ServerAggregators.deserialize(
+                scan.getAttribute(BaseScannerRegionObserver.AGGREGATORS), conf, em);
+        rowAggregators = aggregators.getAggregators();
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+            if (em != null) {
+                em.close();
+            }
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock

Review comment:
       I will remove the comment

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,646 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInMs = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInMs = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInMs =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf) ;
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,

Review comment:
       Ok
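
For context on the memstore guard discussed in the hunk above, the following is a minimal sketch of the threshold that comment describes, assuming the standard HBase configuration keys and their usual defaults; the real computation lives in UngroupedAggregateRegionObserver.getBlockingMemstoreSize and may differ in detail:

```java
import org.apache.hadoop.conf.Configuration;

public final class BlockingMemstoreSizeSketch {
    // Hypothetical helper mirroring the comment in the constructor: throttle writes once the
    // memstore grows past (hbase.hregion.memstore.block.multiplier - 1) * hbase.hregion.memstore.flush.size bytes.
    static long blockingMemstoreSize(Configuration conf) {
        long flushSize = conf.getLong("hbase.hregion.memstore.flush.size",
                128L * 1024 * 1024);                  // assumed 128 MB default
        long blockMultiplier = conf.getLong("hbase.hregion.memstore.block.multiplier", 4);
        return flushSize * (blockMultiplier - 1);     // leave one flush worth of headroom
    }
}
```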

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,646 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInMs = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInMs = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInMs =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /*
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building, where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf);
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),

Review comment:
       Ok
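
Since this hunk ends right where descRowKeyOrderUpgrade starts turning the re-keyed cells into mutations, here is a simplified, self-contained sketch of that rewrite pattern — point-delete the cell under the old row key, then re-issue it under the recomputed key. The class and method names are hypothetical, and the HBase 1.x client calls mirror the ones used in this diff:

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;

final class DescRowKeyRewriteSketch {
    // Hypothetical illustration: move one Put cell from the old row key to the upgraded one.
    static void rewritePut(Cell cell, byte[] oldRow, byte[] newRow, List<Mutation> mutations)
            throws IOException {
        // point-delete the old cell at its original timestamp
        Delete del = new Delete(oldRow);
        del.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell), cell.getTimestamp());
        mutations.add(del);

        // re-issue the same family/qualifier/timestamp/value under the new row key
        Put put = new Put(newRow);
        put.add(new KeyValue(newRow, CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
                cell.getTimestamp(), CellUtil.cloneValue(cell)));
        mutations.add(put);
    }
}
```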

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,646 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInMs = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInMs = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInMs =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /*
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building, where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf);
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock
+        // args was introduced in 0.94.4, thus if we try to use it here
+        // we can no longer use the 0.94.2 version of the client.
+        Cell firstKV = results.get(0);
+        Delete delete = new Delete(firstKV.getRowArray(),
+                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
+        if (replayMutations != null) {
+            delete.setAttribute(REPLAY_WRITES, replayMutations);
+        }
+        mutations.add(delete);
+        // force Tephra to ignore this delete
+        delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+    }
+
+    void deleteCForQ(Tuple result, List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // No need to search for the delete column, since it is the only column
+        // projected when no empty key value is being set
+        if (emptyCF == null ||
+                result.getValue(deleteCF, deleteCQ) != null) {
+            Delete delete = new Delete(results.get(0).getRowArray(),
+                    results.get(0).getRowOffset(),
+                    results.get(0).getRowLength());
+            delete.deleteColumns(deleteCF,  deleteCQ, ts);
+            // force Tephra to ignore this delete
+            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+            mutations.add(delete);
+        }
+    }
+    void upsert(Tuple result, ImmutableBytesWritable ptr, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Arrays.fill(values, null);
+        int bucketNumOffset = 0;
+        if (projectedTable.getBucketNum() != null) {
+            values[0] = new byte[] { 0 };
+            bucketNumOffset = 1;
+        }
+        int i = bucketNumOffset;
+        List<PColumn> projectedColumns = projectedTable.getColumns();
+        for (; i < projectedTable.getPKColumns().size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                values[i] = ptr.copyBytes();
+                // If SortOrder from expression in SELECT doesn't match the
+                // column being projected into then invert the bits.
+                if (expression.getSortOrder() !=
+                        projectedColumns.get(i).getSortOrder()) {
+                    SortOrder.invert(values[i], 0, values[i], 0,
+                            values[i].length);
+                }
+            } else {
+                values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
+            }
+        }
+        projectedTable.newKey(ptr, values);
+        PRow row = projectedTable.newRow(GenericKeyValueBuilder.INSTANCE, ts, ptr, false);
+        for (; i < projectedColumns.size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                PColumn column = projectedColumns.get(i);
+                if (!column.getDataType().isSizeCompatible(ptr, null,
+                        expression.getDataType(), expression.getSortOrder(),
+                        expression.getMaxLength(), expression.getScale(),
+                        column.getMaxLength(), column.getScale())) {
+                    throw new DataExceedsCapacityException(
+                            column.getDataType(), column.getMaxLength(),
+                            column.getScale(), column.getName().getString(), ptr);
+                }
+                column.getDataType().coerceBytes(ptr, null,
+                        expression.getDataType(), expression.getMaxLength(),
+                        expression.getScale(), expression.getSortOrder(),
+                        column.getMaxLength(), column.getScale(),
+                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
+                byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                row.setValue(column, bytes);
+            }
+        }
+        for (Mutation mutation : row.toRowMutations()) {
+            if (replayMutations != null) {
+                mutation.setAttribute(REPLAY_WRITES, replayMutations);
+            } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
+                mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
+            }
+            mutations.add(mutation);
+        }
+        for (i = 0; i < selectExpressions.size(); i++) {
+            selectExpressions.get(i).reset();
+        }
+    }
+
+    void insertEmptyKeyValue(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Set<Long> timeStamps =
+                Sets.newHashSetWithExpectedSize(results.size());
+        for (Cell kv : results) {
+            long kvts = kv.getTimestamp();
+            if (!timeStamps.contains(kvts)) {
+                Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
+                        kv.getRowLength());
+                put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
+                        ByteUtil.EMPTY_BYTE_ARRAY);
+                mutations.add(put);
+            }
+        }
+    }
+    @Override
+    public boolean next(List<Cell> resultsToReturn) throws IOException {
+        boolean hasMore;
+        long startTime = EnvironmentEdgeManager.currentTimeMillis();
+        Configuration conf = env.getConfiguration();
+        final TenantCache tenantCache = GlobalCache.getTenantCache(env, ScanUtil.getTenantId(scan));
+        try (MemoryManager.MemoryChunk em = tenantCache.getMemoryManager().allocate(0)) {
+            Aggregators aggregators = ServerAggregators.deserialize(
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATORS), conf, em);
+            Aggregator[] rowAggregators = aggregators.getAggregators();
+            aggregators.reset(rowAggregators);
+            Cell lastCell = null;
+            boolean hasAny = false;
+            ImmutableBytesWritable ptr = new ImmutableBytesWritable();
+            Tuple result = useQualifierAsIndex ? new PositionBasedMultiKeyValueTuple() : new MultiKeyValueTuple();
+            UngroupedAggregateRegionObserver.MutationList mutations = new UngroupedAggregateRegionObserver.MutationList();
+            if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                    || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+                mutations = new UngroupedAggregateRegionObserver.MutationList(Ints.saturatedCast(maxBatchSize + maxBatchSize / 10));
+            }
+            region.startRegionOperation();
+            try {
+                synchronized (innerScanner) {
+                    do {
+                        ungroupedAggregateRegionObserver.checkForRegionClosing();
+                        List<Cell> results = useQualifierAsIndex ? new EncodedColumnQualiferCellsList(minMaxQualifiers.getFirst(), minMaxQualifiers.getSecond(), encodingScheme) : new ArrayList<Cell>();
+                        // Results are potentially returned even when nextRaw() returns false, since
+                        // the return value only indicates whether there are more rows after the ones
+                        // already returned
+                        hasMore = innerScanner.nextRaw(results);
+                        if (!results.isEmpty()) {
+                            lastCell = results.get(0);
+                            result.setKeyValues(results);
+                            if (isDescRowKeyOrderUpgrade) {
+                                if (!descRowKeyOrderUpgrade(results, ptr, mutations)) {
+                                    continue;
+                                }
+                            } else if (buildLocalIndex) {
+                                buildLocalIndex(result, results, ptr, mutations);
+                            } else if (isDelete) {
+                                deleteRow(results, mutations);
+                            } else if (isUpsert) {
+                                upsert(result, ptr, mutations);
+                            } else if (deleteCF != null && deleteCQ != null) {
+                                deleteCForQ(result, results, mutations);
+                            }
+                            if (emptyCF != null) {
+                                /*
+                                 * If we've specified an emptyCF, then we need to insert an empty
+                                 * key value "retroactively" for any key value that is visible at
+                                 * the timestamp that the DDL was issued. Key values that are not
+                                 * visible at this timestamp will not ever be projected up to
+                                 * scans past this timestamp, so don't need to be considered.
+                                 * We insert one empty key value per row per timestamp.
+                                 */
+                                insertEmptyKeyValue(results, mutations);
+                            }
+                            if (ServerUtil.readyToCommit(mutations.size(), mutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                ungroupedAggregateRegionObserver.commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr,
+                                        txState, targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
+                                mutations.clear();
+                            }
+                            // Commit in batches based on UPSERT_BATCH_SIZE_BYTES_ATTRIB in config
+
+                            if (ServerUtil.readyToCommit(indexMutations.size(), indexMutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                setIndexAndTransactionProperties(indexMutations, indexUUID, indexMaintainersPtr, txState, clientVersionBytes, useIndexProto);
+                                ungroupedAggregateRegionObserver.commitBatch(region, indexMutations, blockingMemStoreSize);
+                                indexMutations.clear();
+                            }
+                            aggregators.aggregate(rowAggregators, result);
+                            hasAny = true;
+                        }
+                    } while (hasMore && (EnvironmentEdgeManager.currentTimeMillis() - startTime) < pageSizeInMs);
+
+                    if (!mutations.isEmpty()) {
+                        ungroupedAggregateRegionObserver.commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr, txState,
+                                targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
+                        mutations.clear();
+                    }
+                    if (!indexMutations.isEmpty()) {
+                        ungroupedAggregateRegionObserver.commitBatch(region, indexMutations, blockingMemStoreSize);
+                        indexMutations.clear();
+                    }
+                }
+            } catch (InsufficientMemoryException e) {
+                throw new DoNotRetryIOException(e);
+            } catch (DataExceedsCapacityException e) {
+                throw new DoNotRetryIOException(e.getMessage(), e);
+            } catch (Throwable e) {
+                LOGGER.error("Exception in UngroupedAggregateRegionScanner for region "
+                        + region.getRegionInfo().getRegionNameAsString(), e);
+                throw e;
+            }
+            KeyValue keyValue;

Review comment:
       Ok




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] gjacoby126 commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
gjacoby126 commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-716864823


   @kadirozde - UpsertSelectIT seems to have crashed in the last test run (and there were timeouts for SequencePointInTimeIT and IndexToolForNonTxGlobalIndexIT). Have these passed elsewhere?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] stoty commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
stoty commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-726601686


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 30s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 1 new or modified test files.  |
   ||| _ 4.x Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  10m 38s |  4.x passed  |
   | +1 :green_heart: |  compile  |   0m 54s |  4.x passed  |
   | +1 :green_heart: |  checkstyle  |   1m 20s |  4.x passed  |
   | +1 :green_heart: |  javadoc  |   0m 46s |  4.x passed  |
   | +0 :ok: |  spotbugs  |   2m 54s |  phoenix-core in 4.x has 946 extant spotbugs warnings.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   5m 15s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 55s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 55s |  the patch passed  |
   | -1 :x: |  checkstyle  |   1m 23s |  phoenix-core: The patch generated 283 new + 1785 unchanged - 230 fixed = 2068 total (was 2015)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  the patch passed  |
   | -1 :x: |  spotbugs  |   3m  8s |  phoenix-core generated 1 new + 945 unchanged - 1 fixed = 946 total (was 946)  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 139m  4s |  phoenix-core in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  The patch does not generate ASF License warnings.  |
   |  |   | 171m  0s |   |
   
   
   | Reason | Tests |
   |-------:|:------|
   | FindBugs | module:phoenix-core |
   |  |  Switch statement found in org.apache.phoenix.coprocessor.UngroupedAggregateRegionScanner.descRowKeyOrderUpgrade(List, ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:[lines 379-398] |
   | Failed junit tests | phoenix.end2end.StatsEnabledSplitSystemCatalogIT |
   |   | phoenix.end2end.index.GlobalImmutableNonTxIndexIT |
   |   | phoenix.end2end.UpsertSelectIT |
   |   | phoenix.end2end.InQueryIT |
   |   | phoenix.end2end.index.ChildViewsUseParentViewIndexIT |
   |   | phoenix.end2end.BackwardCompatibilityIT |
   |   | phoenix.end2end.index.AlterIndexIT |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/11/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/phoenix/pull/936 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs hbaseanti checkstyle compile |
   | uname | Linux 4c51add2915a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev/phoenix-personality.sh |
   | git revision | 4.x / 565b0ea |
   | Default Java | Private Build-1.8.0_242-8u242-b08-0ubuntu3~16.04-b08 |
   | checkstyle | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/11/artifact/yetus-general-check/output/diff-checkstyle-phoenix-core.txt |
   | spotbugs | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/11/artifact/yetus-general-check/output/new-spotbugs-phoenix-core.html |
   | unit | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/11/artifact/yetus-general-check/output/patch-unit-phoenix-core.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/11/testReport/ |
   | Max. process+thread count | 6470 (vs. ulimit of 30000) |
   | modules | C: phoenix-core U: phoenix-core |
   | Console output | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/11/console |
   | versions | git=2.7.4 maven=3.3.9 spotbugs=4.1.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-724943743


   @ChinmaySKulkarni, thank you for reviewing this and asking very good questions. I just had a conversation about this with Sukumar Maddineni, and we agreed that the paging limit should be expressed as the time to spend on the server side rather than the number of rows or bytes. If you agree, I would like to update the PR accordingly. I will also write a doc that covers this and the other two Jiras (6207 and 6211).
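
   As a rough illustration of the time-based limit (this is not the actual Phoenix implementation; the class, method, and field names below are hypothetical stand-ins), a server-side page can simply stop handing out work once its time budget is spent:

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Iterator;
   import java.util.List;

   // Toy stand-in for a scanner that hands out work until a per-page time budget runs out.
   public class TimeBoundedPager {
       private final long pageSizeMs;        // time budget for one server-side page
       private final Iterator<String> rows;  // stand-in for the underlying region scanner

       public TimeBoundedPager(long pageSizeMs, Iterator<String> rows) {
           this.pageSizeMs = pageSizeMs;
           this.rows = rows;
       }

       // Fills 'page' until the scanner is exhausted or the time budget is used up.
       // Returns true if more rows remain, so the caller can ask for another page.
       public boolean nextPage(List<String> page) {
           long startTime = System.currentTimeMillis();
           boolean hasMore = rows.hasNext();
           while (hasMore && System.currentTimeMillis() - startTime < pageSizeMs) {
               page.add(rows.next());        // stand-in for the per-row aggregate/mutation work
               hasMore = rows.hasNext();
           }
           return hasMore;
       }

       public static void main(String[] args) {
           Iterator<String> rows = Arrays.asList("r1", "r2", "r3").iterator();
           TimeBoundedPager pager = new TimeBoundedPager(30_000L, rows);
           List<String> page = new ArrayList<>();
           while (pager.nextPage(page)) {
               // in the real coprocessor a partial result would be returned per page
           }
           System.out.println(page.size()); // prints 3
       }
   }
   ```

   In the PR itself, the do-while loop in UngroupedAggregateRegionScanner plays this role, with EnvironmentEdgeManager supplying the clock and the region scanner supplying the rows.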


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r520748597



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,645 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInRows = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInRows = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInRows =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building, where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf);
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock
+        // args was introduced in 0.94.4, thus if we try to use it here
+        // we can no longer use the 0.94.2 version of the client.
+        Cell firstKV = results.get(0);
+        Delete delete = new Delete(firstKV.getRowArray(),
+                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
+        if (replayMutations != null) {
+            delete.setAttribute(REPLAY_WRITES, replayMutations);
+        }
+        mutations.add(delete);
+        // force Tephra to ignore this delete
+        delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+    }
+
+    void deleteCForQ(Tuple result, List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // No need to search for the delete column, since we project only it
+        // when no empty key value is being set
+        if (emptyCF == null ||
+                result.getValue(deleteCF, deleteCQ) != null) {
+            Delete delete = new Delete(results.get(0).getRowArray(),
+                    results.get(0).getRowOffset(),
+                    results.get(0).getRowLength());
+            delete.deleteColumns(deleteCF,  deleteCQ, ts);
+            // force Tephra to ignore this delete
+            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+            mutations.add(delete);
+        }
+    }
+    void upsert(Tuple result, ImmutableBytesWritable ptr, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Arrays.fill(values, null);
+        int bucketNumOffset = 0;
+        if (projectedTable.getBucketNum() != null) {
+            values[0] = new byte[] { 0 };
+            bucketNumOffset = 1;
+        }
+        int i = bucketNumOffset;
+        List<PColumn> projectedColumns = projectedTable.getColumns();
+        for (; i < projectedTable.getPKColumns().size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                values[i] = ptr.copyBytes();
+                // If SortOrder from expression in SELECT doesn't match the
+                // column being projected into then invert the bits.
+                if (expression.getSortOrder() !=
+                        projectedColumns.get(i).getSortOrder()) {
+                    SortOrder.invert(values[i], 0, values[i], 0,
+                            values[i].length);
+                }
+            } else {
+                values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
+            }
+        }
+        projectedTable.newKey(ptr, values);
+        PRow row = projectedTable.newRow(GenericKeyValueBuilder.INSTANCE, ts, ptr, false);
+        for (; i < projectedColumns.size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                PColumn column = projectedColumns.get(i);
+                if (!column.getDataType().isSizeCompatible(ptr, null,
+                        expression.getDataType(), expression.getSortOrder(),
+                        expression.getMaxLength(), expression.getScale(),
+                        column.getMaxLength(), column.getScale())) {
+                    throw new DataExceedsCapacityException(
+                            column.getDataType(), column.getMaxLength(),
+                            column.getScale(), column.getName().getString(), ptr);
+                }
+                column.getDataType().coerceBytes(ptr, null,
+                        expression.getDataType(), expression.getMaxLength(),
+                        expression.getScale(), expression.getSortOrder(),
+                        column.getMaxLength(), column.getScale(),
+                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
+                byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                row.setValue(column, bytes);
+            }
+        }
+        for (Mutation mutation : row.toRowMutations()) {
+            if (replayMutations != null) {
+                mutation.setAttribute(REPLAY_WRITES, replayMutations);
+            } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
+                mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
+            }
+            mutations.add(mutation);
+        }
+        for (i = 0; i < selectExpressions.size(); i++) {
+            selectExpressions.get(i).reset();
+        }
+    }
+
+    void insertEmptyKeyValue(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Set<Long> timeStamps =
+                Sets.newHashSetWithExpectedSize(results.size());
+        for (Cell kv : results) {
+            long kvts = kv.getTimestamp();
+            if (!timeStamps.contains(kvts)) {
+                Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
+                        kv.getRowLength());
+                put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
+                        ByteUtil.EMPTY_BYTE_ARRAY);
+                mutations.add(put);
+            }
+        }
+    }
+    @Override
+    public boolean next(List<Cell> resultsToReturn) throws IOException {

Review comment:
       UARO did not do anything. With this change, only the last page is repeated, not the entire table region.
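
       To make the "only the last page is repeated" point concrete, here is a minimal, self-contained sketch (the names PagedSum, PAGE_LIMIT, and sumOnePage are made up for illustration and are not Phoenix APIs): the aggregate is built from per-page partial results, so a failure or retry can only cost the page in flight, not the pages already processed.

       ```java
       import java.util.Arrays;
       import java.util.Iterator;
       import java.util.List;

       // Toy example: an ungrouped SUM computed page by page instead of in one pass.
       public class PagedSum {
           static final int PAGE_LIMIT = 3; // stand-in for the time budget used in the PR

           // Consumes at most PAGE_LIMIT rows from the scanner and returns their partial sum.
           static long sumOnePage(Iterator<Long> scanner) {
               long partial = 0;
               for (int i = 0; i < PAGE_LIMIT && scanner.hasNext(); i++) {
                   partial += scanner.next();
               }
               return partial;
           }

           public static void main(String[] args) {
               List<Long> regionRows = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L);
               Iterator<Long> scanner = regionRows.iterator();
               long total = 0;
               while (scanner.hasNext()) {
                   // Each iteration is one bounded page; already-summed pages never need rereading.
                   total += sumOnePage(scanner);
               }
               System.out.println(total); // prints 28
           }
       }
       ```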




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] ChinmaySKulkarni commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
ChinmaySKulkarni commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-716908659


   @kadirozde The changes are substantial and I will need some heads-down time to review them. If it is urgent, please feel free to rely on others' reviews and don't wait for me.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] gjacoby126 commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
gjacoby126 commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r512152630



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,645 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInRows = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;

Review comment:
       Lots of package-level variables -- can any of these be made private? If an external class is referring to them (such as UARO), that's too-tight coupling. But it looks like these are mostly transplanted locals from UARO, so they should be safe to make private?
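
       For what it's worth, a minimal sketch of the suggestion (hypothetical class and method names, not the actual Phoenix code): keep the transplanted state private and let the observer go through methods instead of fields.

       ```java
       // Illustration only: private fields plus narrow accessors, so no other class
       // (e.g. the observer) can reach into the scanner's state directly.
       public class ScannerStateSketch {
           private long pageSizeInRows = Long.MAX_VALUE;
           private boolean needToWrite = false;

           boolean needsToWrite() {       // expose behavior, not fields
               return needToWrite;
           }

           long pageSizeInRows() {
               return pageSizeInRows;
           }
       }
       ```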

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
##########
@@ -250,7 +202,31 @@ public void start(CoprocessorEnvironment e) throws IOException {
         indexWriteProps = new ReadOnlyProps(indexWriteConfig.iterator());
     }
 
-    public void commitBatchWithRetries(final Region region, final List<Mutation> localRegionMutations, final long blockingMemstoreSize) throws IOException {
+    Configuration getUpsertSelectConfig() {

Review comment:
       Package scope is intentional in these methods?

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,646 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInMs = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInMs = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInMs =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building, where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf);
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),

Review comment:
       ditto here

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,646 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInMs = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInMs = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInMs =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building, where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf);
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,

Review comment:
       KeyValue is IA.Private -- I know this is extracted from existing code, but good to get rid of these when we find them. Can be replaced with CellUtil.createCell and CellUtil.cloneFamily, etc.  
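       A minimal sketch of that replacement, assuming the HBase 1.x CellUtil API available on the 4.x branch (CellUtil.createCell is itself superseded by CellBuilderFactory in HBase 2.x); everything other than the CellUtil calls just mirrors the names already in the patch:

           // Build the relocated cell without touching the IA.Private KeyValue class;
           // only the row key changes, everything else is cloned from the source cell.
           Cell newCell = CellUtil.createCell(
                   newRow,                         // new row key
                   CellUtil.cloneFamily(cell),     // same column family
                   CellUtil.cloneQualifier(cell),  // same qualifier
                   cell.getTimestamp(),
                   cell.getTypeByte(),             // preserve the Put/Delete type code
                   CellUtil.cloneValue(cell));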

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,641 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInRows = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    Aggregators aggregators = null;
+    Aggregator[] rowAggregators = null;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+    MemoryManager.MemoryChunk em = null;
+    boolean hasMore = false;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        byte[] pageSizeFromScan =
+                scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+        if (pageSizeFromScan != null) {
+            pageSizeInRows = Bytes.toLong(pageSizeFromScan);
+        } else {
+            pageSizeInRows =
+                    conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS,
+                            QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS);
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building, where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf);
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        final TenantCache tenantCache = GlobalCache.getTenantCache(env, ScanUtil.getTenantId(scan));
+
+        em = tenantCache.getMemoryManager().allocate(0);
+        aggregators = ServerAggregators.deserialize(
+                scan.getAttribute(BaseScannerRegionObserver.AGGREGATORS), conf, em);
+        rowAggregators = aggregators.getAggregators();
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+            if (em != null) {
+                em.close();
+            }
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock

Review comment:
       This comment isn't relevant anymore. (The RowLock constructor for deletes is long gone)
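       For reference, a rough sketch of what the call site reduces to once the stale FIXME is dropped, assuming the current HBase Delete API (row-only constructors, no RowLock parameter); the local variable name is illustrative, not from the patch:

           // Point delete for the whole row; the Delete API no longer takes a lock argument.
           Cell firstCell = results.get(0);
           Delete delete = new Delete(firstCell.getRowArray(),
                   firstCell.getRowOffset(), firstCell.getRowLength());
           mutations.add(delete);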

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,641 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInRows = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    Aggregators aggregators = null;
+    Aggregator[] rowAggregators = null;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+    MemoryManager.MemoryChunk em = null;
+    boolean hasMore = false;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        byte[] pageSizeFromScan =
+                scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+        if (pageSizeFromScan != null) {
+            pageSizeInRows = Bytes.toLong(pageSizeFromScan);
+        } else {
+            pageSizeInRows =
+                    conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS,
+                            QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS);
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building, where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf) ;
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        final TenantCache tenantCache = GlobalCache.getTenantCache(env, ScanUtil.getTenantId(scan));
+
+        em = tenantCache.getMemoryManager().allocate(0);
+        aggregators = ServerAggregators.deserialize(
+                scan.getAttribute(BaseScannerRegionObserver.AGGREGATORS), conf, em);
+        rowAggregators = aggregators.getAggregators();
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+            if (em != null) {
+                em.close();
+            }
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock
+        // args was introduced in 0.94.4, thus if we try to use it here
+        // we can no longer use the 0.94.2 version of the client.
+        Cell firstKV = results.get(0);
+        Delete delete = new Delete(firstKV.getRowArray(),
+                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
+        if (replayMutations != null) {
+            delete.setAttribute(REPLAY_WRITES, replayMutations);
+        }
+        mutations.add(delete);
+        // force tephra to ignore this delete
+        delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+    }
+
+    void deleteCForQ(Tuple result, List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // No need to search for the delete column, since we project only it
+        // when no empty key value is being set
+        if (emptyCF == null ||
+                result.getValue(deleteCF, deleteCQ) != null) {
+            Delete delete = new Delete(results.get(0).getRowArray(),
+                    results.get(0).getRowOffset(),
+                    results.get(0).getRowLength());
+            delete.deleteColumns(deleteCF,  deleteCQ, ts);
+            // force tephra to ignore this delete
+            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+            mutations.add(delete);
+        }
+    }
+    void upsert(Tuple result, ImmutableBytesWritable ptr, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Arrays.fill(values, null);
+        int bucketNumOffset = 0;
+        if (projectedTable.getBucketNum() != null) {
+            values[0] = new byte[] { 0 };
+            bucketNumOffset = 1;
+        }
+        int i = bucketNumOffset;
+        List<PColumn> projectedColumns = projectedTable.getColumns();
+        for (; i < projectedTable.getPKColumns().size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                values[i] = ptr.copyBytes();
+                // If SortOrder from expression in SELECT doesn't match the
+                // column being projected into then invert the bits.
+                if (expression.getSortOrder() !=
+                        projectedColumns.get(i).getSortOrder()) {
+                    SortOrder.invert(values[i], 0, values[i], 0,
+                            values[i].length);
+                }
+            } else {
+                values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
+            }
+        }
+        projectedTable.newKey(ptr, values);
+        PRow row = projectedTable.newRow(GenericKeyValueBuilder.INSTANCE, ts, ptr, false);
+        for (; i < projectedColumns.size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                PColumn column = projectedColumns.get(i);
+                if (!column.getDataType().isSizeCompatible(ptr, null,
+                        expression.getDataType(), expression.getSortOrder(),
+                        expression.getMaxLength(), expression.getScale(),
+                        column.getMaxLength(), column.getScale())) {
+                    throw new DataExceedsCapacityException(
+                            column.getDataType(), column.getMaxLength(),
+                            column.getScale(), column.getName().getString(), ptr);
+                }
+                column.getDataType().coerceBytes(ptr, null,
+                        expression.getDataType(), expression.getMaxLength(),
+                        expression.getScale(), expression.getSortOrder(),
+                        column.getMaxLength(), column.getScale(),
+                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
+                byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                row.setValue(column, bytes);
+            }
+        }
+        for (Mutation mutation : row.toRowMutations()) {
+            if (replayMutations != null) {
+                mutation.setAttribute(REPLAY_WRITES, replayMutations);
+            } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
+                mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
+            }
+            mutations.add(mutation);
+        }
+        for (i = 0; i < selectExpressions.size(); i++) {
+            selectExpressions.get(i).reset();
+        }
+    }
+
+    void insertEmptyKeyValue(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Set<Long> timeStamps =
+                Sets.newHashSetWithExpectedSize(results.size());
+        for (Cell kv : results) {
+            long kvts = kv.getTimestamp();
+            if (!timeStamps.contains(kvts)) {
+                Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
+                        kv.getRowLength());
+                put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
+                        ByteUtil.EMPTY_BYTE_ARRAY);
+                mutations.add(put);
+            }
+        }
+    }
+    @Override
+    public boolean next(List<Cell> resultsToReturn) throws IOException {
+        aggregators.reset(rowAggregators);
+        Cell lastCell = null;
+        int rowCount = 0;
+        boolean hasAny = false;
+        ImmutableBytesWritable ptr = new ImmutableBytesWritable();
+        Tuple result = useQualifierAsIndex ? new PositionBasedMultiKeyValueTuple() : new MultiKeyValueTuple();
+        UngroupedAggregateRegionObserver.MutationList mutations = new UngroupedAggregateRegionObserver.MutationList();
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            mutations = new UngroupedAggregateRegionObserver.MutationList(Ints.saturatedCast(maxBatchSize + maxBatchSize / 10));
+        }
+        region.startRegionOperation();
+        try {
+        synchronized (innerScanner) {
+            do {
+                List<Cell> results = useQualifierAsIndex ? new EncodedColumnQualiferCellsList(minMaxQualifiers.getFirst(), minMaxQualifiers.getSecond(), encodingScheme) : new ArrayList<Cell>();
+                // Results are potentially returned even when the return value of nextRaw is false,
+                // since that value only indicates whether or not there are more values after the
+                // ones returned
+                hasMore = innerScanner.nextRaw(results);
+                if (!results.isEmpty()) {
+                    lastCell = results.get(0);
+                    rowCount++;
+                    result.setKeyValues(results);
+                    if (isDescRowKeyOrderUpgrade) {
+                        if (!descRowKeyOrderUpgrade(results, ptr, mutations)){
+                            continue;
+                        }
+                    } else if (buildLocalIndex) {
+                        buildLocalIndex(result, results, ptr, mutations);
+                    } else if (isDelete) {
+                        deleteRow(results, mutations);
+                    } else if (isUpsert) {
+                        upsert(result, ptr, mutations);

Review comment:
       Thank you SO much for these Extract Methods. :-) 

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,646 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInMs = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInMs = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInMs =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building, where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf) ;
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock
+        // args was introduced in 0.94.4, thus if we try to use it here
+        // we can no longer use the 0.94.2 version of the client.
+        Cell firstKV = results.get(0);
+        Delete delete = new Delete(firstKV.getRowArray(),
+                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
+        if (replayMutations != null) {
+            delete.setAttribute(REPLAY_WRITES, replayMutations);
+        }
+        mutations.add(delete);
+        // force tephra to ignore this delete
+        delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+    }
+
+    void deleteCForQ(Tuple result, List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // No need to search for the delete column, since we project only it
+        // when no empty key value is being set
+        if (emptyCF == null ||
+                result.getValue(deleteCF, deleteCQ) != null) {
+            Delete delete = new Delete(results.get(0).getRowArray(),
+                    results.get(0).getRowOffset(),
+                    results.get(0).getRowLength());
+            delete.deleteColumns(deleteCF,  deleteCQ, ts);
+            // force tephra to ignore this delete
+            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+            mutations.add(delete);
+        }
+    }
+    void upsert(Tuple result, ImmutableBytesWritable ptr, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Arrays.fill(values, null);
+        int bucketNumOffset = 0;
+        if (projectedTable.getBucketNum() != null) {
+            values[0] = new byte[] { 0 };
+            bucketNumOffset = 1;
+        }
+        int i = bucketNumOffset;
+        List<PColumn> projectedColumns = projectedTable.getColumns();
+        for (; i < projectedTable.getPKColumns().size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                values[i] = ptr.copyBytes();
+                // If SortOrder from expression in SELECT doesn't match the
+                // column being projected into then invert the bits.
+                if (expression.getSortOrder() !=
+                        projectedColumns.get(i).getSortOrder()) {
+                    SortOrder.invert(values[i], 0, values[i], 0,
+                            values[i].length);
+                }
+            } else {
+                values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
+            }
+        }
+        projectedTable.newKey(ptr, values);
+        PRow row = projectedTable.newRow(GenericKeyValueBuilder.INSTANCE, ts, ptr, false);
+        for (; i < projectedColumns.size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                PColumn column = projectedColumns.get(i);
+                if (!column.getDataType().isSizeCompatible(ptr, null,
+                        expression.getDataType(), expression.getSortOrder(),
+                        expression.getMaxLength(), expression.getScale(),
+                        column.getMaxLength(), column.getScale())) {
+                    throw new DataExceedsCapacityException(
+                            column.getDataType(), column.getMaxLength(),
+                            column.getScale(), column.getName().getString(), ptr);
+                }
+                column.getDataType().coerceBytes(ptr, null,
+                        expression.getDataType(), expression.getMaxLength(),
+                        expression.getScale(), expression.getSortOrder(),
+                        column.getMaxLength(), column.getScale(),
+                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
+                byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                row.setValue(column, bytes);
+            }
+        }
+        for (Mutation mutation : row.toRowMutations()) {
+            if (replayMutations != null) {
+                mutation.setAttribute(REPLAY_WRITES, replayMutations);
+            } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
+                mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
+            }
+            mutations.add(mutation);
+        }
+        for (i = 0; i < selectExpressions.size(); i++) {
+            selectExpressions.get(i).reset();
+        }
+    }
+
+    void insertEmptyKeyValue(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Set<Long> timeStamps =
+                Sets.newHashSetWithExpectedSize(results.size());
+        for (Cell kv : results) {
+            long kvts = kv.getTimestamp();
+            if (!timeStamps.contains(kvts)) {
+                Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
+                        kv.getRowLength());
+                put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
+                        ByteUtil.EMPTY_BYTE_ARRAY);
+                mutations.add(put);
+            }
+        }
+    }
+    @Override
+    public boolean next(List<Cell> resultsToReturn) throws IOException {
+        boolean hasMore;
+        long startTime = EnvironmentEdgeManager.currentTimeMillis();
+        Configuration conf = env.getConfiguration();
+        final TenantCache tenantCache = GlobalCache.getTenantCache(env, ScanUtil.getTenantId(scan));
+        try (MemoryManager.MemoryChunk em = tenantCache.getMemoryManager().allocate(0)) {
+            Aggregators aggregators = ServerAggregators.deserialize(
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATORS), conf, em);
+            Aggregator[] rowAggregators = aggregators.getAggregators();
+            aggregators.reset(rowAggregators);
+            Cell lastCell = null;
+            boolean hasAny = false;
+            ImmutableBytesWritable ptr = new ImmutableBytesWritable();
+            Tuple result = useQualifierAsIndex ? new PositionBasedMultiKeyValueTuple() : new MultiKeyValueTuple();
+            UngroupedAggregateRegionObserver.MutationList mutations = new UngroupedAggregateRegionObserver.MutationList();
+            if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                    || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+                mutations = new UngroupedAggregateRegionObserver.MutationList(Ints.saturatedCast(maxBatchSize + maxBatchSize / 10));
+            }
+            region.startRegionOperation();
+            try {
+                synchronized (innerScanner) {
+                    do {
+                        ungroupedAggregateRegionObserver.checkForRegionClosing();
+                        List<Cell> results = useQualifierAsIndex ? new EncodedColumnQualiferCellsList(minMaxQualifiers.getFirst(), minMaxQualifiers.getSecond(), encodingScheme) : new ArrayList<Cell>();
+                        // Results are potentially returned even when the return value of s.next is false
+                        // since this is an indication of whether or not there are more values after the
+                        // ones returned
+                        hasMore = innerScanner.nextRaw(results);
+                        if (!results.isEmpty()) {
+                            lastCell = results.get(0);
+                            result.setKeyValues(results);
+                            if (isDescRowKeyOrderUpgrade) {
+                                if (!descRowKeyOrderUpgrade(results, ptr, mutations)) {
+                                    continue;
+                                }
+                            } else if (buildLocalIndex) {
+                                buildLocalIndex(result, results, ptr, mutations);
+                            } else if (isDelete) {
+                                deleteRow(results, mutations);
+                            } else if (isUpsert) {
+                                upsert(result, ptr, mutations);
+                            } else if (deleteCF != null && deleteCQ != null) {
+                                deleteCForQ(result, results, mutations);
+                            }
+                            if (emptyCF != null) {
+                                /*
+                                 * If we've specified an emptyCF, then we need to insert an empty
+                                 * key value "retroactively" for any key value that is visible at
+                                 * the timestamp that the DDL was issued. Key values that are not
+                                 * visible at this timestamp will not ever be projected up to
+                                 * scans past this timestamp, so don't need to be considered.
+                                 * We insert one empty key value per row per timestamp.
+                                 */
+                                insertEmptyKeyValue(results, mutations);
+                            }
+                            if (ServerUtil.readyToCommit(mutations.size(), mutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                ungroupedAggregateRegionObserver.commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr,
+                                        txState, targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
+                                mutations.clear();
+                            }
+                            // Commit in batches based on UPSERT_BATCH_SIZE_BYTES_ATTRIB in config
+
+                            if (ServerUtil.readyToCommit(indexMutations.size(), indexMutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                setIndexAndTransactionProperties(indexMutations, indexUUID, indexMaintainersPtr, txState, clientVersionBytes, useIndexProto);
+                                ungroupedAggregateRegionObserver.commitBatch(region, indexMutations, blockingMemStoreSize);
+                                indexMutations.clear();
+                            }
+                            aggregators.aggregate(rowAggregators, result);
+                            hasAny = true;
+                        }
+                    } while (hasMore && (EnvironmentEdgeManager.currentTimeMillis() - startTime) < pageSizeInMs);
+
+                    if (!mutations.isEmpty()) {
+                        ungroupedAggregateRegionObserver.commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr, txState,
+                                targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
+                        mutations.clear();
+                    }
+                    if (!indexMutations.isEmpty()) {
+                        ungroupedAggregateRegionObserver.commitBatch(region, indexMutations, blockingMemStoreSize);
+                        indexMutations.clear();
+                    }
+                }
+            } catch (InsufficientMemoryException e) {
+                throw new DoNotRetryIOException(e);
+            } catch (DataExceedsCapacityException e) {
+                throw new DoNotRetryIOException(e.getMessage(), e);
+            } catch (Throwable e) {
+                LOGGER.error("Exception in UngroupedAggregateRegionScanner for region "
+                        + region.getRegionInfo().getRegionNameAsString(), e);
+                throw e;
+            }
+            KeyValue keyValue;

Review comment:
       This can also be a Cell populated by CellUtil below at 632.
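
       For illustration only (not the patch's code): a minimal sketch of that suggestion. It assumes the
       serialized aggregator output is already in a local byte[] named aggregateValueBytes (a hypothetical
       name); lastCell, SINGLE_COLUMN_FAMILY, SINGLE_COLUMN and AGG_TIMESTAMP come from the surrounding code.

           import org.apache.hadoop.hbase.Cell;
           import org.apache.hadoop.hbase.CellUtil;
           import org.apache.hadoop.hbase.KeyValue;

           // Type the local as Cell and build it through CellUtil instead of constructing a KeyValue.
           Cell aggregateCell = CellUtil.createCell(
                   CellUtil.cloneRow(lastCell),           // row key of the last row scanned
                   SINGLE_COLUMN_FAMILY, SINGLE_COLUMN,   // Phoenix's fixed single aggregate column
                   AGG_TIMESTAMP, KeyValue.Type.Put.getCode(),
                   aggregateValueBytes);                  // assumed: bytes produced by the aggregators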




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] stoty commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
stoty commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-716904144


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  4s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 1 new or modified test files.  |
   ||| _ 4.x Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  11m  0s |  4.x passed  |
   | +1 :green_heart: |  compile  |   0m 55s |  4.x passed  |
   | +1 :green_heart: |  checkstyle  |   1m 19s |  4.x passed  |
   | +1 :green_heart: |  javadoc  |   0m 44s |  4.x passed  |
   | +0 :ok: |  spotbugs  |   2m 55s |  phoenix-core in 4.x has 953 extant spotbugs warnings.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   5m 14s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 55s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 55s |  the patch passed  |
   | -1 :x: |  checkstyle  |   1m 15s |  phoenix-core: The patch generated 314 new + 1796 unchanged - 219 fixed = 2110 total (was 2015)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  the patch passed  |
   | -1 :x: |  spotbugs  |   3m  5s |  phoenix-core generated 1 new + 952 unchanged - 1 fixed = 953 total (was 953)  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 184m 40s |  phoenix-core in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  The patch does not generate ASF License warnings.  |
   |  |   | 217m 14s |   |
   
   
   | Reason | Tests |
   |-------:|:------|
   | FindBugs | module:phoenix-core |
   |  |  Switch statement found in org.apache.phoenix.coprocessor.UngroupedAggregateRegionScanner.descRowKeyOrderUpgrade(List, ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:[lines 382-403] |
   | Failed junit tests | phoenix.end2end.UpsertSelectIT |
   |   | phoenix.end2end.AbsFunctionEnd2EndIT |
   |   | phoenix.end2end.DropIndexedColsIT |
   |   | phoenix.end2end.OrphanViewToolIT |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/6/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/phoenix/pull/936 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs hbaseanti checkstyle compile |
   | uname | Linux 6024265b5617 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev/phoenix-personality.sh |
   | git revision | 4.x / 1872f5f |
   | Default Java | Private Build-1.8.0_242-8u242-b08-0ubuntu3~16.04-b08 |
   | checkstyle | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/6/artifact/yetus-general-check/output/diff-checkstyle-phoenix-core.txt |
   | spotbugs | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/6/artifact/yetus-general-check/output/new-spotbugs-phoenix-core.html |
   | unit | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/6/artifact/yetus-general-check/output/patch-unit-phoenix-core.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/6/testReport/ |
   | Max. process+thread count | 6277 (vs. ulimit of 30000) |
   | modules | C: phoenix-core U: phoenix-core |
   | Console output | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/6/console |
   | versions | git=2.7.4 maven=3.3.9 spotbugs=4.1.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] ChinmaySKulkarni commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
ChinmaySKulkarni commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r521648060



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,646 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInMs = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInMs = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInMs =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building, where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf);
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock
+        // args was introduced in 0.94.4, thus if we try to use it here
+        // we can no longer use the 0.94.2 version of the client.
+        Cell firstKV = results.get(0);
+        Delete delete = new Delete(firstKV.getRowArray(),
+                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
+        if (replayMutations != null) {
+            delete.setAttribute(REPLAY_WRITES, replayMutations);
+        }
+        mutations.add(delete);
+        // force Tephra to ignore this delete
+        delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+    }
+
+    void deleteCForQ(Tuple result, List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // No need to search for delete column, since we project only it
+        // if no empty key value is being set
+        if (emptyCF == null ||
+                result.getValue(deleteCF, deleteCQ) != null) {
+            Delete delete = new Delete(results.get(0).getRowArray(),
+                    results.get(0).getRowOffset(),
+                    results.get(0).getRowLength());
+            delete.deleteColumns(deleteCF, deleteCQ, ts);
+            // force Tephra to ignore this delete
+            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+            mutations.add(delete);
+        }
+    }
+    void upsert(Tuple result, ImmutableBytesWritable ptr, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Arrays.fill(values, null);
+        int bucketNumOffset = 0;
+        if (projectedTable.getBucketNum() != null) {
+            values[0] = new byte[] { 0 };
+            bucketNumOffset = 1;
+        }
+        int i = bucketNumOffset;
+        List<PColumn> projectedColumns = projectedTable.getColumns();
+        for (; i < projectedTable.getPKColumns().size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                values[i] = ptr.copyBytes();
+                // If SortOrder from expression in SELECT doesn't match the
+                // column being projected into then invert the bits.
+                if (expression.getSortOrder() !=
+                        projectedColumns.get(i).getSortOrder()) {
+                    SortOrder.invert(values[i], 0, values[i], 0,
+                            values[i].length);
+                }
+            } else {
+                values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
+            }
+        }
+        projectedTable.newKey(ptr, values);
+        PRow row = projectedTable.newRow(GenericKeyValueBuilder.INSTANCE, ts, ptr, false);
+        for (; i < projectedColumns.size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                PColumn column = projectedColumns.get(i);
+                if (!column.getDataType().isSizeCompatible(ptr, null,
+                        expression.getDataType(), expression.getSortOrder(),
+                        expression.getMaxLength(), expression.getScale(),
+                        column.getMaxLength(), column.getScale())) {
+                    throw new DataExceedsCapacityException(
+                            column.getDataType(), column.getMaxLength(),
+                            column.getScale(), column.getName().getString(), ptr);
+                }
+                column.getDataType().coerceBytes(ptr, null,
+                        expression.getDataType(), expression.getMaxLength(),
+                        expression.getScale(), expression.getSortOrder(),
+                        column.getMaxLength(), column.getScale(),
+                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
+                byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                row.setValue(column, bytes);
+            }
+        }
+        for (Mutation mutation : row.toRowMutations()) {
+            if (replayMutations != null) {
+                mutation.setAttribute(REPLAY_WRITES, replayMutations);
+            } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
+                mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
+            }
+            mutations.add(mutation);
+        }
+        for (i = 0; i < selectExpressions.size(); i++) {
+            selectExpressions.get(i).reset();
+        }
+    }
+
+    void insertEmptyKeyValue(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Set<Long> timeStamps =
+                Sets.newHashSetWithExpectedSize(results.size());
+        for (Cell kv : results) {
+            long kvts = kv.getTimestamp();
+            if (!timeStamps.contains(kvts)) {
+                Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
+                        kv.getRowLength());
+                put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
+                        ByteUtil.EMPTY_BYTE_ARRAY);
+                mutations.add(put);
+            }
+        }
+    }
+    @Override
+    public boolean next(List<Cell> resultsToReturn) throws IOException {
+        boolean hasMore;
+        long startTime = EnvironmentEdgeManager.currentTimeMillis();
+        Configuration conf = env.getConfiguration();
+        final TenantCache tenantCache = GlobalCache.getTenantCache(env, ScanUtil.getTenantId(scan));
+        try (MemoryManager.MemoryChunk em = tenantCache.getMemoryManager().allocate(0)) {
+            Aggregators aggregators = ServerAggregators.deserialize(
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATORS), conf, em);
+            Aggregator[] rowAggregators = aggregators.getAggregators();
+            aggregators.reset(rowAggregators);
+            Cell lastCell = null;
+            boolean hasAny = false;
+            ImmutableBytesWritable ptr = new ImmutableBytesWritable();
+            Tuple result = useQualifierAsIndex ? new PositionBasedMultiKeyValueTuple() : new MultiKeyValueTuple();
+            UngroupedAggregateRegionObserver.MutationList mutations = new UngroupedAggregateRegionObserver.MutationList();
+            if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                    || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+                mutations = new UngroupedAggregateRegionObserver.MutationList(Ints.saturatedCast(maxBatchSize + maxBatchSize / 10));
+            }
+            region.startRegionOperation();
+            try {
+                synchronized (innerScanner) {
+                    do {
+                        ungroupedAggregateRegionObserver.checkForRegionClosing();
+                        List<Cell> results = useQualifierAsIndex ? new EncodedColumnQualiferCellsList(minMaxQualifiers.getFirst(), minMaxQualifiers.getSecond(), encodingScheme) : new ArrayList<Cell>();
+                        // Results are potentially returned even when the return value of s.next is false
+                        // since this is an indication of whether or not there are more values after the
+                        // ones returned
+                        hasMore = innerScanner.nextRaw(results);
+                        if (!results.isEmpty()) {
+                            lastCell = results.get(0);
+                            result.setKeyValues(results);
+                            if (isDescRowKeyOrderUpgrade) {
+                                if (!descRowKeyOrderUpgrade(results, ptr, mutations)) {
+                                    continue;
+                                }
+                            } else if (buildLocalIndex) {
+                                buildLocalIndex(result, results, ptr, mutations);
+                            } else if (isDelete) {
+                                deleteRow(results, mutations);
+                            } else if (isUpsert) {
+                                upsert(result, ptr, mutations);
+                            } else if (deleteCF != null && deleteCQ != null) {
+                                deleteCForQ(result, results, mutations);
+                            }
+                            if (emptyCF != null) {
+                                /*
+                                 * If we've specified an emptyCF, then we need to insert an empty
+                                 * key value "retroactively" for any key value that is visible at
+                                 * the timestamp that the DDL was issued. Key values that are not
+                                 * visible at this timestamp will not ever be projected up to
+                                 * scans past this timestamp, so don't need to be considered.
+                                 * We insert one empty key value per row per timestamp.
+                                 */
+                                insertEmptyKeyValue(results, mutations);
+                            }
+                            if (ServerUtil.readyToCommit(mutations.size(), mutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                ungroupedAggregateRegionObserver.commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr,
+                                        txState, targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
+                                mutations.clear();
+                            }
+                            // Commit in batches based on UPSERT_BATCH_SIZE_BYTES_ATTRIB in config
+
+                            if (ServerUtil.readyToCommit(indexMutations.size(), indexMutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                setIndexAndTransactionProperties(indexMutations, indexUUID, indexMaintainersPtr, txState, clientVersionBytes, useIndexProto);
+                                ungroupedAggregateRegionObserver.commitBatch(region, indexMutations, blockingMemStoreSize);
+                                indexMutations.clear();
+                            }
+                            aggregators.aggregate(rowAggregators, result);
+                            hasAny = true;
+                        }
+                    } while (hasMore && (EnvironmentEdgeManager.currentTimeMillis() - startTime) < pageSizeInMs);

Review comment:
       Ok makes sense.
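
       For archive readers: a condensed sketch of the time-bounded paging loop above, with the per-row
       work elided (the names follow the code above; this is an illustration, not a replacement for it).

           // Scan rows until the page-time budget is used up; the returned flag tells the
           // client whether it should ask for another page.
           long startTime = EnvironmentEdgeManager.currentTimeMillis();
           boolean hasMore;
           do {
               List<Cell> rowCells = new ArrayList<Cell>();
               hasMore = innerScanner.nextRaw(rowCells);   // read one row's cells
               if (!rowCells.isEmpty()) {
                   // aggregate and/or buffer mutations for this row here
               }
           } while (hasMore
                   && (EnvironmentEdgeManager.currentTimeMillis() - startTime) < pageSizeInMs);
           return hasMore;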




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r521622742



##########
File path: phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java
##########
@@ -629,6 +626,11 @@ public static Configuration setUpConfigForMiniCluster(Configuration conf, ReadOn
         conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 2);
         conf.setInt(NUM_CONCURRENT_INDEX_WRITER_THREADS_CONF_KEY, 1);
         conf.setInt(GLOBAL_INDEX_ROW_AGE_THRESHOLD_TO_DELETE_MS_ATTRIB, 0);
+        if (conf.getLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS, 0) == 0) {
+            conf.setLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS, 0);

Review comment:
       I set it to 1 ms first, but observed that even with 1 ms pages, several iterations can be processed within a single page. So, in order to make sure that every test is subject to paging, I had to set it to 0, which results in one-row pages.
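
       For illustration (a hedged sketch, not the exact BaseTest change): overriding the page size in a
       test Configuration would look roughly like this.

           import org.apache.hadoop.conf.Configuration;
           import org.apache.hadoop.hbase.HBaseConfiguration;
           import org.apache.phoenix.query.QueryServices;

           Configuration conf = HBaseConfiguration.create();
           // A 0 ms page budget makes the ungrouped-aggregate scanner return after every row,
           // so every test run exercises the paging code path.
           conf.setLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS, 0);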




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r521620095



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,646 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInMs = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInMs = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInMs =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building, where reads
+         * and writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf);
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock
+        // args was introduced in 0.94.4, thus if we try to use it here
+        // we can no longer use the 0.94.2 version of the client.
+        Cell firstKV = results.get(0);
+        Delete delete = new Delete(firstKV.getRowArray(),
+                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
+        if (replayMutations != null) {
+            delete.setAttribute(REPLAY_WRITES, replayMutations);
+        }
+        mutations.add(delete);
+        // force Tephra to ignore this delete
+        delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+    }
+
+    void deleteCForQ(Tuple result, List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // No need to search for delete column, since we project only it
+        // if no empty key value is being set
+        if (emptyCF == null ||
+                result.getValue(deleteCF, deleteCQ) != null) {
+            Delete delete = new Delete(results.get(0).getRowArray(),
+                    results.get(0).getRowOffset(),
+                    results.get(0).getRowLength());
+            delete.deleteColumns(deleteCF, deleteCQ, ts);
+            // force Tephra to ignore this delete
+            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+            mutations.add(delete);
+        }
+    }
+    void upsert(Tuple result, ImmutableBytesWritable ptr, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Arrays.fill(values, null);
+        int bucketNumOffset = 0;
+        if (projectedTable.getBucketNum() != null) {
+            values[0] = new byte[] { 0 };
+            bucketNumOffset = 1;
+        }
+        int i = bucketNumOffset;
+        List<PColumn> projectedColumns = projectedTable.getColumns();
+        for (; i < projectedTable.getPKColumns().size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                values[i] = ptr.copyBytes();
+                // If SortOrder from expression in SELECT doesn't match the
+                // column being projected into then invert the bits.
+                if (expression.getSortOrder() !=
+                        projectedColumns.get(i).getSortOrder()) {
+                    SortOrder.invert(values[i], 0, values[i], 0,
+                            values[i].length);
+                }
+            } else {
+                values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
+            }
+        }
+        projectedTable.newKey(ptr, values);
+        PRow row = projectedTable.newRow(GenericKeyValueBuilder.INSTANCE, ts, ptr, false);
+        for (; i < projectedColumns.size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                PColumn column = projectedColumns.get(i);
+                if (!column.getDataType().isSizeCompatible(ptr, null,
+                        expression.getDataType(), expression.getSortOrder(),
+                        expression.getMaxLength(), expression.getScale(),
+                        column.getMaxLength(), column.getScale())) {
+                    throw new DataExceedsCapacityException(
+                            column.getDataType(), column.getMaxLength(),
+                            column.getScale(), column.getName().getString(), ptr);
+                }
+                column.getDataType().coerceBytes(ptr, null,
+                        expression.getDataType(), expression.getMaxLength(),
+                        expression.getScale(), expression.getSortOrder(),
+                        column.getMaxLength(), column.getScale(),
+                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
+                byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                row.setValue(column, bytes);
+            }
+        }
+        for (Mutation mutation : row.toRowMutations()) {
+            if (replayMutations != null) {
+                mutation.setAttribute(REPLAY_WRITES, replayMutations);
+            } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
+                mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
+            }
+            mutations.add(mutation);
+        }
+        for (i = 0; i < selectExpressions.size(); i++) {
+            selectExpressions.get(i).reset();
+        }
+    }
+
+    void insertEmptyKeyValue(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Set<Long> timeStamps =
+                Sets.newHashSetWithExpectedSize(results.size());
+        for (Cell kv : results) {
+            long kvts = kv.getTimestamp();
+            if (!timeStamps.contains(kvts)) {
+                Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
+                        kv.getRowLength());
+                put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
+                        ByteUtil.EMPTY_BYTE_ARRAY);
+                mutations.add(put);
+            }
+        }
+    }
+    @Override
+    public boolean next(List<Cell> resultsToReturn) throws IOException {
+        boolean hasMore;
+        long startTime = EnvironmentEdgeManager.currentTimeMillis();
+        Configuration conf = env.getConfiguration();
+        final TenantCache tenantCache = GlobalCache.getTenantCache(env, ScanUtil.getTenantId(scan));
+        try (MemoryManager.MemoryChunk em = tenantCache.getMemoryManager().allocate(0)) {
+            Aggregators aggregators = ServerAggregators.deserialize(
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATORS), conf, em);
+            Aggregator[] rowAggregators = aggregators.getAggregators();
+            aggregators.reset(rowAggregators);
+            Cell lastCell = null;
+            boolean hasAny = false;
+            ImmutableBytesWritable ptr = new ImmutableBytesWritable();
+            Tuple result = useQualifierAsIndex ? new PositionBasedMultiKeyValueTuple() : new MultiKeyValueTuple();
+            UngroupedAggregateRegionObserver.MutationList mutations = new UngroupedAggregateRegionObserver.MutationList();
+            if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                    || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+                mutations = new UngroupedAggregateRegionObserver.MutationList(Ints.saturatedCast(maxBatchSize + maxBatchSize / 10));
+            }
+            region.startRegionOperation();
+            try {
+                synchronized (innerScanner) {
+                    do {
+                        ungroupedAggregateRegionObserver.checkForRegionClosing();
+                        List<Cell> results = useQualifierAsIndex ? new EncodedColumnQualiferCellsList(minMaxQualifiers.getFirst(), minMaxQualifiers.getSecond(), encodingScheme) : new ArrayList<Cell>();
+                        // Results are potentially returned even when the return value of s.next is false
+                        // since this is an indication of whether or not there are more values after the
+                        // ones returned
+                        hasMore = innerScanner.nextRaw(results);
+                        if (!results.isEmpty()) {
+                            lastCell = results.get(0);
+                            result.setKeyValues(results);
+                            if (isDescRowKeyOrderUpgrade) {
+                                if (!descRowKeyOrderUpgrade(results, ptr, mutations)) {
+                                    continue;
+                                }
+                            } else if (buildLocalIndex) {
+                                buildLocalIndex(result, results, ptr, mutations);
+                            } else if (isDelete) {
+                                deleteRow(results, mutations);
+                            } else if (isUpsert) {
+                                upsert(result, ptr, mutations);
+                            } else if (deleteCF != null && deleteCQ != null) {
+                                deleteCForQ(result, results, mutations);
+                            }
+                            if (emptyCF != null) {
+                                /*
+                                 * If we've specified an emptyCF, then we need to insert an empty
+                                 * key value "retroactively" for any key value that is visible at
+                                 * the timestamp that the DDL was issued. Key values that are not
+                                 * visible at this timestamp will not ever be projected up to
+                                 * scans past this timestamp, so don't need to be considered.
+                                 * We insert one empty key value per row per timestamp.
+                                 */
+                                insertEmptyKeyValue(results, mutations);
+                            }
+                            if (ServerUtil.readyToCommit(mutations.size(), mutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                ungroupedAggregateRegionObserver.commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr,
+                                        txState, targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
+                                mutations.clear();
+                            }
+                            // Commit in batches based on UPSERT_BATCH_SIZE_BYTES_ATTRIB in config
+
+                            if (ServerUtil.readyToCommit(indexMutations.size(), indexMutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                setIndexAndTransactionProperties(indexMutations, indexUUID, indexMaintainersPtr, txState, clientVersionBytes, useIndexProto);
+                                ungroupedAggregateRegionObserver.commitBatch(region, indexMutations, blockingMemStoreSize);
+                                indexMutations.clear();
+                            }
+                            aggregators.aggregate(rowAggregators, result);
+                            hasAny = true;
+                        }
+                    } while (hasMore && (EnvironmentEdgeManager.currentTimeMillis() - startTime) < pageSizeInMs);

Review comment:
       The duration of a page within the coprocessor will be a fraction of the HBase client timeout (on the order of 1 second vs. 2 minutes). As long as each iteration does not take too long (i.e., minutes), this Jira will be sufficient to eliminate the timeouts. However, a single iteration of a region scanner can take minutes; this happens when the filter set on the scan is very selective or when there are too many delete markers in the table regions. 6211 will address that remaining issue.
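       For illustration only, here is a minimal, self-contained sketch of the time-bounded paging idea described above. All names in it are hypothetical stand-ins, not the actual Phoenix identifiers; the real patch reads the page budget from a scan attribute and iterates the region scanner with nextRaw.

           import java.util.concurrent.TimeUnit;

           // Sketch: bound the work done per "page" by wall-clock time, so control
           // returns to the caller well before the client RPC would time out.
           public class TimeBoundedPagingSketch {

               interface RowSource {
                   // Returns true while more rows remain; appends one row per call.
                   boolean nextRow(StringBuilder row);
               }

               // Aggregates rows until the source is exhausted or the page budget elapses;
               // returns true if the caller should ask for another page.
               static boolean aggregateOnePage(RowSource source, long pageSizeMs, long[] rowCount) {
                   long deadline = System.currentTimeMillis() + pageSizeMs;
                   boolean hasMore;
                   do {
                       StringBuilder row = new StringBuilder();
                       hasMore = source.nextRow(row);
                       rowCount[0]++; // stands in for the server-side aggregation step
                   } while (hasMore && System.currentTimeMillis() < deadline);
                   return hasMore;
               }

               public static void main(String[] args) {
                   long[] count = {0};
                   RowSource tenRows = new RowSource() {
                       int i = 0;
                       public boolean nextRow(StringBuilder row) {
                           row.append("row-").append(i);
                           return ++i < 10;
                       }
                   };
                   // keep asking for pages until the source is exhausted
                   while (aggregateOnePage(tenRows, TimeUnit.SECONDS.toMillis(1), count)) {
                   }
                   System.out.println("aggregated rows: " + count[0]);
               }
           }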




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] ChinmaySKulkarni commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
ChinmaySKulkarni commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r521655035



##########
File path: phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java
##########
@@ -629,6 +626,11 @@ public static Configuration setUpConfigForMiniCluster(Configuration conf, ReadOn
         conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 2);
         conf.setInt(NUM_CONCURRENT_INDEX_WRITER_THREADS_CONF_KEY, 1);
         conf.setInt(GLOBAL_INDEX_ROW_AGE_THRESHOLD_TO_DELETE_MS_ATTRIB, 0);
+        if (conf.getLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS, 0) == 0) {
+            conf.setLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS, 0);

Review comment:
       Oh, never mind. I see that you're passing in 0 as the default in the getLong.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] ChinmaySKulkarni commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
ChinmaySKulkarni commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r520204425



##########
File path: phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java
##########
@@ -629,6 +629,9 @@ public static Configuration setUpConfigForMiniCluster(Configuration conf, ReadOn
         conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 2);
         conf.setInt(NUM_CONCURRENT_INDEX_WRITER_THREADS_CONF_KEY, 1);
         conf.setInt(GLOBAL_INDEX_ROW_AGE_THRESHOLD_TO_DELETE_MS_ATTRIB, 0);
+        if (conf.getLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS, 0) == 0) {
+            conf.setLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS, 2);

Review comment:
       In tests we don't generally read/write too many rows anyway, so won't setting this config just unnecessarily slow them down?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r520745659



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/iterate/TableResultIterator.java
##########
@@ -134,6 +137,7 @@ public TableResultIterator(MutationState mutationState, Scan scan, ScanMetricsHo
         this.retry=plan.getContext().getConnection().getQueryServices().getProps()
                 .getInt(QueryConstants.HASH_JOIN_CACHE_RETRIES, QueryConstants.DEFAULT_HASH_JOIN_CACHE_RETRIES);
         IndexUtil.setScanAttributesForIndexReadRepair(scan, table, plan.getContext().getConnection());
+        scan.setAttribute(BaseScannerRegionObserver.SERVER_PAGING, TRUE_BYTES);

Review comment:
       I added it so that we do not need to upgrade the clients for this change, and it can work with a mix of old and new clients. Do you have another suggestion here? To disable paging, we can set the page size to Long.MAX_VALUE.
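       As a rough illustration of that compatibility idea (the attribute names, units, and encoding below are hypothetical, not the actual Phoenix constants): the server pages only when the scan carries the marker attribute set by a new client, so old clients keep the previous unpaged behavior, and a page size of Long.MAX_VALUE effectively turns paging off.

           import java.nio.charset.StandardCharsets;
           import java.util.HashMap;
           import java.util.Map;

           // Sketch of the negotiation: old clients never set the marker attribute,
           // so the server never stops a scan early for them.
           public class ServerPagingNegotiationSketch {
               static final String SERVER_PAGING = "_ServerPaging";   // hypothetical marker key
               static final String PAGE_SIZE    = "_PageSize";        // hypothetical page-size key
               static final long DEFAULT_PAGE_SIZE = 1_000L;          // units depend on the patch revision

               static long effectivePageSize(Map<String, byte[]> scanAttributes) {
                   if (scanAttributes.get(SERVER_PAGING) == null) {
                       return Long.MAX_VALUE;                         // old client: paging disabled
                   }
                   byte[] fromScan = scanAttributes.get(PAGE_SIZE);
                   return fromScan == null
                           ? DEFAULT_PAGE_SIZE                        // new client, server-side default
                           : Long.parseLong(new String(fromScan, StandardCharsets.UTF_8));
               }

               public static void main(String[] args) {
                   Map<String, byte[]> oldClientScan = new HashMap<>();
                   Map<String, byte[]> newClientScan = new HashMap<>();
                   newClientScan.put(SERVER_PAGING, new byte[0]);
                   System.out.println(effectivePageSize(oldClientScan)); // Long.MAX_VALUE, i.e. no paging
                   System.out.println(effectivePageSize(newClientScan)); // 1000, i.e. server default
               }
           }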




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r520741463



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,645 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInRows = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInRows = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInRows =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building, where reads
+         * and writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf);
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock
+        // args was introduced in 0.94.4, thus if we try to use it here
+        // we can no longer use the 0.94.2 version of the client.
+        Cell firstKV = results.get(0);
+        Delete delete = new Delete(firstKV.getRowArray(),
+                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
+        if (replayMutations != null) {
+            delete.setAttribute(REPLAY_WRITES, replayMutations);
+        }
+        mutations.add(delete);
+        // force Tephra to ignore this delete
+        delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+    }
+
+    void deleteCForQ(Tuple result, List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // No need to search for delete column, since we project only it
+        // if no empty key value is being set
+        if (emptyCF == null ||
+                result.getValue(deleteCF, deleteCQ) != null) {
+            Delete delete = new Delete(results.get(0).getRowArray(),
+                    results.get(0).getRowOffset(),
+                    results.get(0).getRowLength());
+            delete.deleteColumns(deleteCF, deleteCQ, ts);
+            // force Tephra to ignore this delete
+            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+            mutations.add(delete);
+        }
+    }
+    void upsert(Tuple result, ImmutableBytesWritable ptr, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Arrays.fill(values, null);
+        int bucketNumOffset = 0;
+        if (projectedTable.getBucketNum() != null) {
+            values[0] = new byte[] { 0 };
+            bucketNumOffset = 1;
+        }
+        int i = bucketNumOffset;
+        List<PColumn> projectedColumns = projectedTable.getColumns();
+        for (; i < projectedTable.getPKColumns().size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                values[i] = ptr.copyBytes();
+                // If SortOrder from expression in SELECT doesn't match the
+                // column being projected into then invert the bits.
+                if (expression.getSortOrder() !=
+                        projectedColumns.get(i).getSortOrder()) {
+                    SortOrder.invert(values[i], 0, values[i], 0,
+                            values[i].length);
+                }
+            } else {
+                values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
+            }
+        }
+        projectedTable.newKey(ptr, values);
+        PRow row = projectedTable.newRow(GenericKeyValueBuilder.INSTANCE, ts, ptr, false);
+        for (; i < projectedColumns.size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                PColumn column = projectedColumns.get(i);
+                if (!column.getDataType().isSizeCompatible(ptr, null,
+                        expression.getDataType(), expression.getSortOrder(),
+                        expression.getMaxLength(), expression.getScale(),
+                        column.getMaxLength(), column.getScale())) {
+                    throw new DataExceedsCapacityException(
+                            column.getDataType(), column.getMaxLength(),
+                            column.getScale(), column.getName().getString(), ptr);
+                }
+                column.getDataType().coerceBytes(ptr, null,
+                        expression.getDataType(), expression.getMaxLength(),
+                        expression.getScale(), expression.getSortOrder(),
+                        column.getMaxLength(), column.getScale(),
+                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
+                byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                row.setValue(column, bytes);
+            }
+        }
+        for (Mutation mutation : row.toRowMutations()) {
+            if (replayMutations != null) {
+                mutation.setAttribute(REPLAY_WRITES, replayMutations);
+            } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
+                mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
+            }
+            mutations.add(mutation);
+        }
+        for (i = 0; i < selectExpressions.size(); i++) {
+            selectExpressions.get(i).reset();
+        }
+    }
+
+    void insertEmptyKeyValue(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Set<Long> timeStamps =
+                Sets.newHashSetWithExpectedSize(results.size());
+        for (Cell kv : results) {
+            long kvts = kv.getTimestamp();
+            if (!timeStamps.contains(kvts)) {
+                Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
+                        kv.getRowLength());
+                put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
+                        ByteUtil.EMPTY_BYTE_ARRAY);
+                mutations.add(put);
+            }
+        }
+    }
+    @Override
+    public boolean next(List<Cell> resultsToReturn) throws IOException {
+        boolean hasMore;
+        Configuration conf = env.getConfiguration();
+        final TenantCache tenantCache = GlobalCache.getTenantCache(env, ScanUtil.getTenantId(scan));
+        try (MemoryManager.MemoryChunk em = tenantCache.getMemoryManager().allocate(0)) {
+            Aggregators aggregators = ServerAggregators.deserialize(
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATORS), conf, em);
+            Aggregator[] rowAggregators = aggregators.getAggregators();
+            aggregators.reset(rowAggregators);
+            Cell lastCell = null;
+            int rowCount = 0;
+            boolean hasAny = false;
+            ImmutableBytesWritable ptr = new ImmutableBytesWritable();
+            Tuple result = useQualifierAsIndex ? new PositionBasedMultiKeyValueTuple() : new MultiKeyValueTuple();
+            UngroupedAggregateRegionObserver.MutationList mutations = new UngroupedAggregateRegionObserver.MutationList();
+            if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                    || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+                mutations = new UngroupedAggregateRegionObserver.MutationList(Ints.saturatedCast(maxBatchSize + maxBatchSize / 10));
+            }
+            region.startRegionOperation();
+            try {
+                synchronized (innerScanner) {
+                    do {
+                        ungroupedAggregateRegionObserver.checkForRegionClosing();
+                        List<Cell> results = useQualifierAsIndex ? new EncodedColumnQualiferCellsList(minMaxQualifiers.getFirst(), minMaxQualifiers.getSecond(), encodingScheme) : new ArrayList<Cell>();
+                        // Results are potentially returned even when the return value of s.next is false
+                        // since this is an indication of whether or not there are more values after the
+                        // ones returned
+                        hasMore = innerScanner.nextRaw(results);
+                        if (!results.isEmpty()) {
+                            lastCell = results.get(0);
+                            rowCount++;
+                            result.setKeyValues(results);
+                            if (isDescRowKeyOrderUpgrade) {
+                                if (!descRowKeyOrderUpgrade(results, ptr, mutations)) {
+                                    continue;
+                                }
+                            } else if (buildLocalIndex) {
+                                buildLocalIndex(result, results, ptr, mutations);
+                            } else if (isDelete) {
+                                deleteRow(results, mutations);
+                            } else if (isUpsert) {
+                                upsert(result, ptr, mutations);
+                            } else if (deleteCF != null && deleteCQ != null) {
+                                deleteCForQ(result, results, mutations);
+                            }
+                            if (emptyCF != null) {
+                                /*
+                                 * If we've specified an emptyCF, then we need to insert an empty
+                                 * key value "retroactively" for any key value that is visible at
+                                 * the timestamp that the DDL was issued. Key values that are not
+                                 * visible at this timestamp will not ever be projected up to
+                                 * scans past this timestamp, so don't need to be considered.
+                                 * We insert one empty key value per row per timestamp.
+                                 */
+                                insertEmptyKeyValue(results, mutations);
+                            }
+                            if (ServerUtil.readyToCommit(mutations.size(), mutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                ungroupedAggregateRegionObserver.commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr,
+                                        txState, targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
+                                mutations.clear();
+                            }
+                            // Commit in batches based on UPSERT_BATCH_SIZE_BYTES_ATTRIB in config
+
+                            if (ServerUtil.readyToCommit(indexMutations.size(), indexMutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                setIndexAndTransactionProperties(indexMutations, indexUUID, indexMaintainersPtr, txState, clientVersionBytes, useIndexProto);
+                                ungroupedAggregateRegionObserver.commitBatch(region, indexMutations, blockingMemStoreSize);
+                                indexMutations.clear();
+                            }
+                            aggregators.aggregate(rowAggregators, result);
+                            hasAny = true;
+                        }
+                    } while (hasMore && rowCount < pageSizeInRows);

Review comment:
       This is a do-while loop (instead of a while loop), so we need to stop the loop when the row count becomes pageSizeInRows.
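
       For illustration only (not the actual patch code), here is a standalone sketch of this rule, with java.util.Iterator standing in for the raw region scanner:

           // Sketch: because the check runs after the row has been processed, the
           // do-while exits as soon as rowCount becomes pageSizeInRows, even when
           // the scanner still has rows left; the caller then asks for the next page.
           static boolean scanOnePage(java.util.Iterator<String> rows, long pageSizeInRows) {
               long rowCount = 0;
               boolean hasMore;
               do {
                   hasMore = rows.hasNext();
                   if (hasMore) {
                       rows.next();      // stand-in for aggregating one row
                       rowCount++;
                   }
               } while (hasMore && rowCount < pageSizeInRows);
               return hasMore;           // true: there is at least one more page
           }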




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r520736542



##########
File path: phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java
##########
@@ -629,6 +629,9 @@ public static Configuration setUpConfigForMiniCluster(Configuration conf, ReadOn
         conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 2);
         conf.setInt(NUM_CONCURRENT_INDEX_WRITER_THREADS_CONF_KEY, 1);
         conf.setInt(GLOBAL_INDEX_ROW_AGE_THRESHOLD_TO_DELETE_MS_ATTRIB, 0);
+        if (conf.getLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS, 0) == 0) {
+            conf.setLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS, 2);

Review comment:
       No, it does not. Instead of writing a small number of new tests, I set the page size to a very small number (e.g., 2) so that almost all of the existing tests exercise this new feature.
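
       A hedged sketch of the standard BaseTest pattern a suite could use if it explicitly needed a larger page size (the value is just an example; Guava Maps, QueryServices, and ReadOnlyProps imports are assumed):

           Map<String, String> props = Maps.newHashMapWithExpectedSize(1);
           props.put(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS, Long.toString(32 * 1000));
           // Explicitly setting the property overrides the tiny default used by BaseTest.
           setUpTestDriver(new ReadOnlyProps(props.entrySet().iterator()));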




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r520749622



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/iterate/UngroupedAggregatingResultIterator.java
##########
@@ -33,21 +34,37 @@
     public UngroupedAggregatingResultIterator( PeekingResultIterator resultIterator, Aggregators aggregators) {
         super(resultIterator, aggregators);
     }
-    
     @Override
     public Tuple next() throws SQLException {
-        Tuple result = super.next();
-        // Ensure ungrouped aggregregation always returns a row, even if the underlying iterator doesn't.
-        if (result == null && !hasRows) {
-            // We should reset ClientAggregators here in case they are being reused in a new ResultIterator.
-            aggregators.reset(aggregators.getAggregators());
-            byte[] value = aggregators.toBytes(aggregators.getAggregators());
-            result = new SingleKeyValueTuple(
-                    KeyValueUtil.newKeyValue(UNGROUPED_AGG_ROW_KEY, 
-                            SINGLE_COLUMN_FAMILY, 
-                            SINGLE_COLUMN, 
-                            AGG_TIMESTAMP, 
-                            value));
+        Tuple result = resultIterator.next();
+        if (result == null) {
+            // Ensure ungrouped aggregregation always returns a row, even if the underlying iterator doesn't.
+            if (!hasRows) {
+                // We should reset ClientAggregators here in case they are being reused in a new ResultIterator.
+                aggregators.reset(aggregators.getAggregators());
+                byte[] value = aggregators.toBytes(aggregators.getAggregators());
+                result = new SingleKeyValueTuple(
+                        KeyValueUtil.newKeyValue(UNGROUPED_AGG_ROW_KEY,
+                                SINGLE_COLUMN_FAMILY,
+                                SINGLE_COLUMN,
+                                AGG_TIMESTAMP,
+                                value));
+            }
+        } else {
+            Aggregator[] rowAggregators = aggregators.getAggregators();
+            aggregators.reset(rowAggregators);
+            while (true) {
+                aggregators.aggregate(rowAggregators, result);
+                Tuple nextResult = resultIterator.peek();
+                if (nextResult == null) {
+                    break;
+                }
+                result = resultIterator.next();

Review comment:
       Yes, your understanding is correct.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r520732053



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java
##########
@@ -342,6 +342,7 @@
     public static final long DEFAULT_GLOBAL_INDEX_ROW_AGE_THRESHOLD_TO_DELETE_MS = 7*24*60*60*1000; /* 7 days */
     public static final boolean DEFAULT_INDEX_REGION_OBSERVER_ENABLED = true;
     public static final long DEFAULT_INDEX_REBUILD_PAGE_SIZE_IN_ROWS = 32*1024;
+    public static final long DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS = 32*1024;

Review comment:
       No specific reason, I guess I like powers of 2. I will change it to 32*1000. Based on my testing on a real cluster, 32K gives very good performance without timeouts. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] ChinmaySKulkarni commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
ChinmaySKulkarni commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-724948807


   > @ChinmaySKulkarni, Thank you for reviewing this and asking very good questions. I just had a conversation on this with Sukumar Maddineni and we agreed that the limit on the paging size should be in terms of the time to spend on the server side instead of the number of rows or bytes. If you agree too, I would like to update the PR accordingly. Also, I will write a doc that will cover this and the other two Jiras (6207 and 6211).
   
   No problem, thanks for the quick turnaround. I think that makes sense too. It will be interesting to see how to handle corner cases where, for example, we have processed the last row partially. A design doc will be very helpful.
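
   A hypothetical sketch of the time-based limit being discussed here (pageSizeMs and the helper name are made up, not part of this PR):

       // Hypothetical: bound each next() call by elapsed server-side time rather
       // than by a row count; the partially processed last row mentioned above
       // would still need explicit handling at the page boundary.
       boolean scanOnePage(RegionScanner innerScanner, long pageSizeMs) throws IOException {
           long startTime = System.currentTimeMillis();
           boolean hasMore;
           List<Cell> row = new ArrayList<>();
           do {
               row.clear();
               hasMore = innerScanner.nextRaw(row);
               // ... aggregate the row here, exactly as the existing row-count loop does ...
           } while (hasMore && System.currentTimeMillis() - startTime < pageSizeMs);
           return hasMore;
       }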


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-716946162


   > @kadirozde The changes are substantial and I will need some heads-down time to review them. If it is urgent, please feel free to rely on other's reviews and don't wait for me. I plan on taking a look in detail within the next couple of days.
   
   No, it is not urgent. I was going to start working on PHOENIX-6207, and since there is some dependency between the two, I wanted to push this before starting the other. But it is okay; I do not have to wait for this PR to be checked in. Please take your time.
   
   Yes, the changes are substantial but mostly mechanical. The core of the change is that instead of scanning the entire table region in the postScannerOpen hook and returning the result of the aggregate operation for the entire table region in one result iteration, this PR just returns a region scanner (i.e., a new scanner called UngroupedAggregateRegionScanner) in the postScannerOpen hook for the UngroupedAggregateRegionObserver coproc, and then applies the aggregate operation on a chunk (i.e., a page) of a table region in each result iteration. This means the client needs to do many iterations in order to process a table region, and to aggregate the results of these pages on the client side. Please note that previously, the client only needed to aggregate the results of server-side aggregations, one for each table region (not for each table region page). Hope this helps.
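
   To make the shape concrete, a hedged sketch (hypothetical helper names, not the actual class): each next() call on the new region scanner aggregates at most one page of rows and returns a single partial-aggregate cell, and the client keeps iterating and folding these page results together until the scanner reports no more rows.

       @Override
       public boolean next(List<Cell> resultsToReturn) throws IOException {
           // aggregateOnePage() and partialAggregateCell() are made-up names standing
           // in for the paged scan loop and the serialized aggregator state.
           boolean hasMore = aggregateOnePage();          // scans at most pageSizeInRows rows
           resultsToReturn.add(partialAggregateCell());   // one partial aggregate per page
           return hasMore;                                // client iterates while this is true
       }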


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] ChinmaySKulkarni commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
ChinmaySKulkarni commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r520195446



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
##########
@@ -268,7 +244,7 @@ public void doMutation() throws IOException {
         }
     }
 
-    private void commitBatch(Region region, List<Mutation> mutations, long blockingMemstoreSize) throws IOException {
+    public void commitBatch(Region region, List<Mutation> mutations, long blockingMemstoreSize) throws IOException {

Review comment:
       Isn't package-private access sufficient for all these APIs since I guess they are only being called from classes within the package?

##########
File path: phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java
##########
@@ -629,6 +629,9 @@ public static Configuration setUpConfigForMiniCluster(Configuration conf, ReadOn
         conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 2);
         conf.setInt(NUM_CONCURRENT_INDEX_WRITER_THREADS_CONF_KEY, 1);
         conf.setInt(GLOBAL_INDEX_ROW_AGE_THRESHOLD_TO_DELETE_MS_ATTRIB, 0);
+        if (conf.getLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS, 0) == 0) {
+            conf.setLong(QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS, 2);

Review comment:
       In tests we generally don't read too many rows anyway, so won't setting this config just unnecessarily slow them down?

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/iterate/UngroupedAggregatingResultIterator.java
##########
@@ -33,21 +34,37 @@
     public UngroupedAggregatingResultIterator( PeekingResultIterator resultIterator, Aggregators aggregators) {
         super(resultIterator, aggregators);
     }
-    
     @Override
     public Tuple next() throws SQLException {
-        Tuple result = super.next();
-        // Ensure ungrouped aggregregation always returns a row, even if the underlying iterator doesn't.
-        if (result == null && !hasRows) {
-            // We should reset ClientAggregators here in case they are being reused in a new ResultIterator.
-            aggregators.reset(aggregators.getAggregators());
-            byte[] value = aggregators.toBytes(aggregators.getAggregators());
-            result = new SingleKeyValueTuple(
-                    KeyValueUtil.newKeyValue(UNGROUPED_AGG_ROW_KEY, 
-                            SINGLE_COLUMN_FAMILY, 
-                            SINGLE_COLUMN, 
-                            AGG_TIMESTAMP, 
-                            value));
+        Tuple result = resultIterator.next();
+        if (result == null) {
+            // Ensure ungrouped aggregregation always returns a row, even if the underlying iterator doesn't.
+            if (!hasRows) {
+                // We should reset ClientAggregators here in case they are being reused in a new ResultIterator.
+                aggregators.reset(aggregators.getAggregators());
+                byte[] value = aggregators.toBytes(aggregators.getAggregators());
+                result = new SingleKeyValueTuple(
+                        KeyValueUtil.newKeyValue(UNGROUPED_AGG_ROW_KEY,
+                                SINGLE_COLUMN_FAMILY,
+                                SINGLE_COLUMN,
+                                AGG_TIMESTAMP,
+                                value));
+            }
+        } else {
+            Aggregator[] rowAggregators = aggregators.getAggregators();
+            aggregators.reset(rowAggregators);
+            while (true) {
+                aggregators.aggregate(rowAggregators, result);
+                Tuple nextResult = resultIterator.peek();
+                if (nextResult == null) {
+                    break;
+                }
+                result = resultIterator.next();

Review comment:
       Essentially, the server-side paging just means that more iterations of this `while (true)` loop are needed on the client side to page over all of the data, correct?

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,645 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInRows = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInRows = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInRows =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids flush storm to hdfs for cases like index building where reads and
+         * write happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf) ;
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatiblity fall back to look by the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);;
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock
+        // args was introduced in 0.94.4, thus if we try to use it here
+        // we can no longer use the 0.94.2 version of the client.
+        Cell firstKV = results.get(0);
+        Delete delete = new Delete(firstKV.getRowArray(),
+                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
+        if (replayMutations != null) {
+            delete.setAttribute(REPLAY_WRITES, replayMutations);
+        }
+        mutations.add(delete);
+        // force tephra to ignore this deletes
+        delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+    }
+
+    void deleteCForQ(Tuple result, List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // No need to search for delete column, since we project only it
+        // if no empty key value is being set
+        if (emptyCF == null ||
+                result.getValue(deleteCF, deleteCQ) != null) {
+            Delete delete = new Delete(results.get(0).getRowArray(),
+                    results.get(0).getRowOffset(),
+                    results.get(0).getRowLength());
+            delete.deleteColumns(deleteCF,  deleteCQ, ts);
+            // force tephra to ignore this deletes
+            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+            mutations.add(delete);
+        }
+    }
+    void upsert(Tuple result, ImmutableBytesWritable ptr, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Arrays.fill(values, null);
+        int bucketNumOffset = 0;
+        if (projectedTable.getBucketNum() != null) {
+            values[0] = new byte[] { 0 };
+            bucketNumOffset = 1;
+        }
+        int i = bucketNumOffset;
+        List<PColumn> projectedColumns = projectedTable.getColumns();
+        for (; i < projectedTable.getPKColumns().size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                values[i] = ptr.copyBytes();
+                // If SortOrder from expression in SELECT doesn't match the
+                // column being projected into then invert the bits.
+                if (expression.getSortOrder() !=
+                        projectedColumns.get(i).getSortOrder()) {
+                    SortOrder.invert(values[i], 0, values[i], 0,
+                            values[i].length);
+                }
+            }else{
+                values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
+            }
+        }
+        projectedTable.newKey(ptr, values);
+        PRow row = projectedTable.newRow(GenericKeyValueBuilder.INSTANCE, ts, ptr, false);
+        for (; i < projectedColumns.size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                PColumn column = projectedColumns.get(i);
+                if (!column.getDataType().isSizeCompatible(ptr, null,
+                        expression.getDataType(), expression.getSortOrder(),
+                        expression.getMaxLength(), expression.getScale(),
+                        column.getMaxLength(), column.getScale())) {
+                    throw new DataExceedsCapacityException(
+                            column.getDataType(), column.getMaxLength(),
+                            column.getScale(), column.getName().getString(), ptr);
+                }
+                column.getDataType().coerceBytes(ptr, null,
+                        expression.getDataType(), expression.getMaxLength(),
+                        expression.getScale(), expression.getSortOrder(),
+                        column.getMaxLength(), column.getScale(),
+                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
+                byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                row.setValue(column, bytes);
+            }
+        }
+        for (Mutation mutation : row.toRowMutations()) {
+            if (replayMutations != null) {
+                mutation.setAttribute(REPLAY_WRITES, replayMutations);
+            } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
+                mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
+            }
+            mutations.add(mutation);
+        }
+        for (i = 0; i < selectExpressions.size(); i++) {
+            selectExpressions.get(i).reset();
+        }
+    }
+
+    void insertEmptyKeyValue(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Set<Long> timeStamps =
+                Sets.newHashSetWithExpectedSize(results.size());
+        for (Cell kv : results) {
+            long kvts = kv.getTimestamp();
+            if (!timeStamps.contains(kvts)) {
+                Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
+                        kv.getRowLength());
+                put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
+                        ByteUtil.EMPTY_BYTE_ARRAY);
+                mutations.add(put);
+            }
+        }
+    }
+    @Override
+    public boolean next(List<Cell> resultsToReturn) throws IOException {

Review comment:
       Just a generic question: In case the RS dies and the server-side scanner dies with it, when a new scanner starts its work again it will start from row 0. However, can it occur that the previous scanner had returned rows to the client upon which the client issued DELETE/UPSERT operations corresponding to the UPSERT SELECT/DELETE? Does something inside UARO prevent this?

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java
##########
@@ -342,6 +342,7 @@
     public static final long DEFAULT_GLOBAL_INDEX_ROW_AGE_THRESHOLD_TO_DELETE_MS = 7*24*60*60*1000; /* 7 days */
     public static final boolean DEFAULT_INDEX_REGION_OBSERVER_ENABLED = true;
     public static final long DEFAULT_INDEX_REBUILD_PAGE_SIZE_IN_ROWS = 32*1024;
+    public static final long DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS = 32*1024;

Review comment:
       How did we arrive at 32768 rows? Since we aren't using a notion of row size for paging, but rather an absolute "number of rows", can we replace the 1024 with, say, 1000 just to avoid confusion (one may think this is 32KB of total row size)?

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/iterate/TableResultIterator.java
##########
@@ -134,6 +137,7 @@ public TableResultIterator(MutationState mutationState, Scan scan, ScanMetricsHo
         this.retry=plan.getContext().getConnection().getQueryServices().getProps()
                 .getInt(QueryConstants.HASH_JOIN_CACHE_RETRIES, QueryConstants.DEFAULT_HASH_JOIN_CACHE_RETRIES);
         IndexUtil.setScanAttributesForIndexReadRepair(scan, table, plan.getContext().getConnection());
+        scan.setAttribute(BaseScannerRegionObserver.SERVER_PAGING, TRUE_BYTES);

Review comment:
       Can there be cases where SERVER_PAGING is not desirable? This change effectively forces all queries to use paging. I am wondering whether it is worth introducing a client-side config to decide whether to set this attribute to true/false.
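
       For what it's worth, a hypothetical sketch of such a toggle (the property name is made up; only the getProps() access pattern above is real):

           boolean serverPagingEnabled = plan.getContext().getConnection().getQueryServices()
                   .getProps().getBoolean("phoenix.query.serverPagingEnabled", true);  // hypothetical key
           if (serverPagingEnabled) {
               scan.setAttribute(BaseScannerRegionObserver.SERVER_PAGING, TRUE_BYTES);
           }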

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
##########
@@ -516,402 +437,19 @@ public RegionScanner run() throws Exception {
             }
             ImmutableBytesWritable tempPtr = new ImmutableBytesWritable();
             theScanner =
-                    getWrappedScanner(c, theScanner, offset, scan, dataColumns, tupleProjector, 
-                        region, indexMaintainers == null ? null : indexMaintainers.get(0), viewConstants, p, tempPtr, useQualifierAsIndex);
-        } 
-        
-        if (j != null)  {
-            theScanner = new HashJoinRegionScanner(theScanner, p, j, ScanUtil.getTenantId(scan), env, useQualifierAsIndex, useNewValueColumnQualifier);
-        }
-        
-        int maxBatchSize = 0;
-        long maxBatchSizeBytes = 0L;
-        MutationList mutations = new MutationList();
-        boolean needToWrite = false;
-        Configuration conf = env.getConfiguration();
-
-        /**
-         * Slow down the writes if the memstore size more than
-         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
-         * bytes. This avoids flush storm to hdfs for cases like index building where reads and
-         * write happen to all the table regions in the server.
-         */
-        final long blockingMemStoreSize = getBlockingMemstoreSize(region, conf) ;
-
-        boolean buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
-        if(buildLocalIndex) {
-            checkForLocalIndexColumnFamilies(region, indexMaintainers);
-        }
-        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
-                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
-            needToWrite = true;
-            if((isUpsert && (targetHTable == null ||
-                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
-                needToWrite = false;
-            }
-            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
-            mutations = new MutationList(Ints.saturatedCast(maxBatchSize + maxBatchSize / 10));
-            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
-                QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
-        }
-        boolean hasMore;
-        int rowCount = 0;
-        boolean hasAny = false;
-        boolean acquiredLock = false;
-        boolean incrScanRefCount = false;
-        Aggregators aggregators = null;
-        Aggregator[] rowAggregators = null;
-        final RegionScanner innerScanner = theScanner;
-        final TenantCache tenantCache = GlobalCache.getTenantCache(env, ScanUtil.getTenantId(scan));
-        try (MemoryChunk em = tenantCache.getMemoryManager().allocate(0)) {
-            aggregators = ServerAggregators.deserialize(
-                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATORS), conf, em);
-            rowAggregators = aggregators.getAggregators();
-            Pair<Integer, Integer> minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
-            Tuple result = useQualifierAsIndex ? new PositionBasedMultiKeyValueTuple() : new MultiKeyValueTuple();
-            if (LOGGER.isDebugEnabled()) {
-                LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " "+region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
-            }
-            boolean useIndexProto = true;
-            byte[] indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
-            // for backward compatiblity fall back to look by the old attribute
-            if (indexMaintainersPtr == null) {
-                indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
-                useIndexProto = false;
-            }
-    
-            if(needToWrite) {
-                synchronized (lock) {
-                    if (isRegionClosingOrSplitting) {
-                        throw new IOException("Temporarily unable to write from scan because region is closing or splitting");
-                    }
-                    scansReferenceCount++;
-                    incrScanRefCount = true;
-                    lock.notifyAll();
-                }
-            }
-            region.startRegionOperation();
-            acquiredLock = true;
-            synchronized (innerScanner) {
-                do {
-                    List<Cell> results = useQualifierAsIndex ? new EncodedColumnQualiferCellsList(minMaxQualifiers.getFirst(), minMaxQualifiers.getSecond(), encodingScheme) : new ArrayList<Cell>();
-                    // Results are potentially returned even when the return value of s.next is false
-                    // since this is an indication of whether or not there are more values after the
-                    // ones returned
-                    hasMore = innerScanner.nextRaw(results);
-                    if (!results.isEmpty()) {
-                        rowCount++;
-                        result.setKeyValues(results);
-                        if (isDescRowKeyOrderUpgrade) {
-                            Arrays.fill(values, null);
-                            Cell firstKV = results.get(0);
-                            RowKeySchema schema = projectedTable.getRowKeySchema();
-                            int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
-                            for (int i = 0; i < schema.getFieldCount(); i++) {
-                                Boolean hasValue = schema.next(ptr, i, maxOffset);
-                                if (hasValue == null) {
-                                    break;
-                                }
-                                Field field = schema.getField(i);
-                                if (field.getSortOrder() == SortOrder.DESC) {
-                                    // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
-                                    if (field.getDataType().isArrayType()) {
-                                        field.getDataType().coerceBytes(ptr, null, field.getDataType(),
-                                            field.getMaxLength(), field.getScale(), field.getSortOrder(), 
-                                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
-                                    }
-                                    // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
-                                    else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
-                                        int len = ptr.getLength();
-                                        while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
-                                            len--;
-                                        }
-                                        ptr.set(ptr.get(), ptr.getOffset(), len);
-                                        // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
-                                    } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
-                                        byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
-                                        ptr.set(invertedBytes);
-                                    }
-                                } else if (field.getDataType() == PBinary.INSTANCE) {
-                                    // Remove trailing space characters so that the setValues call below will replace them
-                                    // with the correct zero byte character. Note this is somewhat dangerous as these
-                                    // could be legit, but I don't know what the alternative is.
-                                    int len = ptr.getLength();
-                                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
-                                        len--;
-                                    }
-                                    ptr.set(ptr.get(), ptr.getOffset(), len);                                        
-                                }
-                                values[i] = ptr.copyBytes();
-                            }
-                            writeToTable.newKey(ptr, values);
-                            if (Bytes.compareTo(
-                                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), 
-                                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
-                                continue;
-                            }
-                            byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
-                            if (offset > 0) { // for local indexes (prepend region start key)
-                                byte[] newRowWithOffset = new byte[offset + newRow.length];
-                                System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);;
-                                System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
-                                newRow = newRowWithOffset;
-                            }
-                            byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
-                            for (Cell cell : results) {
-                                // Copy existing cell but with new row key
-                                Cell newCell = new KeyValue(newRow, 0, newRow.length,
-                                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
-                                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
-                                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
-                                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
-                                switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
-                                case Put:
-                                    // If Put, point delete old Put
-                                    Delete del = new Delete(oldRow);
-                                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
-                                        cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
-                                        cell.getQualifierArray(), cell.getQualifierOffset(),
-                                        cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
-                                        ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
-                                    mutations.add(del);
-
-                                    Put put = new Put(newRow);
-                                    put.add(newCell);
-                                    mutations.add(put);
-                                    break;
-                                case Delete:
-                                case DeleteColumn:
-                                case DeleteFamily:
-                                case DeleteFamilyVersion:
-                                    Delete delete = new Delete(newRow);
-                                    delete.addDeleteMarker(newCell);
-                                    mutations.add(delete);
-                                    break;
-                                }
-                            }
-                        } else if (buildLocalIndex) {
-                            for (IndexMaintainer maintainer : indexMaintainers) {
-                                if (!results.isEmpty()) {
-                                    result.getKey(ptr);
-                                    ValueGetter valueGetter =
-                                            maintainer.createGetterFromKeyValues(
-                                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
-                                                results);
-                                    Put put = maintainer.buildUpdateMutation(kvBuilder,
-                                        valueGetter, ptr, results.get(0).getTimestamp(),
-                                        env.getRegion().getRegionInfo().getStartKey(),
-                                        env.getRegion().getRegionInfo().getEndKey());
-
-                                    if (txnProvider != null) {
-                                        put = txnProvider.markPutAsCommitted(put, ts, ts);
-                                    }
-                                    indexMutations.add(put);
-                                }
-                            }
-                            result.setKeyValues(results);
-                        } else if (isDelete) {
-                            // FIXME: the version of the Delete constructor without the lock
-                            // args was introduced in 0.94.4, thus if we try to use it here
-                            // we can no longer use the 0.94.2 version of the client.
-                            Cell firstKV = results.get(0);
-                            Delete delete = new Delete(firstKV.getRowArray(),
-                                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
-                            if (replayMutations != null) {
-                                delete.setAttribute(REPLAY_WRITES, replayMutations);
-                            }
-                            mutations.add(delete);
-                            // force tephra to ignore this deletes
-                            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
-                        } else if (isUpsert) {
-                            Arrays.fill(values, null);
-                            int bucketNumOffset = 0;
-                            if (projectedTable.getBucketNum() != null) {
-                                values[0] = new byte[] { 0 };
-                                bucketNumOffset = 1;
-                            }
-                            int i = bucketNumOffset;
-                            List<PColumn> projectedColumns = projectedTable.getColumns();
-                            for (; i < projectedTable.getPKColumns().size(); i++) {
-                                Expression expression = selectExpressions.get(i - bucketNumOffset);
-                                if (expression.evaluate(result, ptr)) {
-                                    values[i] = ptr.copyBytes();
-                                    // If SortOrder from expression in SELECT doesn't match the
-                                    // column being projected into then invert the bits.
-                                    if (expression.getSortOrder() !=
-                                            projectedColumns.get(i).getSortOrder()) {
-                                        SortOrder.invert(values[i], 0, values[i], 0,
-                                            values[i].length);
-                                    }
-                                }else{
-                                    values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
-                                }
-                            }
-                            projectedTable.newKey(ptr, values);
-                            PRow row = projectedTable.newRow(kvBuilder, ts, ptr, false);
-                            for (; i < projectedColumns.size(); i++) {
-                                Expression expression = selectExpressions.get(i - bucketNumOffset);
-                                if (expression.evaluate(result, ptr)) {
-                                    PColumn column = projectedColumns.get(i);
-                                    if (!column.getDataType().isSizeCompatible(ptr, null,
-                                        expression.getDataType(), expression.getSortOrder(),
-                                        expression.getMaxLength(), expression.getScale(),
-                                        column.getMaxLength(), column.getScale())) {
-                                        throw new DataExceedsCapacityException(
-                                                column.getDataType(),
-                                                column.getMaxLength(),
-                                                column.getScale(),
-                                                column.getName().getString());
-                                    }
-                                    column.getDataType().coerceBytes(ptr, null,
-                                        expression.getDataType(), expression.getMaxLength(),
-                                        expression.getScale(), expression.getSortOrder(), 
-                                        column.getMaxLength(), column.getScale(),
-                                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
-                                    byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
-                                    row.setValue(column, bytes);
-                                }
-                            }
-                            for (Mutation mutation : row.toRowMutations()) {
-                                if (replayMutations != null) {
-                                    mutation.setAttribute(REPLAY_WRITES, replayMutations);
-                                } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
-                                    mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
-                                }
-                                mutations.add(mutation);
-                            }
-                            for (i = 0; i < selectExpressions.size(); i++) {
-                                selectExpressions.get(i).reset();
-                            }
-                        } else if (deleteCF != null && deleteCQ != null) {
-                            // No need to search for delete column, since we project only it
-                            // if no empty key value is being set
-                            if (emptyCF == null ||
-                                    result.getValue(deleteCF, deleteCQ) != null) {
-                                Delete delete = new Delete(results.get(0).getRowArray(),
-                                    results.get(0).getRowOffset(),
-                                    results.get(0).getRowLength());
-                                delete.deleteColumns(deleteCF,  deleteCQ, ts);
-                                // force tephra to ignore this deletes
-                                delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
-                                mutations.add(delete);
-                            }
-                        }
-                        if (emptyCF != null) {
-                            /*
-                             * If we've specified an emptyCF, then we need to insert an empty
-                             * key value "retroactively" for any key value that is visible at
-                             * the timestamp that the DDL was issued. Key values that are not
-                             * visible at this timestamp will not ever be projected up to
-                             * scans past this timestamp, so don't need to be considered.
-                             * We insert one empty key value per row per timestamp.
-                             */
-                            Set<Long> timeStamps =
-                                    Sets.newHashSetWithExpectedSize(results.size());
-                            for (Cell kv : results) {
-                                long kvts = kv.getTimestamp();
-                                if (!timeStamps.contains(kvts)) {
-                                    Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
-                                        kv.getRowLength());
-                                    put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
-                                        ByteUtil.EMPTY_BYTE_ARRAY);
-                                    mutations.add(put);
-                                }
-                            }
-                        }
-                        if (ServerUtil.readyToCommit(mutations.size(), mutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
-                            commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr,
-                                txState, targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
-                            mutations.clear();
-                        }
-                        // Commit in batches based on UPSERT_BATCH_SIZE_BYTES_ATTRIB in config
-
-                        if (ServerUtil.readyToCommit(indexMutations.size(), indexMutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
-                            setIndexAndTransactionProperties(indexMutations, indexUUID, indexMaintainersPtr, txState, clientVersionBytes, useIndexProto);
-                            commitBatch(region, indexMutations, blockingMemStoreSize);
-                            indexMutations.clear();
-                        }
-                        aggregators.aggregate(rowAggregators, result);
-                        hasAny = true;
-                    }
-                } while (hasMore);
-                if (!mutations.isEmpty()) {
-                    commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr, txState,
-                        targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
-                    mutations.clear();
-                }
-
-                if (!indexMutations.isEmpty()) {
-                    commitBatch(region, indexMutations, blockingMemStoreSize);
-                    indexMutations.clear();
-                }
-            }
-        } finally {
-            if (needToWrite && incrScanRefCount) {
-                synchronized (lock) {
-                    scansReferenceCount--;
-                    if (scansReferenceCount < 0) {
-                        LOGGER.warn(
-                            "Scan reference count went below zero. Something isn't correct. Resetting it back to zero");
-                        scansReferenceCount = 0;
-                    }
-                    lock.notifyAll();
-                }
-            }
-            try {
-                if (targetHTable != null) {
-                    try {
-                        targetHTable.close();
-                    } catch (IOException e) {
-                        LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
-                    }
-                }
-            } finally {
-                try {
-                    innerScanner.close();
-                } finally {
-                    if (acquiredLock) region.closeRegionOperation();
-                }
-            }
-        }
-        if (LOGGER.isDebugEnabled()) {
-            LOGGER.debug(LogUtil.addCustomAnnotations("Finished scanning " + rowCount + " rows for ungrouped coprocessor scan " + scan, ScanUtil.getCustomAnnotations(scan)));
+                    getWrappedScanner(c, theScanner, offset, scan, dataColumns, tupleProjector,
+                            region, indexMaintainers == null ? null : indexMaintainers.get(0), viewConstants, p, tempPtr, useQualifierAsIndex);
         }
 
-        final boolean hadAny = hasAny;
-        KeyValue keyValue = null;
-        if (hadAny) {
-            byte[] value = aggregators.toBytes(rowAggregators);
-            keyValue = KeyValueUtil.newKeyValue(UNGROUPED_AGG_ROW_KEY, SINGLE_COLUMN_FAMILY, SINGLE_COLUMN, AGG_TIMESTAMP, value, 0, value.length);
+        if (j != null)  {
+            theScanner = new HashJoinRegionScanner(theScanner, p, j, ScanUtil.getTenantId(scan), env, useQualifierAsIndex, useNewValueColumnQualifier);
         }
-        final KeyValue aggKeyValue = keyValue;
-
-        RegionScanner scanner = new BaseRegionScanner(innerScanner) {
-            private boolean done = !hadAny;
-
-            @Override
-            public boolean isFilterDone() {
-                return done;
-            }
-
-            @Override
-            public boolean next(List<Cell> results) throws IOException {
-                if (done) return false;
-                done = true;
-                results.add(aggKeyValue);
-                return false;
-            }
-
-            @Override
-            public long getMaxResultSize() {
-                return scan.getMaxResultSize();
-            }
-        };
+        RegionScanner scanner = new UngroupedAggregateRegionScanner(c, theScanner,region, scan, env, this);

Review comment:
       nit: You can just `return` here instead of using the new variable `scanner`
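
       A minimal sketch of that suggestion, assuming the local variable is not needed later in the method (the rest of the method body is outside this hunk):

           return new UngroupedAggregateRegionScanner(c, theScanner, region, scan, env, this);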

##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,645 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInRows = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInRows = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInRows =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf) ;
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock
+        // args was introduced in 0.94.4, thus if we try to use it here
+        // we can no longer use the 0.94.2 version of the client.
+        Cell firstKV = results.get(0);
+        Delete delete = new Delete(firstKV.getRowArray(),
+                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
+        if (replayMutations != null) {
+            delete.setAttribute(REPLAY_WRITES, replayMutations);
+        }
+        mutations.add(delete);
+        // force tephra to ignore this deletes
+        delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+    }
+
+    void deleteCForQ(Tuple result, List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // No need to search for delete column, since we project only it
+        // if no empty key value is being set
+        if (emptyCF == null ||
+                result.getValue(deleteCF, deleteCQ) != null) {
+            Delete delete = new Delete(results.get(0).getRowArray(),
+                    results.get(0).getRowOffset(),
+                    results.get(0).getRowLength());
+            delete.deleteColumns(deleteCF,  deleteCQ, ts);
+            // force Tephra to ignore this delete
+            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+            mutations.add(delete);
+        }
+    }
+    void upsert(Tuple result, ImmutableBytesWritable ptr, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Arrays.fill(values, null);
+        int bucketNumOffset = 0;
+        if (projectedTable.getBucketNum() != null) {
+            values[0] = new byte[] { 0 };
+            bucketNumOffset = 1;
+        }
+        int i = bucketNumOffset;
+        List<PColumn> projectedColumns = projectedTable.getColumns();
+        for (; i < projectedTable.getPKColumns().size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                values[i] = ptr.copyBytes();
+                // If SortOrder from expression in SELECT doesn't match the
+                // column being projected into then invert the bits.
+                if (expression.getSortOrder() !=
+                        projectedColumns.get(i).getSortOrder()) {
+                    SortOrder.invert(values[i], 0, values[i], 0,
+                            values[i].length);
+                }
+            }else{
+                values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
+            }
+        }
+        projectedTable.newKey(ptr, values);
+        PRow row = projectedTable.newRow(GenericKeyValueBuilder.INSTANCE, ts, ptr, false);
+        for (; i < projectedColumns.size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                PColumn column = projectedColumns.get(i);
+                if (!column.getDataType().isSizeCompatible(ptr, null,
+                        expression.getDataType(), expression.getSortOrder(),
+                        expression.getMaxLength(), expression.getScale(),
+                        column.getMaxLength(), column.getScale())) {
+                    throw new DataExceedsCapacityException(
+                            column.getDataType(), column.getMaxLength(),
+                            column.getScale(), column.getName().getString(), ptr);
+                }
+                column.getDataType().coerceBytes(ptr, null,
+                        expression.getDataType(), expression.getMaxLength(),
+                        expression.getScale(), expression.getSortOrder(),
+                        column.getMaxLength(), column.getScale(),
+                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
+                byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                row.setValue(column, bytes);
+            }
+        }
+        for (Mutation mutation : row.toRowMutations()) {
+            if (replayMutations != null) {
+                mutation.setAttribute(REPLAY_WRITES, replayMutations);
+            } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
+                mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
+            }
+            mutations.add(mutation);
+        }
+        for (i = 0; i < selectExpressions.size(); i++) {
+            selectExpressions.get(i).reset();
+        }
+    }
+
+    void insertEmptyKeyValue(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Set<Long> timeStamps =
+                Sets.newHashSetWithExpectedSize(results.size());
+        for (Cell kv : results) {
+            long kvts = kv.getTimestamp();
+            if (!timeStamps.contains(kvts)) {
+                Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
+                        kv.getRowLength());
+                put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
+                        ByteUtil.EMPTY_BYTE_ARRAY);
+                mutations.add(put);
+            }
+        }
+    }
+    @Override
+    public boolean next(List<Cell> resultsToReturn) throws IOException {
+        boolean hasMore;
+        Configuration conf = env.getConfiguration();
+        final TenantCache tenantCache = GlobalCache.getTenantCache(env, ScanUtil.getTenantId(scan));
+        try (MemoryManager.MemoryChunk em = tenantCache.getMemoryManager().allocate(0)) {
+            Aggregators aggregators = ServerAggregators.deserialize(
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATORS), conf, em);
+            Aggregator[] rowAggregators = aggregators.getAggregators();
+            aggregators.reset(rowAggregators);
+            Cell lastCell = null;
+            int rowCount = 0;
+            boolean hasAny = false;
+            ImmutableBytesWritable ptr = new ImmutableBytesWritable();
+            Tuple result = useQualifierAsIndex ? new PositionBasedMultiKeyValueTuple() : new MultiKeyValueTuple();
+            UngroupedAggregateRegionObserver.MutationList mutations = new UngroupedAggregateRegionObserver.MutationList();
+            if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                    || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+                mutations = new UngroupedAggregateRegionObserver.MutationList(Ints.saturatedCast(maxBatchSize + maxBatchSize / 10));
+            }
+            region.startRegionOperation();
+            try {
+                synchronized (innerScanner) {
+                    do {
+                        ungroupedAggregateRegionObserver.checkForRegionClosing();
+                        List<Cell> results = useQualifierAsIndex ? new EncodedColumnQualiferCellsList(minMaxQualifiers.getFirst(), minMaxQualifiers.getSecond(), encodingScheme) : new ArrayList<Cell>();
+                        // Results are potentially returned even when the return value of innerScanner.nextRaw() is false
+                        // since this is an indication of whether or not there are more values after the
+                        // ones returned
+                        hasMore = innerScanner.nextRaw(results);
+                        if (!results.isEmpty()) {
+                            lastCell = results.get(0);
+                            rowCount++;
+                            result.setKeyValues(results);
+                            if (isDescRowKeyOrderUpgrade) {
+                                if (!descRowKeyOrderUpgrade(results, ptr, mutations)) {
+                                    continue;
+                                }
+                            } else if (buildLocalIndex) {
+                                buildLocalIndex(result, results, ptr, mutations);
+                            } else if (isDelete) {
+                                deleteRow(results, mutations);
+                            } else if (isUpsert) {
+                                upsert(result, ptr, mutations);
+                            } else if (deleteCF != null && deleteCQ != null) {
+                                deleteCForQ(result, results, mutations);
+                            }
+                            if (emptyCF != null) {
+                                /*
+                                 * If we've specified an emptyCF, then we need to insert an empty
+                                 * key value "retroactively" for any key value that is visible at
+                                 * the timestamp that the DDL was issued. Key values that are not
+                                 * visible at this timestamp will not ever be projected up to
+                                 * scans past this timestamp, so don't need to be considered.
+                                 * We insert one empty key value per row per timestamp.
+                                 */
+                                insertEmptyKeyValue(results, mutations);
+                            }
+                            if (ServerUtil.readyToCommit(mutations.size(), mutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                ungroupedAggregateRegionObserver.commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr,
+                                        txState, targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
+                                mutations.clear();
+                            }
+                            // Commit in batches based on UPSERT_BATCH_SIZE_BYTES_ATTRIB in config
+
+                            if (ServerUtil.readyToCommit(indexMutations.size(), indexMutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                setIndexAndTransactionProperties(indexMutations, indexUUID, indexMaintainersPtr, txState, clientVersionBytes, useIndexProto);
+                                ungroupedAggregateRegionObserver.commitBatch(region, indexMutations, blockingMemStoreSize);
+                                indexMutations.clear();
+                            }
+                            aggregators.aggregate(rowAggregators, result);
+                            hasAny = true;
+                        }
+                    } while (hasMore && rowCount < pageSizeInRows);

Review comment:
       Based on the comment for `UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS` in `QueryServices`, shouldn't this be a `<= pageSizeInRows` check?
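
       For reference, the off-by-one difference between the two loop conditions (a sketch only, not part of the patch as posted):

           } while (hasMore && rowCount < pageSizeInRows);   // stops once pageSizeInRows rows have been processed in this page
           } while (hasMore && rowCount <= pageSizeInRows);  // would allow one more row than pageSizeInRows per page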




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-716862262


   @ChinmaySKulkarni, @gjacoby126, or anyone else who wants to review this PR, if you do not have any questions or comments, can I get your approval? Thanks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] stoty commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
stoty commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-715332240


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  8s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 1 new or modified test files.  |
   ||| _ 4.x Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  11m 39s |  4.x passed  |
   | +1 :green_heart: |  compile  |   1m  1s |  4.x passed  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  4.x passed  |
   | +1 :green_heart: |  javadoc  |   0m 47s |  4.x passed  |
   | +0 :ok: |  spotbugs  |   3m  6s |  phoenix-core in 4.x has 954 extant spotbugs warnings.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   6m 21s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  1s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m  1s |  the patch passed  |
   | -1 :x: |  checkstyle  |   1m  5s |  phoenix-core: The patch generated 303 new + 1695 unchanged - 208 fixed = 1998 total (was 1903)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 45s |  the patch passed  |
   | -1 :x: |  spotbugs  |   3m 25s |  phoenix-core generated 1 new + 953 unchanged - 1 fixed = 954 total (was 954)  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 185m 54s |  phoenix-core in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 31s |  The patch does not generate ASF License warnings.  |
   |  |   | 220m 17s |   |
   
   
   | Reason | Tests |
   |-------:|:------|
   | FindBugs | module:phoenix-core |
   |  |  Switch statement found in org.apache.phoenix.coprocessor.UngroupedAggregateRegionScanner.descRowKeyOrderUpgrade(List, ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:[lines 391-412] |
   | Failed junit tests | phoenix.end2end.TenantSpecificViewIndexSaltedIT |
   |   | phoenix.end2end.SpillableGroupByIT |
   |   | phoenix.execute.UpsertSelectOverlappingBatchesIT |
   |   | phoenix.end2end.UpsertSelectIT |
   |   | phoenix.end2end.StringToArrayFunctionIT |
   |   | phoenix.end2end.index.MutableIndexSplitForwardScanIT |
   |   | phoenix.end2end.join.HashJoinPersistentCacheIT |
   |   | phoenix.end2end.QueryDatabaseMetaDataIT |
   |   | phoenix.end2end.RowTimestampIT |
   |   | phoenix.end2end.ArithmeticQueryIT |
   |   | phoenix.end2end.InListIT |
   |   | phoenix.end2end.index.InvalidIndexStateClientSideIT |
   |   | phoenix.end2end.OrphanViewToolIT |
   |   | phoenix.end2end.BackwardCompatibilityIT |
   |   | phoenix.end2end.index.LocalImmutableNonTxIndexIT |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/2/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/phoenix/pull/936 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs hbaseanti checkstyle compile |
   | uname | Linux b1c316e10591 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev/phoenix-personality.sh |
   | git revision | 4.x / 605656c |
   | Default Java | Private Build-1.8.0_242-8u242-b08-0ubuntu3~16.04-b08 |
   | checkstyle | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/2/artifact/yetus-general-check/output/diff-checkstyle-phoenix-core.txt |
   | spotbugs | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/2/artifact/yetus-general-check/output/new-spotbugs-phoenix-core.html |
   | unit | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/2/artifact/yetus-general-check/output/patch-unit-phoenix-core.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/2/testReport/ |
   | Max. process+thread count | 6113 (vs. ulimit of 30000) |
   | modules | C: phoenix-core U: phoenix-core |
   | Console output | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/2/console |
   | versions | git=2.7.4 maven=3.3.9 spotbugs=4.1.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde closed pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde closed pull request #936:
URL: https://github.com/apache/phoenix/pull/936


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] ChinmaySKulkarni edited a comment on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
ChinmaySKulkarni edited a comment on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-716908659


   @kadirozde The changes are substantial and I will need some heads-down time to review them. If it is urgent, please feel free to rely on others' reviews and don't wait for me. I plan on taking a look in detail within the next couple of days.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] ChinmaySKulkarni commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
ChinmaySKulkarni commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r520833850



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,645 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInRows = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInRows = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInRows =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_ROWS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building, where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf) ;
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock
+        // args was introduced in 0.94.4, thus if we try to use it here
+        // we can no longer use the 0.94.2 version of the client.
+        Cell firstKV = results.get(0);
+        Delete delete = new Delete(firstKV.getRowArray(),
+                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
+        if (replayMutations != null) {
+            delete.setAttribute(REPLAY_WRITES, replayMutations);
+        }
+        mutations.add(delete);
+        // force tephra to ignore this delete
+        delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+    }
+
+    void deleteCForQ(Tuple result, List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // No need to search for the delete column, since we only project that column
+        // when no empty key value is being set
+        if (emptyCF == null ||
+                result.getValue(deleteCF, deleteCQ) != null) {
+            Delete delete = new Delete(results.get(0).getRowArray(),
+                    results.get(0).getRowOffset(),
+                    results.get(0).getRowLength());
+            delete.deleteColumns(deleteCF,  deleteCQ, ts);
+            // force tephra to ignore this delete
+            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+            mutations.add(delete);
+        }
+    }
+    void upsert(Tuple result, ImmutableBytesWritable ptr, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Arrays.fill(values, null);
+        int bucketNumOffset = 0;
+        if (projectedTable.getBucketNum() != null) {
+            values[0] = new byte[] { 0 };
+            bucketNumOffset = 1;
+        }
+        int i = bucketNumOffset;
+        List<PColumn> projectedColumns = projectedTable.getColumns();
+        for (; i < projectedTable.getPKColumns().size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                values[i] = ptr.copyBytes();
+                // If SortOrder from expression in SELECT doesn't match the
+                // column being projected into then invert the bits.
+                if (expression.getSortOrder() !=
+                        projectedColumns.get(i).getSortOrder()) {
+                    SortOrder.invert(values[i], 0, values[i], 0,
+                            values[i].length);
+                }
+            }else{
+                values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
+            }
+        }
+        projectedTable.newKey(ptr, values);
+        PRow row = projectedTable.newRow(GenericKeyValueBuilder.INSTANCE, ts, ptr, false);
+        for (; i < projectedColumns.size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                PColumn column = projectedColumns.get(i);
+                if (!column.getDataType().isSizeCompatible(ptr, null,
+                        expression.getDataType(), expression.getSortOrder(),
+                        expression.getMaxLength(), expression.getScale(),
+                        column.getMaxLength(), column.getScale())) {
+                    throw new DataExceedsCapacityException(
+                            column.getDataType(), column.getMaxLength(),
+                            column.getScale(), column.getName().getString(), ptr);
+                }
+                column.getDataType().coerceBytes(ptr, null,
+                        expression.getDataType(), expression.getMaxLength(),
+                        expression.getScale(), expression.getSortOrder(),
+                        column.getMaxLength(), column.getScale(),
+                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
+                byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                row.setValue(column, bytes);
+            }
+        }
+        for (Mutation mutation : row.toRowMutations()) {
+            if (replayMutations != null) {
+                mutation.setAttribute(REPLAY_WRITES, replayMutations);
+            } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
+                mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
+            }
+            mutations.add(mutation);
+        }
+        for (i = 0; i < selectExpressions.size(); i++) {
+            selectExpressions.get(i).reset();
+        }
+    }
+
+    void insertEmptyKeyValue(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Set<Long> timeStamps =
+                Sets.newHashSetWithExpectedSize(results.size());
+        for (Cell kv : results) {
+            long kvts = kv.getTimestamp();
+            if (!timeStamps.contains(kvts)) {
+                Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
+                        kv.getRowLength());
+                put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
+                        ByteUtil.EMPTY_BYTE_ARRAY);
+                mutations.add(put);
+            }
+        }
+    }
+    @Override
+    public boolean next(List<Cell> resultsToReturn) throws IOException {

Review comment:
       That's a nice improvement too.
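    
    The comment above is attached to the new next() implementation. For context, server-side paging of this kind typically looks like the following minimal, hypothetical sketch: the scanner stops after a configurable budget so a single RPC does not run over the whole region. The field names pageSizeInRows/pageSizeInMs mirror the patch, but the surrounding class, scanOnePage() and aggregate() are invented for illustration and are not the patch itself.
    
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.regionserver.RegionScanner;
    import org.apache.phoenix.util.EnvironmentEdgeManager;
    
    abstract class PagedScanSketch {
        private final RegionScanner innerScanner;
        private final long pageSizeInRows;
        private final long pageSizeInMs;
    
        PagedScanSketch(RegionScanner innerScanner, long pageSizeInRows, long pageSizeInMs) {
            this.innerScanner = innerScanner;
            this.pageSizeInRows = pageSizeInRows;
            this.pageSizeInMs = pageSizeInMs;
        }
    
        // Aggregate rows until either page budget is exhausted, then return so the
        // client issues another next() call instead of holding one RPC open.
        boolean scanOnePage() throws IOException {
            long startTime = EnvironmentEdgeManager.currentTimeMillis();
            long rowCount = 0;
            boolean hasMore;
            do {
                List<Cell> row = new ArrayList<>();
                hasMore = innerScanner.nextRaw(row);
                if (!row.isEmpty()) {
                    aggregate(row);   // accumulate into the server-side aggregators
                    rowCount++;
                }
                long elapsed = EnvironmentEdgeManager.currentTimeMillis() - startTime;
                if (rowCount >= pageSizeInRows || elapsed >= pageSizeInMs) {
                    break;            // page budget used up; yield back to the client
                }
            } while (hasMore);
            return hasMore;
        }
    
        abstract void aggregate(List<Cell> row) throws IOException;
    }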




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r521620095



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionScanner.java
##########
@@ -0,0 +1,646 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.coprocessor;
+
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.LOCAL_INDEX_BUILD_PROTO;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.REPLAY_WRITES;
+import static org.apache.phoenix.coprocessor.BaseScannerRegionObserver.UPGRADE_DESC_ROW_KEY;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.checkForLocalIndexColumnFamilies;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeExpressions;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.deserializeTable;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.getBlockingMemstoreSize;
+import static org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.setIndexAndTransactionProperties;
+import static org.apache.phoenix.query.QueryConstants.AGG_TIMESTAMP;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN;
+import static org.apache.phoenix.query.QueryConstants.SINGLE_COLUMN_FAMILY;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.MUTATE_BATCH_SIZE_BYTES_ATTRIB;
+import static org.apache.phoenix.query.QueryServices.UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS;
+import static org.apache.phoenix.schema.PTableImpl.getColumnsToClone;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Sets;
+import com.google.common.primitives.Ints;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.Cell;
+import org.apache.hadoop.hbase.CellUtil;
+import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.client.Delete;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.coprocessor.ObserverContext;
+import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.regionserver.Region;
+import org.apache.hadoop.hbase.regionserver.RegionScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.phoenix.cache.GlobalCache;
+import org.apache.phoenix.cache.TenantCache;
+import org.apache.phoenix.exception.DataExceedsCapacityException;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.expression.aggregator.Aggregator;
+import org.apache.phoenix.expression.aggregator.Aggregators;
+import org.apache.phoenix.expression.aggregator.ServerAggregators;
+import org.apache.phoenix.hbase.index.covered.update.ColumnReference;
+import org.apache.phoenix.hbase.index.util.GenericKeyValueBuilder;
+import org.apache.phoenix.index.PhoenixIndexCodec;
+import org.apache.phoenix.memory.InsufficientMemoryException;
+import org.apache.phoenix.memory.MemoryManager;
+import org.apache.phoenix.query.QueryConstants;
+import org.apache.phoenix.query.QueryServicesOptions;
+import org.apache.phoenix.schema.PColumn;
+import org.apache.phoenix.schema.PRow;
+import org.apache.phoenix.schema.PTable;
+import org.apache.phoenix.schema.PTableImpl;
+import org.apache.phoenix.schema.PTableType;
+import org.apache.phoenix.schema.RowKeySchema;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.TableRef;
+import org.apache.phoenix.schema.ValueSchema;
+import org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList;
+import org.apache.phoenix.schema.tuple.MultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.PositionBasedMultiKeyValueTuple;
+import org.apache.phoenix.schema.tuple.Tuple;
+import org.apache.phoenix.schema.types.PBinary;
+import org.apache.phoenix.schema.types.PChar;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PDouble;
+import org.apache.phoenix.schema.types.PFloat;
+import org.apache.phoenix.transaction.PhoenixTransactionContext;
+import org.apache.phoenix.transaction.PhoenixTransactionProvider;
+import org.apache.phoenix.transaction.TransactionFactory;
+import org.apache.phoenix.util.ByteUtil;
+import org.apache.phoenix.util.EncodedColumnsUtil;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.phoenix.hbase.index.ValueGetter;
+import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
+import org.apache.phoenix.index.IndexMaintainer;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ExpressionUtil;
+import org.apache.phoenix.util.IndexUtil;
+import org.apache.phoenix.util.KeyValueUtil;
+import org.apache.phoenix.util.LogUtil;
+import org.apache.phoenix.util.ScanUtil;
+import org.apache.phoenix.util.ServerUtil;
+import org.apache.phoenix.util.StringUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class UngroupedAggregateRegionScanner extends BaseRegionScanner {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(UngroupedAggregateRegionScanner.class);
+
+    protected long pageSizeInMs = Long.MAX_VALUE;
+    protected int maxBatchSize = 0;
+    protected Scan scan;
+    protected RegionScanner innerScanner;
+    protected Region region;
+    private final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver;
+    final private RegionCoprocessorEnvironment env;
+    private final boolean useQualifierAsIndex;
+    boolean needToWrite = false;
+    final Pair<Integer, Integer> minMaxQualifiers;
+    byte[][] values = null;
+    PTable.QualifierEncodingScheme encodingScheme;
+    PTable writeToTable = null;
+    PTable projectedTable = null;
+    boolean isDescRowKeyOrderUpgrade;
+    final int offset;
+    boolean buildLocalIndex;
+    List<IndexMaintainer> indexMaintainers;
+    boolean isPKChanging = false;
+    long ts;
+    PhoenixTransactionProvider txnProvider = null;
+    UngroupedAggregateRegionObserver.MutationList indexMutations;
+    boolean isDelete = false;
+    byte[] replayMutations;
+    boolean isUpsert = false;
+    List<Expression> selectExpressions = null;
+    byte[] deleteCQ = null;
+    byte[] deleteCF = null;
+    byte[] emptyCF = null;
+    byte[] indexUUID;
+    byte[] txState;
+    byte[] clientVersionBytes;
+    long blockingMemStoreSize;
+    long maxBatchSizeBytes = 0L;
+    HTable targetHTable = null;
+    boolean incrScanRefCount = false;
+    byte[] indexMaintainersPtr;
+    boolean useIndexProto;
+
+    public UngroupedAggregateRegionScanner(final ObserverContext<RegionCoprocessorEnvironment> c,
+                                           final RegionScanner innerScanner, final Region region, final Scan scan,
+                                           final RegionCoprocessorEnvironment env,
+                                           final UngroupedAggregateRegionObserver ungroupedAggregateRegionObserver)
+            throws IOException, SQLException{
+        super(innerScanner);
+        this.env = env;
+        this.region = region;
+        this.scan = scan;
+        this.ungroupedAggregateRegionObserver = ungroupedAggregateRegionObserver;
+        this.innerScanner = innerScanner;
+        Configuration conf = env.getConfiguration();
+        if (scan.getAttribute(BaseScannerRegionObserver.SERVER_PAGING) != null) {
+            byte[] pageSizeFromScan =
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATE_PAGE_ROWS);
+            if (pageSizeFromScan != null) {
+                pageSizeInMs = Bytes.toLong(pageSizeFromScan);
+            } else {
+                pageSizeInMs =
+                        conf.getLong(UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS,
+                                QueryServicesOptions.DEFAULT_UNGROUPED_AGGREGATE_PAGE_SIZE_IN_MS);
+            }
+        }
+        ts = scan.getTimeRange().getMax();
+        boolean localIndexScan = ScanUtil.isLocalIndex(scan);
+
+        encodingScheme = EncodedColumnsUtil.getQualifierEncodingScheme(scan);
+        int offsetToBe = 0;
+        if (localIndexScan) {
+            /*
+             * For local indexes, we need to set an offset on row key expressions to skip
+             * the region start key.
+             */
+            offsetToBe = region.getRegionInfo().getStartKey().length != 0 ? region.getRegionInfo().getStartKey().length :
+                    region.getRegionInfo().getEndKey().length;
+            ScanUtil.setRowKeyOffset(scan, offsetToBe);
+        }
+        offset = offsetToBe;
+
+        byte[] descRowKeyTableBytes = scan.getAttribute(UPGRADE_DESC_ROW_KEY);
+        isDescRowKeyOrderUpgrade = descRowKeyTableBytes != null;
+        if (isDescRowKeyOrderUpgrade) {
+            LOGGER.debug("Upgrading row key for " + region.getRegionInfo().getTable().getNameAsString());
+            projectedTable = deserializeTable(descRowKeyTableBytes);
+            try {
+                writeToTable = PTableImpl.builderWithColumns(projectedTable,
+                        getColumnsToClone(projectedTable))
+                        .setRowKeyOrderOptimizable(true)
+                        .build();
+            } catch (SQLException e) {
+                ServerUtil.throwIOException("Upgrade failed", e); // Impossible
+            }
+            values = new byte[projectedTable.getPKColumns().size()][];
+        }
+        boolean useProto = false;
+        byte[] localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD_PROTO);
+        useProto = localIndexBytes != null;
+        if (localIndexBytes == null) {
+            localIndexBytes = scan.getAttribute(LOCAL_INDEX_BUILD);
+        }
+        indexMaintainers = localIndexBytes == null ? null : IndexMaintainer.deserialize(localIndexBytes, useProto);
+        indexMutations = localIndexBytes == null ? new UngroupedAggregateRegionObserver.MutationList() : new UngroupedAggregateRegionObserver.MutationList(1024);
+
+        replayMutations = scan.getAttribute(REPLAY_WRITES);
+        indexUUID = scan.getAttribute(PhoenixIndexCodec.INDEX_UUID);
+        txState = scan.getAttribute(BaseScannerRegionObserver.TX_STATE);
+        clientVersionBytes = scan.getAttribute(BaseScannerRegionObserver.CLIENT_VERSION);
+        if (txState != null) {
+            int clientVersion = clientVersionBytes == null ? ScanUtil.UNKNOWN_CLIENT_VERSION : Bytes.toInt(clientVersionBytes);
+            txnProvider = TransactionFactory.getTransactionProvider(txState, clientVersion);
+        }
+        byte[] upsertSelectTable = scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TABLE);
+        if (upsertSelectTable != null) {
+            isUpsert = true;
+            projectedTable = deserializeTable(upsertSelectTable);
+            targetHTable = new HTable(ungroupedAggregateRegionObserver.getUpsertSelectConfig(),
+                    projectedTable.getPhysicalName().getBytes());
+            selectExpressions = deserializeExpressions(scan.getAttribute(BaseScannerRegionObserver.UPSERT_SELECT_EXPRS));
+            values = new byte[projectedTable.getPKColumns().size()][];
+            isPKChanging = ExpressionUtil.isPkPositionChanging(new TableRef(projectedTable), selectExpressions);
+        } else {
+            byte[] isDeleteAgg = scan.getAttribute(BaseScannerRegionObserver.DELETE_AGG);
+            isDelete = isDeleteAgg != null && Bytes.compareTo(PDataType.TRUE_BYTES, isDeleteAgg) == 0;
+            if (!isDelete) {
+                deleteCF = scan.getAttribute(BaseScannerRegionObserver.DELETE_CF);
+                deleteCQ = scan.getAttribute(BaseScannerRegionObserver.DELETE_CQ);
+            }
+            emptyCF = scan.getAttribute(BaseScannerRegionObserver.EMPTY_CF);
+        }
+        ColumnReference[] dataColumns = IndexUtil.deserializeDataTableColumnsToJoin(scan);
+        useQualifierAsIndex = EncodedColumnsUtil.useQualifierAsIndex(EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan));
+
+        /**
+         * Slow down the writes if the memstore size is more than
+         * (hbase.hregion.memstore.block.multiplier - 1) times hbase.hregion.memstore.flush.size
+         * bytes. This avoids a flush storm to HDFS for cases like index building, where reads and
+         * writes happen to all the table regions in the server.
+         */
+        blockingMemStoreSize = getBlockingMemstoreSize(region, conf) ;
+
+        buildLocalIndex = indexMaintainers != null && dataColumns==null && !localIndexScan;
+        if(buildLocalIndex) {
+            checkForLocalIndexColumnFamilies(region, indexMaintainers);
+        }
+        if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+            needToWrite = true;
+            if((isUpsert && (targetHTable == null ||
+                    !targetHTable.getName().equals(region.getTableDesc().getTableName())))) {
+                needToWrite = false;
+            }
+            maxBatchSize = conf.getInt(MUTATE_BATCH_SIZE_ATTRIB, QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE);
+            maxBatchSizeBytes = conf.getLong(MUTATE_BATCH_SIZE_BYTES_ATTRIB,
+                    QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE_BYTES);
+        }
+        minMaxQualifiers = EncodedColumnsUtil.getMinMaxQualifiersFromScan(scan);
+        if (LOGGER.isDebugEnabled()) {
+            LOGGER.debug(LogUtil.addCustomAnnotations("Starting ungrouped coprocessor scan " + scan + " " + region.getRegionInfo(), ScanUtil.getCustomAnnotations(scan)));
+        }
+        useIndexProto = true;
+        indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_PROTO_MD);
+        // for backward compatibility, fall back to looking up the old attribute
+        if (indexMaintainersPtr == null) {
+            indexMaintainersPtr = scan.getAttribute(PhoenixIndexCodec.INDEX_MD);
+            useIndexProto = false;
+        }
+
+        if (needToWrite) {
+            ungroupedAggregateRegionObserver.incrementScansReferenceCount();
+            incrScanRefCount = true;
+        }
+    }
+
+    @Override
+    public HRegionInfo getRegionInfo() {
+        return region.getRegionInfo();
+    }
+
+    @Override
+    public boolean isFilterDone() {
+        return false;
+    }
+
+    @Override
+    public void close() throws IOException {
+        if (needToWrite && incrScanRefCount) {
+            ungroupedAggregateRegionObserver.decrementScansReferenceCount();
+        }
+        try {
+            if (targetHTable != null) {
+                try {
+                    targetHTable.close();
+                } catch (IOException e) {
+                    LOGGER.error("Closing table: " + targetHTable + " failed: ", e);
+                }
+            }
+        } finally {
+            innerScanner.close();
+        }
+    }
+
+    boolean descRowKeyOrderUpgrade(List<Cell> results, ImmutableBytesWritable ptr,
+                                UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        Arrays.fill(values, null);
+        Cell firstKV = results.get(0);
+        RowKeySchema schema = projectedTable.getRowKeySchema();
+        int maxOffset = schema.iterator(firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(), ptr);
+        for (int i = 0; i < schema.getFieldCount(); i++) {
+            Boolean hasValue = schema.next(ptr, i, maxOffset);
+            if (hasValue == null) {
+                break;
+            }
+            ValueSchema.Field field = schema.getField(i);
+            if (field.getSortOrder() == SortOrder.DESC) {
+                // Special case for re-writing DESC ARRAY, as the actual byte value needs to change in this case
+                if (field.getDataType().isArrayType()) {
+                    field.getDataType().coerceBytes(ptr, null, field.getDataType(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(),
+                            field.getMaxLength(), field.getScale(), field.getSortOrder(), true); // force to use correct separator byte
+                }
+                // Special case for re-writing DESC CHAR or DESC BINARY, to force the re-writing of trailing space characters
+                else if (field.getDataType() == PChar.INSTANCE || field.getDataType() == PBinary.INSTANCE) {
+                    int len = ptr.getLength();
+                    while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                        len--;
+                    }
+                    ptr.set(ptr.get(), ptr.getOffset(), len);
+                    // Special case for re-writing DESC FLOAT and DOUBLE, as they're not inverted like they should be (PHOENIX-2171)
+                } else if (field.getDataType() == PFloat.INSTANCE || field.getDataType() == PDouble.INSTANCE) {
+                    byte[] invertedBytes = SortOrder.invert(ptr.get(), ptr.getOffset(), ptr.getLength());
+                    ptr.set(invertedBytes);
+                }
+            } else if (field.getDataType() == PBinary.INSTANCE) {
+                // Remove trailing space characters so that the setValues call below will replace them
+                // with the correct zero byte character. Note this is somewhat dangerous as these
+                // could be legit, but I don't know what the alternative is.
+                int len = ptr.getLength();
+                while (len > 0 && ptr.get()[ptr.getOffset() + len - 1] == StringUtil.SPACE_UTF8) {
+                    len--;
+                }
+                ptr.set(ptr.get(), ptr.getOffset(), len);
+            }
+            values[i] = ptr.copyBytes();
+        }
+        writeToTable.newKey(ptr, values);
+        if (Bytes.compareTo(
+                firstKV.getRowArray(), firstKV.getRowOffset() + offset, firstKV.getRowLength(),
+                ptr.get(),ptr.getOffset() + offset,ptr.getLength()) == 0) {
+            return false;
+        }
+        byte[] newRow = ByteUtil.copyKeyBytesIfNecessary(ptr);
+        if (offset > 0) { // for local indexes (prepend region start key)
+            byte[] newRowWithOffset = new byte[offset + newRow.length];
+            System.arraycopy(firstKV.getRowArray(), firstKV.getRowOffset(), newRowWithOffset, 0, offset);
+            System.arraycopy(newRow, 0, newRowWithOffset, offset, newRow.length);
+            newRow = newRowWithOffset;
+        }
+        byte[] oldRow = Bytes.copy(firstKV.getRowArray(), firstKV.getRowOffset(), firstKV.getRowLength());
+        for (Cell cell : results) {
+            // Copy existing cell but with new row key
+            Cell newCell = new KeyValue(newRow, 0, newRow.length,
+                    cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                    cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                    cell.getTimestamp(), KeyValue.Type.codeToType(cell.getTypeByte()),
+                    cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
+            switch (KeyValue.Type.codeToType(cell.getTypeByte())) {
+                case Put:
+                    // If Put, point delete old Put
+                    Delete del = new Delete(oldRow);
+                    del.addDeleteMarker(new KeyValue(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
+                            cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
+                            cell.getQualifierArray(), cell.getQualifierOffset(),
+                            cell.getQualifierLength(), cell.getTimestamp(), KeyValue.Type.Delete,
+                            ByteUtil.EMPTY_BYTE_ARRAY, 0, 0));
+                    mutations.add(del);
+
+                    Put put = new Put(newRow);
+                    put.add(newCell);
+                    mutations.add(put);
+                    break;
+                case Delete:
+                case DeleteColumn:
+                case DeleteFamily:
+                case DeleteFamilyVersion:
+                    Delete delete = new Delete(newRow);
+                    delete.addDeleteMarker(newCell);
+                    mutations.add(delete);
+                    break;
+            }
+        }
+        return true;
+    }
+
+    void buildLocalIndex(Tuple result, List<Cell> results, ImmutableBytesWritable ptr,
+                         UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
+        for (IndexMaintainer maintainer : indexMaintainers) {
+            if (!results.isEmpty()) {
+                result.getKey(ptr);
+                ValueGetter valueGetter =
+                        maintainer.createGetterFromKeyValues(
+                                ImmutableBytesPtr.copyBytesIfNecessary(ptr),
+                                results);
+                Put put = maintainer.buildUpdateMutation(GenericKeyValueBuilder.INSTANCE,
+                        valueGetter, ptr, results.get(0).getTimestamp(),
+                        env.getRegion().getRegionInfo().getStartKey(),
+                        env.getRegion().getRegionInfo().getEndKey());
+
+                if (txnProvider != null) {
+                    put = txnProvider.markPutAsCommitted(put, ts, ts);
+                }
+                indexMutations.add(put);
+            }
+        }
+        result.setKeyValues(results);
+    }
+    void deleteRow(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // FIXME: the version of the Delete constructor without the lock
+        // args was introduced in 0.94.4, thus if we try to use it here
+        // we can no longer use the 0.94.2 version of the client.
+        Cell firstKV = results.get(0);
+        Delete delete = new Delete(firstKV.getRowArray(),
+                firstKV.getRowOffset(), firstKV.getRowLength(),ts);
+        if (replayMutations != null) {
+            delete.setAttribute(REPLAY_WRITES, replayMutations);
+        }
+        mutations.add(delete);
+        // force tephra to ignore this delete
+        delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+    }
+
+    void deleteCForQ(Tuple result, List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        // No need to search for the delete column, since we only project that column
+        // when no empty key value is being set
+        if (emptyCF == null ||
+                result.getValue(deleteCF, deleteCQ) != null) {
+            Delete delete = new Delete(results.get(0).getRowArray(),
+                    results.get(0).getRowOffset(),
+                    results.get(0).getRowLength());
+            delete.deleteColumns(deleteCF,  deleteCQ, ts);
+            // force tephra to ignore this delete
+            delete.setAttribute(PhoenixTransactionContext.TX_ROLLBACK_ATTRIBUTE_KEY, new byte[0]);
+            mutations.add(delete);
+        }
+    }
+    void upsert(Tuple result, ImmutableBytesWritable ptr, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Arrays.fill(values, null);
+        int bucketNumOffset = 0;
+        if (projectedTable.getBucketNum() != null) {
+            values[0] = new byte[] { 0 };
+            bucketNumOffset = 1;
+        }
+        int i = bucketNumOffset;
+        List<PColumn> projectedColumns = projectedTable.getColumns();
+        for (; i < projectedTable.getPKColumns().size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                values[i] = ptr.copyBytes();
+                // If SortOrder from expression in SELECT doesn't match the
+                // column being projected into then invert the bits.
+                if (expression.getSortOrder() !=
+                        projectedColumns.get(i).getSortOrder()) {
+                    SortOrder.invert(values[i], 0, values[i], 0,
+                            values[i].length);
+                }
+            }else{
+                values[i] = ByteUtil.EMPTY_BYTE_ARRAY;
+            }
+        }
+        projectedTable.newKey(ptr, values);
+        PRow row = projectedTable.newRow(GenericKeyValueBuilder.INSTANCE, ts, ptr, false);
+        for (; i < projectedColumns.size(); i++) {
+            Expression expression = selectExpressions.get(i - bucketNumOffset);
+            if (expression.evaluate(result, ptr)) {
+                PColumn column = projectedColumns.get(i);
+                if (!column.getDataType().isSizeCompatible(ptr, null,
+                        expression.getDataType(), expression.getSortOrder(),
+                        expression.getMaxLength(), expression.getScale(),
+                        column.getMaxLength(), column.getScale())) {
+                    throw new DataExceedsCapacityException(
+                            column.getDataType(), column.getMaxLength(),
+                            column.getScale(), column.getName().getString(), ptr);
+                }
+                column.getDataType().coerceBytes(ptr, null,
+                        expression.getDataType(), expression.getMaxLength(),
+                        expression.getScale(), expression.getSortOrder(),
+                        column.getMaxLength(), column.getScale(),
+                        column.getSortOrder(), projectedTable.rowKeyOrderOptimizable());
+                byte[] bytes = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                row.setValue(column, bytes);
+            }
+        }
+        for (Mutation mutation : row.toRowMutations()) {
+            if (replayMutations != null) {
+                mutation.setAttribute(REPLAY_WRITES, replayMutations);
+            } else if (txnProvider != null && projectedTable.getType() == PTableType.INDEX) {
+                mutation = txnProvider.markPutAsCommitted((Put)mutation, ts, ts);
+            }
+            mutations.add(mutation);
+        }
+        for (i = 0; i < selectExpressions.size(); i++) {
+            selectExpressions.get(i).reset();
+        }
+    }
+
+    void insertEmptyKeyValue(List<Cell> results, UngroupedAggregateRegionObserver.MutationList mutations) {
+        Set<Long> timeStamps =
+                Sets.newHashSetWithExpectedSize(results.size());
+        for (Cell kv : results) {
+            long kvts = kv.getTimestamp();
+            if (!timeStamps.contains(kvts)) {
+                Put put = new Put(kv.getRowArray(), kv.getRowOffset(),
+                        kv.getRowLength());
+                put.addColumn(emptyCF, QueryConstants.EMPTY_COLUMN_BYTES, kvts,
+                        ByteUtil.EMPTY_BYTE_ARRAY);
+                mutations.add(put);
+            }
+        }
+    }
+    @Override
+    public boolean next(List<Cell> resultsToReturn) throws IOException {
+        boolean hasMore;
+        long startTime = EnvironmentEdgeManager.currentTimeMillis();
+        Configuration conf = env.getConfiguration();
+        final TenantCache tenantCache = GlobalCache.getTenantCache(env, ScanUtil.getTenantId(scan));
+        try (MemoryManager.MemoryChunk em = tenantCache.getMemoryManager().allocate(0)) {
+            Aggregators aggregators = ServerAggregators.deserialize(
+                    scan.getAttribute(BaseScannerRegionObserver.AGGREGATORS), conf, em);
+            Aggregator[] rowAggregators = aggregators.getAggregators();
+            aggregators.reset(rowAggregators);
+            Cell lastCell = null;
+            boolean hasAny = false;
+            ImmutableBytesWritable ptr = new ImmutableBytesWritable();
+            Tuple result = useQualifierAsIndex ? new PositionBasedMultiKeyValueTuple() : new MultiKeyValueTuple();
+            UngroupedAggregateRegionObserver.MutationList mutations = new UngroupedAggregateRegionObserver.MutationList();
+            if (isDescRowKeyOrderUpgrade || isDelete || isUpsert
+                    || (deleteCQ != null && deleteCF != null) || emptyCF != null || buildLocalIndex) {
+                mutations = new UngroupedAggregateRegionObserver.MutationList(Ints.saturatedCast(maxBatchSize + maxBatchSize / 10));
+            }
+            region.startRegionOperation();
+            try {
+                synchronized (innerScanner) {
+                    do {
+                        ungroupedAggregateRegionObserver.checkForRegionClosing();
+                        List<Cell> results = useQualifierAsIndex ? new EncodedColumnQualiferCellsList(minMaxQualifiers.getFirst(), minMaxQualifiers.getSecond(), encodingScheme) : new ArrayList<Cell>();
+                        // Results are potentially returned even when the return value of
+                        // innerScanner.nextRaw() is false, since that value only indicates
+                        // whether there are more values after the ones returned
+                        hasMore = innerScanner.nextRaw(results);
+                        if (!results.isEmpty()) {
+                            lastCell = results.get(0);
+                            result.setKeyValues(results);
+                            if (isDescRowKeyOrderUpgrade) {
+                                if (!descRowKeyOrderUpgrade(results, ptr, mutations)) {
+                                    continue;
+                                }
+                            } else if (buildLocalIndex) {
+                                buildLocalIndex(result, results, ptr, mutations);
+                            } else if (isDelete) {
+                                deleteRow(results, mutations);
+                            } else if (isUpsert) {
+                                upsert(result, ptr, mutations);
+                            } else if (deleteCF != null && deleteCQ != null) {
+                                deleteCForQ(result, results, mutations);
+                            }
+                            if (emptyCF != null) {
+                                /*
+                                 * If we've specified an emptyCF, then we need to insert an empty
+                                 * key value "retroactively" for any key value that is visible at
+                                 * the timestamp that the DDL was issued. Key values that are not
+                                 * visible at this timestamp will not ever be projected up to
+                                 * scans past this timestamp, so don't need to be considered.
+                                 * We insert one empty key value per row per timestamp.
+                                 */
+                                insertEmptyKeyValue(results, mutations);
+                            }
+                            if (ServerUtil.readyToCommit(mutations.size(), mutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                ungroupedAggregateRegionObserver.commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr,
+                                        txState, targetHTable, useIndexProto, isPKChanging, clientVersionBytes);
+                                mutations.clear();
+                            }
+                            // Commit in batches based on UPSERT_BATCH_SIZE_BYTES_ATTRIB in config
+
+                            if (ServerUtil.readyToCommit(indexMutations.size(), indexMutations.byteSize(), maxBatchSize, maxBatchSizeBytes)) {
+                                setIndexAndTransactionProperties(indexMutations, indexUUID, indexMaintainersPtr, txState, clientVersionBytes, useIndexProto);
+                                ungroupedAggregateRegionObserver.commitBatch(region, indexMutations, blockingMemStoreSize);
+                                indexMutations.clear();
+                            }
+                            aggregators.aggregate(rowAggregators, result);
+                            hasAny = true;
+                        }
+                    } while (hasMore && (EnvironmentEdgeManager.currentTimeMillis() - startTime) < pageSizeInMs);

Review comment:
       The duration of a page within the coprocessor will be a fraction of the HBase client timeout value (1 second vs. 2 minutes). As long as each iteration does not take too long, i.e., on the order of minutes, this Jira will be sufficient to eliminate the timeouts. However, a single iteration of a region scanner can take minutes; this happens when the filter set on the scan is very selective or when there are too many delete markers in the table region. PHOENIX-6211 will address this remaining issue.
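       For illustration, here is a minimal sketch of the time-bounded paging pattern described above; the class name, PAGE_SIZE_MS constant, and doOneIteration() are hypothetical stand-ins (the actual loop lives in UngroupedAggregateRegionScanner.next() in the diff above and uses EnvironmentEdgeManager rather than System.currentTimeMillis()):

           // Hypothetical sketch: bound the work done per RegionScanner.next() call
           // by wall-clock time, so each "page" returns to the client well within
           // the HBase RPC timeout.
           public class PagedScanSketch {
               // A fraction of the ~2 minute client timeout.
               private static final long PAGE_SIZE_MS = 1000L;

               // Stands in for one innerScanner.nextRaw() iteration plus per-row work.
               private boolean doOneIteration() {
                   return false; // pretend the region is exhausted
               }

               public boolean nextPage() {
                   long startTime = System.currentTimeMillis();
                   boolean hasMore;
                   do {
                       hasMore = doOneIteration();
                   } while (hasMore && System.currentTimeMillis() - startTime < PAGE_SIZE_MS);
                   // If hasMore is still true, the page is returned to the client and the
                   // next call to nextPage() resumes the scan under a fresh timeout budget.
                   return hasMore;
               }
           }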




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] stoty commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
stoty commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-716301665


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   6m 44s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 1 new or modified test files.  |
   ||| _ 4.x Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  11m 30s |  4.x passed  |
   | +1 :green_heart: |  compile  |   1m  0s |  4.x passed  |
   | +1 :green_heart: |  checkstyle  |   1m  2s |  4.x passed  |
   | +1 :green_heart: |  javadoc  |   0m 46s |  4.x passed  |
   | +0 :ok: |  spotbugs  |   3m 12s |  phoenix-core in 4.x has 954 extant spotbugs warnings.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   6m  8s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 58s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 58s |  the patch passed  |
   | -1 :x: |  checkstyle  |   1m  5s |  phoenix-core: The patch generated 314 new + 1796 unchanged - 219 fixed = 2110 total (was 2015)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 46s |  the patch passed  |
   | -1 :x: |  spotbugs  |   3m 21s |  phoenix-core generated 1 new + 953 unchanged - 1 fixed = 954 total (was 954)  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 184m 43s |  phoenix-core in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  The patch does not generate ASF License warnings.  |
   |  |   | 224m 34s |   |
   
   
   | Reason | Tests |
   |-------:|:------|
   | FindBugs | module:phoenix-core |
   |  |  Switch statement found in org.apache.phoenix.coprocessor.UngroupedAggregateRegionScanner.descRowKeyOrderUpgrade(List, ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:[lines 380-401] |
   | Failed junit tests | phoenix.end2end.index.PartialIndexRebuilderIT |
   |   | phoenix.end2end.SpillableGroupByIT |
   |   | phoenix.end2end.QueryMoreIT |
   |   | TEST-[IntArithmeticIT_2] |
   |   | phoenix.end2end.UpsertSelectIT |
   |   | phoenix.end2end.index.IndexWithTableSchemaChangeIT |
   |   | phoenix.end2end.QueryDatabaseMetaDataIT |
   |   | phoenix.end2end.index.MutableIndexIT |
   |   | phoenix.end2end.ArithmeticQueryIT |
   |   | phoenix.end2end.DerivedTableIT |
   |   | TEST-[GroupByIT_2] |
   |   | phoenix.end2end.OrphanViewToolIT |
   |   | phoenix.end2end.QueryLoggerIT |
   |   | phoenix.end2end.MetaDataEndpointImplIT |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/4/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/phoenix/pull/936 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs hbaseanti checkstyle compile |
   | uname | Linux 5cec32287835 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev/phoenix-personality.sh |
   | git revision | 4.x / 135692c |
   | Default Java | Private Build-1.8.0_242-8u242-b08-0ubuntu3~16.04-b08 |
   | checkstyle | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/4/artifact/yetus-general-check/output/diff-checkstyle-phoenix-core.txt |
   | spotbugs | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/4/artifact/yetus-general-check/output/new-spotbugs-phoenix-core.html |
   | unit | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/4/artifact/yetus-general-check/output/patch-unit-phoenix-core.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/4/testReport/ |
   | Max. process+thread count | 4730 (vs. ulimit of 30000) |
   | modules | C: phoenix-core U: phoenix-core |
   | Console output | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/4/console |
   | versions | git=2.7.4 maven=3.3.9 spotbugs=4.1.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] kadirozde commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
kadirozde commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-716866775


   > @kadirozde - UpsertSelectIT seems to have crashed in the last test run. (And timeouts for SequencePointInTimeIT and IndexToolForNonTxGlobalIndexIT). Have these passed elsewhere?
   
   Please check the failures; they are due to minicluster setup and operation timeouts. The tests pass for me locally.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] gokceni commented on a change in pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
gokceni commented on a change in pull request #936:
URL: https://github.com/apache/phoenix/pull/936#discussion_r511027115



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/iterate/UngroupedAggregatingResultIterator.java
##########
@@ -36,19 +37,33 @@ public UngroupedAggregatingResultIterator( PeekingResultIterator resultIterator,
     
     @Override
     public Tuple next() throws SQLException {
-        Tuple result = super.next();
+        byte[] value;
+        Tuple result = resultIterator.next();
+        // We should reset ClientAggregators here in case they are being reused in a new ResultIterator.
+        aggregators.reset(aggregators.getAggregators());
         // Ensure ungrouped aggregation always returns a row, even if the underlying iterator doesn't.
-        if (result == null && !hasRows) {
-            // We should reset ClientAggregators here in case they are being reused in a new ResultIterator.
-            aggregators.reset(aggregators.getAggregators());
-            byte[] value = aggregators.toBytes(aggregators.getAggregators());
-            result = new SingleKeyValueTuple(
-                    KeyValueUtil.newKeyValue(UNGROUPED_AGG_ROW_KEY, 
-                            SINGLE_COLUMN_FAMILY, 
-                            SINGLE_COLUMN, 
-                            AGG_TIMESTAMP, 
-                            value));
+        if (result == null) {
+            if (hasRows) {
+                return null;

Review comment:
       This doesn't match the comment above, which says an ungrouped aggregation always returns a row.
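       For context, a minimal sketch of the contract being discussed; the JDBC URL, table name, and column name are made up for the example. An ungrouped aggregate query is expected to produce exactly one row even when the scan matches nothing, and return null (false) only on subsequent calls:

           import java.sql.Connection;
           import java.sql.DriverManager;
           import java.sql.ResultSet;

           // Hypothetical illustration: an ungrouped aggregate always yields exactly
           // one row, even over an empty scan, so the iterator must synthesize that
           // row on the first next() and only afterwards report exhaustion.
           public class UngroupedAggContractSketch {
               public static void main(String[] args) throws Exception {
                   try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                        ResultSet rs = conn.createStatement()
                                .executeQuery("SELECT COUNT(*) FROM MY_TABLE WHERE COL1 = 'no-such-value'")) {
                       System.out.println(rs.next());      // expected: true (the single aggregated row)
                       System.out.println(rs.getLong(1));  // expected: 0
                       System.out.println(rs.next());      // expected: false (no further rows)
                   }
               }
           }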




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [phoenix] stoty commented on pull request #936: PHOENIX-5998 Paged server side ungrouped aggregate operations

Posted by GitBox <gi...@apache.org>.
stoty commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-725669814


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 55s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 1 new or modified test files.  |
   ||| _ 4.x Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  11m 16s |  4.x passed  |
   | +1 :green_heart: |  compile  |   1m  0s |  4.x passed  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  4.x passed  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  4.x passed  |
   | +0 :ok: |  spotbugs  |   3m  8s |  phoenix-core in 4.x has 946 extant spotbugs warnings.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   5m 57s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  1s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m  1s |  the patch passed  |
   | -1 :x: |  checkstyle  |   1m 10s |  phoenix-core: The patch generated 323 new + 1785 unchanged - 230 fixed = 2108 total (was 2015)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 44s |  the patch passed  |
   | -1 :x: |  spotbugs  |   3m 21s |  phoenix-core generated 1 new + 945 unchanged - 1 fixed = 946 total (was 946)  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 132m  4s |  phoenix-core in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 38s |  The patch does not generate ASF License warnings.  |
   |  |   | 166m  2s |   |
   
   
   | Reason | Tests |
   |-------:|:------|
   | FindBugs | module:phoenix-core |
   |  |  Switch statement found in org.apache.phoenix.coprocessor.UngroupedAggregateRegionScanner.descRowKeyOrderUpgrade(List, ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:ImmutableBytesWritable, UngroupedAggregateRegionObserver$MutationList) where default case is missing  At UngroupedAggregateRegionScanner.java:[lines 383-404] |
   | Failed junit tests | phoenix.end2end.index.DropColumnIT |
   |   | phoenix.end2end.PointInTimeQueryIT |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/9/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/phoenix/pull/936 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs hbaseanti checkstyle compile |
   | uname | Linux cbaa4632fb3d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev/phoenix-personality.sh |
   | git revision | 4.x / c9c80b2 |
   | Default Java | Private Build-1.8.0_242-8u242-b08-0ubuntu3~16.04-b08 |
   | checkstyle | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/9/artifact/yetus-general-check/output/diff-checkstyle-phoenix-core.txt |
   | spotbugs | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/9/artifact/yetus-general-check/output/new-spotbugs-phoenix-core.html |
   | unit | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/9/artifact/yetus-general-check/output/patch-unit-phoenix-core.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/9/testReport/ |
   | Max. process+thread count | 6532 (vs. ulimit of 30000) |
   | modules | C: phoenix-core U: phoenix-core |
   | Console output | https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-936/9/console |
   | versions | git=2.7.4 maven=3.3.9 spotbugs=4.1.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org