You are viewing a plain text version of this content. The canonical link for it is here.
Posted to codereview@trafodion.apache.org by eowhadi <gi...@git.apache.org> on 2016/03/18 18:01:06 UTC

[GitHub] incubator-trafodion pull request: [TRAFODION-1900] Optimize MDAM s...

GitHub user eowhadi opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/395

    [TRAFODION-1900] Optimize MDAM scans with small scanner

    When doing MDAM scans, we are performing interlaced scan for PROBE and for real scan. The probes always return only 1 row, then we close the scanner immediately, therefore should use always small scanner. I will make it conditional on the existing CQD HBASE_SMALL_SCANNER (ether SYSTEM or ON). In addition, caching of blocks retrieved by probe should always we at least receiving one succesfull cache hit on the next MDAM scan, therefore forcing caching ON for MDAM prob is a good idea. Again, will make this forcing conditional on HBASE_SMALL_SCANNER SYSTEM or ON.
    Then for the real scan part of MDAM, I will use the following heuristic: If previous scan fitted in one hbase block, then it is likelly than next will also fit in one hbase block, therefore enable small scanner for next scan. Again all this only if CQD above is ON or SYSTEM.
    Also includes a fix where SMALL_SCANNER would be turned on for MDAM scan because the compiler for MDAM is not polulating the expected number of rows returned.
    Results of using small scanner on MDAM when it make sense showed a 1.39X speed improvement...

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eowhadi/incubator-trafodion mdamSmallScanner

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/395.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #395
    
----
commit 96bcc40b98694609b1bf20ce465d128852c34a48
Author: Eric Owhadi <er...@esgyn.com>
Date:   2016-03-18T16:52:51Z

    [TRAFODION-1900]
    
    When doing MDAM scans, we are performing interlaced scan for PROBE and for real scan. The probes always return only 1 row, then we close the scanner immediately, therefore should use always small scanner. I will make it conditional on the existing CQD HBASE_SMALL_SCANNER (ether SYSTEM or ON). In addition, caching of blocks retrieved by probe should always we at least receiving one succesfull cache hit on the next MDAM scan, therefore forcing caching ON for MDAM prob is a good idea. Again, will make this forcing conditional on HBASE_SMALL_SCANNER SYSTEM or ON.
    Then for the real scan part of MDAM, I will use the following heuristic: If previous scan fitted in one hbase block, then it is likelly than next will also fit in one hbase block, therefore enable small scanner for next scan. Again all this only if CQD above is ON or SYSTEM.
    Also includes a fix where SMALL_SCANNER would be turned on for MDAM scan because the compiler for MDAM is not polulating the expected number of rows returned.
    Results of using small scanner on MDAM when it make sense showed a 1.39X speed improvement...

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: [TRAFODION-1900] Optimize MDAM s...

Posted by DaveBirdsall <gi...@git.apache.org>.
Github user DaveBirdsall commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/395#discussion_r56853604
  
    --- Diff: core/sql/executor/ExHbaseSelect.cpp ---
    @@ -444,6 +444,7 @@ ExWorkProcRetcode ExHbaseScanSQTaskTcb::work(short &rc)
       Lng32 retcode = 0;
       rc = 0;
       Lng32 remainingInBatch = batchSize_;
    +  NABoolean isFirstBatch = false;
    --- End diff --
    
    Do you want this to be a stack variable? The work method might return at any time (example: when the parent queue fills up) and then get called again, which would cause this variable to become false again. I'm wondering if this variable should be a member instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: [TRAFODION-1900] Optimize MDAM s...

Posted by DaveBirdsall <gi...@git.apache.org>.
Github user DaveBirdsall commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/395#discussion_r56859815
  
    --- Diff: core/sql/executor/ExHbaseSelect.cpp ---
    @@ -444,6 +444,7 @@ ExWorkProcRetcode ExHbaseScanSQTaskTcb::work(short &rc)
       Lng32 retcode = 0;
       rc = 0;
       Lng32 remainingInBatch = batchSize_;
    +  NABoolean isFirstBatch = false;
    --- End diff --
    
    It depends on the queue size. Which is set by the compiler and can be modified by the executor. But just as a general design principle, work methods are designed to interrupt their work at any time (often due to some resource constraint) so it is best to use members for any state that is needed for the life of the operator.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: [TRAFODION-1900] Optimize MDAM s...

Posted by eowhadi <gi...@git.apache.org>.
Github user eowhadi commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/395#discussion_r56862361
  
    --- Diff: core/sql/executor/ExHbaseSelect.cpp ---
    @@ -444,6 +444,7 @@ ExWorkProcRetcode ExHbaseScanSQTaskTcb::work(short &rc)
       Lng32 retcode = 0;
       rc = 0;
       Lng32 remainingInBatch = batchSize_;
    +  NABoolean isFirstBatch = false;
    --- End diff --
    
    OK so here this boolean is not needed for the full length of the operator. Only for the first 8K rows.
    I can change the logic and maintain a row counter on the tcb. That will have impact memory write for the full length of the operator. By doing my initial trick, I was hopping that I could leverage the existing remainingInBatch counter and not create another one. But I guess if I cannot trust that the parent queue can fill up with less than an HBase_Buffer_size, I cannot use the optimization.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: [TRAFODION-1900] Optimize MDAM s...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-trafodion/pull/395


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: [TRAFODION-1900] Optimize MDAM s...

Posted by eowhadi <gi...@git.apache.org>.
Github user eowhadi commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/395#discussion_r56858635
  
    --- Diff: core/sql/executor/ExHbaseSelect.cpp ---
    @@ -444,6 +444,7 @@ ExWorkProcRetcode ExHbaseScanSQTaskTcb::work(short &rc)
       Lng32 retcode = 0;
       rc = 0;
       Lng32 remainingInBatch = batchSize_;
    +  NABoolean isFirstBatch = false;
    --- End diff --
    
    I think you are right , I thought the only exit would be due to batch size exceeding, and was seeing batch size =*K, so was good. But now I see  	    if (tcb_->moveRowToUpQueue(tcb_->convertRow_, tcb_->convertRowLen_, 
     				       &rc, FALSE))		 				       &rc, FALSE))
     	      return 1;		 	      return 1;
    
    I am guessing this is what you are talking about (parent queue fill up). Can this happen realistically before the first 8K?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---