Posted to dev@quickstep.apache.org by jianqiao <gi...@git.apache.org> on 2016/10/24 07:32:21 UTC

[GitHub] incubator-quickstep pull request #122: Add backend support for LIPFilters.

GitHub user jianqiao opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/122

    Add backend support for LIPFilters.

    This PR follows #113 and #118 and adds backend support for LIPFilters.
    - `BuildHashOperator` supports building of LIPFilters.
    - `SelectOperator`, `HashJoinOperator` and `AggregateOperator` support probing of LIPFilters. 
    
    For `SelectOperator` and `AggregateOperator`, if a filter predicate is present, the LIPFilters are applied AFTER the filter predicate.
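    
    The probe order described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchFilter` and `selectThenProbe` are hypothetical names, and the filter is assumed to behave like a single-hash Bloom filter (it may report false positives but never false negatives).
    
    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <vector>
    
    // Hypothetical single-hash Bloom-style filter standing in for a LIPFilter.
    class SketchFilter {
     public:
      explicit SketchFilter(const std::size_t num_bits) : bits_(num_bits, false) {
        assert(num_bits > 0);
      }
    
      // Build side: called once per build-side key.
      void insert(const std::int64_t key) { bits_[slot(key)] = true; }
    
      // Probe side: false means the key was definitely never inserted;
      // true means it may have been inserted (false positives are possible).
      bool mayContain(const std::int64_t key) const { return bits_[slot(key)]; }
    
     private:
      std::size_t slot(const std::int64_t key) const {
        return std::hash<std::int64_t>{}(key) % bits_.size();
      }
    
      std::vector<bool> bits_;
    };
    
    // Returns the positions of keys that pass the predicate and then the filter.
    std::vector<std::size_t> selectThenProbe(
        const std::vector<std::int64_t> &keys,
        const std::function<bool(std::int64_t)> &predicate,
        const SketchFilter &filter) {
      std::vector<std::size_t> matches;
      for (std::size_t i = 0; i < keys.size(); ++i) {
        // Step 1: the ordinary filter predicate.
        if (!predicate(keys[i])) {
          continue;
        }
        // Step 2: the filter probe, evaluated only for predicate survivors.
        if (filter.mayContain(keys[i])) {
          matches.push_back(i);
        }
      }
      return matches;
    }
    ```
    
    In this sketch a tuple rejected by the predicate never reaches the filter probe, which is the ordering stated above.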
    
    Here are the performance results for SSB SF100 and TPC-H SF100.
    <table>
      <tr>
        <td><b>SSB SF100</b></td>
        <td><b>master (ms)</b></td>
        <td><b>w/ LIPFilter (ms)</b></td>
      </tr>
      <tr>
        <td>Q01</td>
        <td>885</td>
        <td>955</td>
      </tr>
      <tr>
        <td>Q02</td>
        <td>738</td>
        <td>821</td>
      </tr>
      <tr>
        <td>Q03</td>
        <td>707</td>
        <td>835</td>
      </tr>
      <tr>
        <td>Q04</td>
        <td>1240</td>
        <td>1114</td>
      </tr>
      <tr>
        <td>Q05</td>
        <td>853</td>
        <td>835</td>
      </tr>
      <tr>
        <td>Q06</td>
        <td>751</td>
        <td>975</td>
      </tr>
      <tr>
        <td>Q07</td>
        <td>3109</td>
        <td>2116</td>
      </tr>
      <tr>
        <td>Q08</td>
        <td>1042</td>
        <td>581</td>
      </tr>
      <tr>
        <td>Q09</td>
        <td>786</td>
        <td>710</td>
      </tr>
      <tr>
        <td>Q10</td>
        <td>603</td>
        <td>558</td>
      </tr>
      <tr>
        <td>Q11</td>
        <td>2851</td>
        <td>1410</td>
      </tr>
      <tr>
        <td>Q12</td>
        <td>3279</td>
        <td>908</td>
      </tr>
      <tr>
        <td>Q13</td>
        <td>1122</td>
        <td>904</td>
      </tr>
      <tr>
        <td>Total</td>
        <td>17967</td>
        <td>12721</td>
      </tr>
    </table>
    
    For TPC-H queries, there is one issue with Q21: it requires two hash tables on the `lineitem` relation. Since all the `HashTable`s are constructed in `QueryContext` at the beginning of query execution, 75% of the available memory slots (48569 out of 64385) are occupied and cannot be swapped out by `StorageManager`'s `EvictionPolicy`. This incurs heavy _spilling_ behavior and results in over 120 seconds of running time for Q21 in the master branch and an occasional DNF in the LIPFilter branch. One quick way to bypass this problem is to increase the buffer pool size (set `-buffer_pool_slots=100000`). For a long-term solution, we may
    (1) reduce hash table size by using untyped values;
    (2) delay allocating hash table memory unless it is actually used;
    (3) revise scheduler to be aware of resource requirements.
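    
    Option (2) could look roughly like the sketch below. It is only an illustration of deferred allocation under stated assumptions: `LazyTable` is a hypothetical class, not Quickstep's `HashTable`, and a plain `std::vector<int>` stands in for the hash table memory.
    
    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <memory>
    #include <mutex>
    #include <vector>
    
    // Hypothetical table whose backing memory is allocated on the first write
    // instead of at construction time.
    class LazyTable {
     public:
      explicit LazyTable(const std::size_t num_slots) : num_slots_(num_slots) {}
    
      // True once the backing storage has actually been allocated.
      bool allocated() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return storage_ != nullptr;
      }
    
      // The first call allocates the storage; later calls reuse it.
      void put(const std::size_t slot, const int value) {
        assert(slot < num_slots_);
        std::lock_guard<std::mutex> lock(mutex_);
        if (storage_ == nullptr) {
          storage_.reset(new std::vector<int>(num_slots_, 0));
        }
        (*storage_)[slot] = value;
      }
    
      // Reading never allocates; an untouched table reads as all zeros.
      int get(const std::size_t slot) const {
        assert(slot < num_slots_);
        std::lock_guard<std::mutex> lock(mutex_);
        return storage_ == nullptr ? 0 : (*storage_)[slot];
      }
    
     private:
      const std::size_t num_slots_;
      mutable std::mutex mutex_;
      std::unique_ptr<std::vector<int>> storage_;
    };
    ```
    
    With this shape, a table that is constructed at the beginning of a query but not yet used holds no storage, so it would not pin buffer pool slots until its first insert.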
    
    (**master** branch's performance is from Harshad's experiment #121)
    <table>
      <tr>
        <td><b>TPCH SF100</b></td>
        <td><b>master (ms)</b></td>
        <td><b>w/ LIPFilter (ms)</b></td>
        <td><b>w/ LIPFilter (ms)<br />-buffer_pool_slots=100000</b></td>
      </tr>
      <tr>
        <td>Q01</td>
        <td>16,046</td>
        <td>15180</td>
        <td>15238</td>
      </tr>
      <tr>
        <td>Q02</td>
        <td>5,625</td>
        <td>710</td>
        <td>744</td>
      </tr>
      <tr>
        <td>Q03</td>
        <td>6,861</td>
        <td>5069</td>
        <td>4907</td>
      </tr>
      <tr>
        <td>Q04</td>
        <td>2,662</td>
        <td>2617</td>
        <td>2448</td>
      </tr>
      <tr>
        <td>Q05</td>
        <td>4,364</td>
        <td>5966</td>
        <td>4499</td>
      </tr>
      <tr>
        <td>Q06</td>
        <td>398</td>
        <td>401</td>
        <td>395</td>
      </tr>
      <tr>
        <td>Q07</td>
        <td>23,367</td>
        <td>25836</td>
        <td>24860</td>
      </tr>
      <tr>
        <td>Q08</td>
        <td>3,274</td>
        <td>1714</td>
        <td>1733</td>
      </tr>
      <tr>
        <td>Q09</td>
        <td>10,050</td>
        <td>13707</td>
        <td>7789</td>
      </tr>
      <tr>
        <td>Q10</td>
        <td>15,296</td>
        <td>13038</td>
        <td>12934</td>
      </tr>
      <tr>
        <td>Q11</td>
        <td>2,110</td>
        <td>2344</td>
        <td>2221</td>
      </tr>
      <tr>
        <td>Q12</td>
        <td>1,805</td>
        <td>2049</td>
        <td>1969</td>
      </tr>
      <tr>
        <td>Q13</td>
        <td>34,220</td>
        <td>35116</td>
        <td>34915</td>
      </tr>
      <tr>
        <td>Q14</td>
        <td>771</td>
        <td>942</td>
        <td>852</td>
      </tr>
      <tr>
        <td>Q15</td>
        <td>4,435</td>
        <td>4882</td>
        <td>4832</td>
      </tr>
      <tr>
        <td>Q16</td>
        <td>8,661</td>
        <td>8062</td>
        <td>9522</td>
      </tr>
      <tr>
        <td>Q17</td>
        <td>160,707</td>
        <td>1749</td>
        <td>1684</td>
      </tr>
      <tr>
        <td>Q18</td>
        <td>66,309</td>
        <td>82505</td>
        <td>86376</td>
      </tr>
      <tr>
        <td>Q19</td>
        <td>1,475</td>
        <td>1871</td>
        <td>1515</td>
      </tr>
      <tr>
        <td>Q20</td>
        <td>55,381</td>
        <td>1591</td>
        <td>1491</td>
      </tr>
      <tr>
        <td>Q21</td>
        <td>121,310</td>
        <td>DNF</td>
        <td>13205</td>
      </tr>
      <tr>
        <td>Q22</td>
        <td>6,792</td>
        <td>6746</td>
        <td>7098</td>
      </tr>
      <tr>
        <td>Total</td>
        <td>551,921</td>
        <td>232096 (w/o Q21)</td>
        <td>241228</td>
      </tr>
    </table>
    
    Note that some improvements are not orthogonal to Harshad's partitioned aggregation #121 since LIPFilters also speed up some aggregations. Roughly speaking, when both PRs are merged, we will have an estimated overall running time of ~150s for TPC-H SF100. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-quickstep lip-refactor-backend

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/122.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #122
    
----
commit 31b05122f2278a3c1327674795eec71efe8ff452
Author: Jianqiao Zhu <ji...@cs.wisc.edu>
Date:   2016-09-07T18:20:43Z

    Add backend support for LIPFilters.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #122: Add backend support for LIPFilters.

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/122
  
    Comments addressed.



[GitHub] incubator-quickstep issue #122: Add backend support for LIPFilters.

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/122
  
    @jianqiao Many query-optimizer-distributed-execution-generator unit tests failed because the LIP data structure `LIPFilterBuilder` is missing from `QueryContext` in the distributed case.
    
    ```
    	 33 - quickstep_queryoptimizer_tests_distributed_executiongenerator_delete (OTHER_FAULT)
    	 34 - quickstep_queryoptimizer_tests_distributed_executiongenerator_distinct (OTHER_FAULT)
    	 37 - quickstep_queryoptimizer_tests_distributed_executiongenerator_insert (OTHER_FAULT)
    	 38 - quickstep_queryoptimizer_tests_distributed_executiongenerator_join (OTHER_FAULT)
    	 39 - quickstep_queryoptimizer_tests_distributed_executiongenerator_select (OTHER_FAULT)
    	 40 - quickstep_queryoptimizer_tests_distributed_executiongenerator_stringpatternmatching (OTHER_FAULT)
    	 41 - quickstep_queryoptimizer_tests_distributed_executiongenerator_tablegenerator (OTHER_FAULT)
    	 42 - quickstep_queryoptimizer_tests_distributed_executiongenerator_update (OTHER_FAULT)
    
    $ ctest -VV -R quickstep_queryoptimizer_tests_distributed_executiongenerator_delete
    
    34: [ RUN      ] DISTRIBUTED_EXECUTION_GENERATOR_TEST/TextBasedTest.CompareOutputs/0
    34: F1030 22:46:20.846637 1064960 QueryContext.hpp:331] Check failed: static_cast<std::size_t>(id) < lip_deployments_.size() (0 vs. 0)
    ```
    
    In my opinion, like the other data members in `QueryContext`, `LIPFilterBuilder` should, if possible, be serialized into the `QueryContext` proto and then deserialized at the beginning of query execution, instead of being created while constructing a `WorkOrder` as it is now.



[GitHub] incubator-quickstep issue #122: Add backend support for LIPFilters.

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/122
  
    @jianqiao I have fixed the bug, and assigned it to you. Thanks!



[GitHub] incubator-quickstep issue #122: Add backend support for LIPFilters.

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/122
  
    Looks good to me, merging. 



[GitHub] incubator-quickstep issue #122: Add backend support for LIPFilters.

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/122
  
    @zuyu `LIPFilterDeployment` gets serialized into `QueryContext`, and `LIPFilterBuilder` is a helper class that just wraps some contents from `LIPFilterDeployment`.
    
    The current design is that `LIPFilterBuilder`s are created in `WorkOrderFactory` with the information from `LIPFilterDeployment`. So it should suffice to serialize only `LIPFilterDeployment`.
    
    The test failure seems to be caused by the `LIPFilterDeployment`s not actually being serialized, which was supposed to be done by `LIPFilterGenerator`. Are there some changes in the distributed case that bypass the LIPFilter-related routine in `ExecutionGenerator`? (Alternatively, you can simply disable LIPFilters in the distributed case by setting `-use_lip_filters=false`.)
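    
    The design described above can be sketched as follows. All names here (`DeploymentSketch`, `BuilderSketch`, `ContextSketch`, `makeBuilder`) are hypothetical stand-ins and the field layout is assumed; the sketch only shows that the context owns the deserialized deployments, that a builder is derived from one deployment when a work order is created, and that looking up a deployment in an empty list fails in the same way as the `0 vs. 0` check in the reported test output.
    
    ```cpp
    #include <cstddef>
    #include <stdexcept>
    #include <string>
    #include <utility>
    #include <vector>
    
    // Hypothetical stand-in for the serialized deployment information.
    struct DeploymentSketch {
      std::vector<std::string> build_attributes;
    };
    
    // Hypothetical helper that only wraps contents of a deployment.
    class BuilderSketch {
     public:
      explicit BuilderSketch(const DeploymentSketch &deployment)
          : attributes_(deployment.build_attributes) {}
    
      std::size_t numAttributes() const { return attributes_.size(); }
    
     private:
      std::vector<std::string> attributes_;
    };
    
    // Hypothetical query context that owns the deployments once they have
    // been deserialized at the beginning of query execution.
    class ContextSketch {
     public:
      explicit ContextSketch(std::vector<DeploymentSketch> deployments)
          : deployments_(std::move(deployments)) {}
    
      // Bounds-checked lookup, analogous to the failed check in the test log.
      const DeploymentSketch& getDeployment(const std::size_t id) const {
        if (id >= deployments_.size()) {
          throw std::out_of_range("deployment id out of range");
        }
        return deployments_[id];
      }
    
     private:
      std::vector<DeploymentSketch> deployments_;
    };
    
    // Work-order creation time: derive the builder from the stored deployment.
    BuilderSketch makeBuilder(const ContextSketch &context, const std::size_t id) {
      return BuilderSketch(context.getDeployment(id));
    }
    ```
    
    If the deployments never reach the context, as suspected for the distributed case, `makeBuilder(context, 0)` fails on the bounds check even though the builder itself needs no serialization.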



[GitHub] incubator-quickstep pull request #122: Add backend support for LIPFilters.

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/122#discussion_r84789328
  
    --- Diff: storage/StorageBlock.cpp ---
    @@ -340,22 +340,16 @@ void StorageBlock::sample(const bool is_block_sample,
     }
     
     void StorageBlock::select(const vector<unique_ptr<const Scalar>> &selection,
    -                          const Predicate *predicate,
    -                          InsertDestinationInterface *destination) const {
    +                          InsertDestinationInterface *destination,
    +                          const TupleIdSequence *filter) const {
    --- End diff --
    
    Can we move the ``filter`` argument before ``destination``, as per the Google style guide? 
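    
    For reference, the ordering in question (input-only parameters first, output parameters last) might look like the following simplified sketch. `selectInto` and its parameter types are hypothetical and much simpler than `StorageBlock::select`; only the parameter order is the point.
    
    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>
    
    // Hypothetical simplification of the reviewed signature.
    // Inputs: `values` and the optional `filter` (nullptr means "keep all").
    // Output: `destination`, placed last.
    void selectInto(const std::vector<int> &values,
                    const std::vector<bool> *filter,
                    std::vector<int> *destination) {
      assert(destination != nullptr);
      assert(filter == nullptr || filter->size() == values.size());
      for (std::size_t i = 0; i < values.size(); ++i) {
        if (filter == nullptr || (*filter)[i]) {
          destination->push_back(values[i]);
        }
      }
    }
    ```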



[GitHub] incubator-quickstep issue #122: Add backend support for LIPFilters.

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/122
  
    @jianqiao No, [the distributed unit tests](https://github.com/apache/incubator-quickstep/blob/master/query_optimizer/tests/DistributedExecutionGeneratorTestRunner.cpp#L127) use the same optimizer as the single-node version. And since it works for the single node, it should work for the distributed one. I will debug it myself.
    
    FYI, as you did in `OptimizerTextTest`, I tried to disable it in the distributed unit tests, but had no luck.



[GitHub] incubator-quickstep pull request #122: Add backend support for LIPFilters.

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/122#discussion_r84787343
  
    --- Diff: relational_operators/SelectOperator.cpp ---
    @@ -210,23 +242,39 @@ serialization::WorkOrder* SelectOperator::createWorkOrderProto(const block_id bl
         }
       }
       proto->SetExtension(serialization::SelectWorkOrder::selection_index, selection_index_);
    +  proto->SetExtension(serialization::SelectWorkOrder::lip_deployment_index, lip_deployment_index_);
     
       return proto;
     }
     
    -
     void SelectWorkOrder::execute() {
       BlockReference block(
           storage_manager_->getBlock(input_block_id_, input_relation_, getPreferredNUMANodes()[0]));
     
    +  std::unique_ptr<TupleIdSequence> predicate_matches;
    --- End diff --
    
    Add a note explaining why we apply the predicate first and then the LIP filter. 



[GitHub] incubator-quickstep pull request #122: Add backend support for LIPFilters.

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-quickstep/pull/122

