You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by HeartSaVioR <gi...@git.apache.org> on 2016/10/14 10:53:39 UTC

[GitHub] storm pull request #1736: STORM-1446 Compile the Calcite logical plan to Sto...

GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/storm/pull/1736

    STORM-1446 Compile the Calcite logical plan to Storm Trident logical plan

    First of all I should thank to @milinda for sharing great implementation, because this patch is merely just a port of SamzaSQL (though it was from @milinda repo, not Samza repo).
    
    This is for converting Calcite logical plan to Storm's Trident logical plan (as my understanding of logical plan / physical plan is right). Trident itself does a physical plan.
    
    After introducing this, much more things can be followed up, like
    - setting up parallelism hint based on data source's hint (metadata)
    - push down filter / project to data source
    - go back generating source code of Trident topology, but avoiding raw String concatenation
    - and whatever...
    
    This is rather huge patch and it requires understanding of core concept of Calcite. If you haven't learn it, this slide can help you, as I did.
    http://www.slideshare.net/JordanHalterman/introduction-to-apache-calcite
    
    It doesn't touch standalone mode (but I will try later), and also doesn't touch feature itself so it doesn't require documentation, but adding javadoc comment might be needed. 
    If reviewers want to see it please let me know so that I can work on.
    
    Thanks in advance!
    
    Commit log is below:
    
    * Port SamzaSQL implementation to Storm
      * https://github.com/milinda/samza-sql
    * Apply some rules to optimize
    * optimize Calc
      * merge filter and projection scripts into one
      * also applying short circuit
    * Modify Trident unit tests to use new query planner
    * arrange some files
      * Move some files which are only used from standalone
      * Remove some files which are no longer used
    * guard the possibility of stack overflow error on explaining
      * just leave error logs, and print out empty plan and continue
      * reported this behavior to Calcite community
    * leave some comments to clarify what it means

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/storm STORM-1446

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1736.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1736
    
----
commit 692af990f051e1c37390c798d2b1eddcef60d01b
Author: Jungtaek Lim <ka...@gmail.com>
Date:   2016-10-13T10:00:10Z

    STORM-1446 Compile the Calcite logical plan to Storm Trident logical plan
    
    * Port SamzaSQL implementation to Storm
      * https://github.com/milinda/samza-sql
    * Apply some rules to optimize
    * optimize Calc
      * merge filter and projection scripts into one
      * also applying short circuit
    * Modify Trident unit tests to use new query planner
    * arrange some files
      * Move some files which are only used from standalone
      * Remove some files which are no longer used
    * guard the possibility of stack overflow error on explaining
      * just leave error logs, and print out empty plan and continue
      * reported this behavior to Calcite community
    * leave some comments to clarify what it means

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1736: STORM-1446 [Storm SQL] Compile the Calcite logical...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1736#discussion_r94504773
  
    --- Diff: external/sql/storm-sql-core/src/jvm/org/apache/storm/sql/StormSqlImpl.java ---
    @@ -113,18 +113,13 @@ public void submit(
           } else if (node instanceof SqlCreateFunction) {
             handleCreateFunction((SqlCreateFunction) node);
           }  else {
    -        DataContext dataContext = new StormDataContext();
    -        FrameworkConfig config = buildFrameWorkConfig();
    -        Planner planner = Frameworks.getPlanner(config);
    -        SqlNode parse = planner.parse(sql);
    -        SqlNode validate = planner.validate(parse);
    -        RelNode tree = planner.convert(validate);
    -        org.apache.storm.sql.compiler.backends.trident.PlanCompiler compiler =
    -                new org.apache.storm.sql.compiler.backends.trident.PlanCompiler(dataSources, typeFactory, dataContext);
    -        TridentTopology topo = compiler.compile(tree);
    +        QueryPlanner planner = new QueryPlanner(schema);
    +        AbstractTridentProcessor processor = planner.compile(dataSources, sql);
    +        TridentTopology topo = processor.build(null);
    --- End diff --
    
    Just removed that parameter since it is no longer needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1736: STORM-1446 [Storm SQL] Compile the Calcite logical plan t...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1736
  
    Addressed the change with master: update to Calcite 1.11.0


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1736: STORM-1446 Compile the Calcite logical plan to Storm Trid...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1736
  
    Updated issue's desc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1736: STORM-1446 [Storm SQL] Compile the Calcite logical plan t...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1736
  
    Reflect STORM-2072 : STORM-1446 now also addresses STORM-2073.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1736: STORM-1446 [Storm SQL] Compile the Calcite logical...

Posted by harshach <gi...@git.apache.org>.
Github user harshach commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1736#discussion_r94467825
  
    --- Diff: external/sql/storm-sql-core/src/jvm/org/apache/storm/sql/StormSqlImpl.java ---
    @@ -113,18 +113,13 @@ public void submit(
           } else if (node instanceof SqlCreateFunction) {
             handleCreateFunction((SqlCreateFunction) node);
           }  else {
    -        DataContext dataContext = new StormDataContext();
    -        FrameworkConfig config = buildFrameWorkConfig();
    -        Planner planner = Frameworks.getPlanner(config);
    -        SqlNode parse = planner.parse(sql);
    -        SqlNode validate = planner.validate(parse);
    -        RelNode tree = planner.convert(validate);
    -        org.apache.storm.sql.compiler.backends.trident.PlanCompiler compiler =
    -                new org.apache.storm.sql.compiler.backends.trident.PlanCompiler(dataSources, typeFactory, dataContext);
    -        TridentTopology topo = compiler.compile(tree);
    +        QueryPlanner planner = new QueryPlanner(schema);
    +        AbstractTridentProcessor processor = planner.compile(dataSources, sql);
    +        TridentTopology topo = processor.build(null);
    --- End diff --
    
    any reason to pass null here?. Also a comment would be good on passing null


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1736: STORM-1446 [Storm SQL] Compile the Calcite logical...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/1736


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1736: STORM-1446 [Storm SQL] Compile the Calcite logical plan t...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1736
  
    @harshach Thanks for the review. I addressed your comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1736: STORM-1446 [Storm SQL] Compile the Calcite logical...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1736#discussion_r94504782
  
    --- Diff: external/sql/storm-sql-core/src/jvm/org/apache/storm/sql/StormSqlImpl.java ---
    @@ -48,7 +45,10 @@
     import org.apache.storm.sql.parser.SqlCreateFunction;
     import org.apache.storm.sql.parser.SqlCreateTable;
     import org.apache.storm.sql.parser.StormParser;
    +import org.apache.storm.sql.planner.trident.QueryPlanner;
    +import org.apache.storm.sql.planner.StormRelUtils;
     import org.apache.storm.sql.runtime.*;
    --- End diff --
    
    Addressed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1736: STORM-1446 [Storm SQL] Compile the Calcite logical plan t...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1736
  
    Also addressed STORM-2200: Drop Aggregate & Join support on Trident mode


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1736: STORM-1446 Compile the Calcite logical plan to Storm Trid...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1736
  
    Just fixed the RAT issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1736: STORM-1446 Compile the Calcite logical plan to Storm Trid...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1736
  
    @manuzhang Yes issue description just links Julian's comment and I think it's not sufficient. I'll also write up motivation and rationale to issue description. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1736: STORM-1446 Compile the Calcite logical plan to Storm Trid...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1736
  
    Fixed some bugs, and renamed classes to reflect more specific.
    Unit test passes and manually tested.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1736: STORM-1446 Compile the Calcite logical plan to Storm Trid...

Posted by manuzhang <gi...@git.apache.org>.
Github user manuzhang commented on the issue:

    https://github.com/apache/storm/pull/1736
  
    @HeartSaVioR Thanks for the thorough explanation. I'd suggest record the motive and rationals somewhere.  It's hard to follow why something exists in the first place. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1736: STORM-1446 [Storm SQL] Compile the Calcite logical...

Posted by harshach <gi...@git.apache.org>.
Github user harshach commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1736#discussion_r94467762
  
    --- Diff: external/sql/storm-sql-core/src/jvm/org/apache/storm/sql/StormSqlImpl.java ---
    @@ -48,7 +45,10 @@
     import org.apache.storm.sql.parser.SqlCreateFunction;
     import org.apache.storm.sql.parser.SqlCreateTable;
     import org.apache.storm.sql.parser.StormParser;
    +import org.apache.storm.sql.planner.trident.QueryPlanner;
    +import org.apache.storm.sql.planner.StormRelUtils;
     import org.apache.storm.sql.runtime.*;
    --- End diff --
    
    can we remove the wildcard import


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1736: STORM-1446 Compile the Calcite logical plan to Storm Trid...

Posted by manuzhang <gi...@git.apache.org>.
Github user manuzhang commented on the issue:

    https://github.com/apache/storm/pull/1736
  
    @HeartSaVioR Could you please clarify what's the motive behind this big change ? What's the benefits ? Has this changed the [workflow of StormSQL](https://storm.apache.org/releases/2.0.0-SNAPSHOT/storm-sql-internal.html) ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1736: STORM-1446 [Storm SQL] Compile the Calcite logical plan t...

Posted by harshach <gi...@git.apache.org>.
Github user harshach commented on the issue:

    https://github.com/apache/storm/pull/1736
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1736: STORM-1446 Compile the Calcite logical plan to Storm Trid...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1736
  
    @manuzhang 
    Thanks for showing interest on this. Let me rearrange your questions:
    
    > Has this changed the workflow of StormSQL?
    
    No, we still rely on Trident, and nothing changed in point of workflow.
    
    > Motive behind this big change and benefits
    
    This is started from [Julian's comment](https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15034472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15034472) and also [Milinda's comment](https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15035182&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15035182).
    
    For me having own relational algebras (rel) has several advantages, 
    
    - We can push operator handling logic to rel itself. Before that we should traverse Calcite logical rel  tree with PostOrderRelNodeVisitor, and visitor needs to handle Calcite's rel directly. Now the logic how to configure Trident topology is all handled from separate rels.
    
    - We sometimes want to have more derived rels compared to Calcite logical operators. One of example is `Join`.  There's only one logical rel regarding join in Calcite - LogicalJoin - but we're now converting LogicalJoin to EquiJoin if conditions are met. If we have various types of join it will make the difference. We're not prepared yet, but streaming scan vs table scan, and streaming insert vs table insert are the other cases.
    
    ```
    TridentStormAggregateRel(group=[{0}], EXPR$1=[COUNT()])
      TridentStormCalcRel(expr#0..4=[{inputs}], expr#5=[0], expr#6=[>($t0, $t5)], DEPTID=[$t3], EMPID=[$t0], $condition=[$t6])
        TridentStormEquiJoinRel(condition=[=($2, $3)], joinType=[inner])
          TridentStormStreamScanRel(table=[[EMP]])
          TridentStormStreamScanRel(table=[[DEPT]])
    ```
    
    We can even override the methods how to represent the rel in explain string if we think Calcite's explain is less informational. For example, showing initial parallelism (when we support) for Scan.
    
    - This patch starts addressing query optimizations. One of example is Calc, which is for merging multiple calculations into one. No need to filter and projection separately. They're still not minimized (as projection is), but it can be addressed after STORM-2072. Defining derived rels helps further query optimizations, like filter pushdown. Calcite rels is not aware of data source characteristic, and we can include it to our own rels.
    
    Please don't hesitate to ask questions. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1736: STORM-1446 [Storm SQL] Compile the Calcite logical...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1736#discussion_r94504740
  
    --- Diff: external/sql/storm-sql-core/src/jvm/org/apache/storm/sql/StormSqlImpl.java ---
    @@ -166,8 +161,7 @@ public void explain(Iterable<String> statements) throws Exception {
             SqlNode validate = planner.validate(parse);
             RelNode tree = planner.convert(validate);
     
    -        // TODO: change to all attributes when we change to cost-based planner
    -        String plan = RelOptUtil.toString(tree, SqlExplainLevel.NON_COST_ATTRIBUTES);
    +        String plan = StormRelUtils.explain(tree, SqlExplainLevel.ALL_ATTRIBUTES);
             System.out.println("plan>");
    --- End diff --
    
    That's a good idea, but not related to this change. I'll file an issue regarding this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1736: STORM-1446 [Storm SQL] Compile the Calcite logical...

Posted by harshach <gi...@git.apache.org>.
Github user harshach commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1736#discussion_r94468010
  
    --- Diff: external/sql/storm-sql-core/src/jvm/org/apache/storm/sql/StormSqlImpl.java ---
    @@ -166,8 +161,7 @@ public void explain(Iterable<String> statements) throws Exception {
             SqlNode validate = planner.validate(parse);
             RelNode tree = planner.convert(validate);
     
    -        // TODO: change to all attributes when we change to cost-based planner
    -        String plan = RelOptUtil.toString(tree, SqlExplainLevel.NON_COST_ATTRIBUTES);
    +        String plan = StormRelUtils.explain(tree, SqlExplainLevel.ALL_ATTRIBUTES);
             System.out.println("plan>");
    --- End diff --
    
    should we move all these println statements to Utils print statement and in future we can change this to file based approach?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---