Posted to issues@flink.apache.org by fhueske <gi...@git.apache.org> on 2016/01/29 01:49:36 UTC

[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

GitHub user fhueske opened a pull request:

    https://github.com/apache/flink/pull/1559

    [FLINK-3225] Implemented optimization of Table API queries via Calcite

    - added logical Flink nodes and translation rules
    - added stubs for DataSet translation rules
    - ported DataSetNodes to Scala
    - reactivated tests and added expected NotImplementedError

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink calciteOpt

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1559
    
----
commit 3fba706ab39e3fdb1d4c3b307a36a2184476a304
Author: Fabian Hueske <fh...@apache.org>
Date:   2016-01-26T12:22:38Z

    [FLINK-3225] Implemented optimization of Table API queries via Calcite
    
    - added logical Flink nodes and translation rules
    - added stubs for DataSet translation rules
    - ported DataSetNodes to Scala
    - reactivated tests and added expected NotImplementedError

----



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the pull request:

    https://github.com/apache/flink/pull/1559#issuecomment-177520097
  
    Yes, good idea. My Skype name is `twalthr`.



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by ChengXiangLi <gi...@git.apache.org>.
Github user ChengXiangLi commented on the pull request:

    https://github.com/apache/flink/pull/1559#issuecomment-176891835
  
    Yes, @fhueske, I saw that you rewrote the DataSetRelNodes in Scala. I will continue my work based on your PR.



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1559#discussion_r51241130
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/java/table/JavaBatchTranslator.scala ---
    @@ -41,21 +44,13 @@ class JavaBatchTranslator extends PlanTranslator {
     
         // create table representation from DataSet
         val dataSetTable = new DataSetTable[A](
    -    repr.asInstanceOf[JavaDataSet[A]],
    -    fieldNames
    +      repr.asInstanceOf[JavaDataSet[A]],
    +      fieldNames
         )
    -
    -    // register table in Cascading schema
    -    val schema = Frameworks.createRootSchema(true)
         val tableName = repr.hashCode().toString
    -    schema.add(tableName, dataSetTable)
     
    -    // initialize RelBuilder
    -    val frameworkConfig = Frameworks
    -      .newConfigBuilder
    -      .defaultSchema(schema)
    -      .build
    -    val relBuilder = RelBuilder.create(frameworkConfig)
    +    TranslationContext.addDataSet(tableName, dataSetTable)
    --- End diff --
    
    Do we really want to have this static? What happens if we use multiple TableEnvironments in our program? They shouldn't influence each other. Is the hash code really unique? In the old Table API we had an AtomicCounter that guaranteed uniqueness.
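
For illustration only, a counter-based naming scheme along the lines suggested above could look like the following Scala sketch (the object and method names are hypothetical, not part of the PR):

    import java.util.concurrent.atomic.AtomicInteger

    // Hypothetical sketch: derive unique table names from a counter instead of
    // relying on hashCode(), which can collide. Names are illustrative only.
    object TableNameGenerator {
      private val counter = new AtomicInteger(0)

      def newName(): String = "_DataSetTable_" + counter.getAndIncrement()
    }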



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1559#issuecomment-176693988
  
    Updated the PR



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske closed the pull request at:

    https://github.com/apache/flink/pull/1559



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the pull request:

    https://github.com/apache/flink/pull/1559#issuecomment-176676353
  
    Except for the comments, it looks very good.



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1559#discussion_r51241929
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/nodes/dataset/DataSetRel.scala ---
    @@ -0,0 +1,33 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.api.table.plan.nodes.dataset
    +
    +import org.apache.calcite.rel.RelNode
    +import org.apache.flink.api.java.DataSet
    +import org.apache.flink.api.table.Row
    +
    +trait DataSetRel extends RelNode {
    +
    +  /**
    +    * Translate the FlinkRelNode into Flink operator.
    +    */
    +  def translateToPlan: DataSet[Row]
    --- End diff --
    
    I asked this question already in the other PR. Do we really want to limit the translation to `Rows` only? We should use `Any` here, so that we can also use `Tuples` if the null check is disabled and there are fewer than 25 columns. The code generation will support that.
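
For illustration, the suggested change would amount to widening the return type of the trait roughly as follows (a sketch only, assuming the rest of the trait stays unchanged):

    package org.apache.flink.api.table.plan.nodes.dataset

    import org.apache.calcite.rel.RelNode
    import org.apache.flink.api.java.DataSet

    trait DataSetRel extends RelNode {

      /**
        * Translates the RelNode into a Flink operator.
        * Returning DataSet[Any] allows both Rows and Tuples as result types.
        */
      def translateToPlan: DataSet[Any]
    }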



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the pull request:

    https://github.com/apache/flink/pull/1559#issuecomment-176727999
  
    +1 for merging



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1559#issuecomment-176501262
  
    @ChengXiangLi, @twalthr: Please review and give feedback.
    
    @ChengXiangLi, this PR contains stubs for the translation rules, which are quite similar to the rules in your PR #1556.
    




[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1559#issuecomment-177854554
  
    @ChengXiangLi, just added you on Skype. 
    
    I'll write a mail to coordinate the time :-)



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1559#issuecomment-176976611
  
    @ChengXiangLi Thanks! I'll merge it into the `tableOnCalcite` branch.
    
    @twalthr, @ChengXiangLi, should we do a brief Skype call or chat at the beginning of next week to coordinate the remaining tasks needed to finalize FLINK-3221?



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1559#discussion_r51245496
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/nodes/dataset/DataSetRel.scala ---
    @@ -0,0 +1,33 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.api.table.plan.nodes.dataset
    +
    +import org.apache.calcite.rel.RelNode
    +import org.apache.flink.api.java.DataSet
    +import org.apache.flink.api.table.Row
    +
    +trait DataSetRel extends RelNode {
    +
    +  /**
    +    * Translate the FlinkRelNode into Flink operator.
    +    */
    +  def translateToPlan: DataSet[Row]
    --- End diff --
    
    Right, I'll change that to `Any`.



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by ChengXiangLi <gi...@git.apache.org>.
Github user ChengXiangLi commented on the pull request:

    https://github.com/apache/flink/pull/1559#issuecomment-177429493
  
    Yes, I think we should do that. My Skype is chengxiang.libra@gmail.com.



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1559#discussion_r51241424
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/nodes/dataset/DataSetExchange.scala ---
    @@ -0,0 +1,63 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.api.table.plan.nodes.dataset
    +
    +import org.apache.calcite.plan.{RelTraitSet, RelOptCluster}
    +import org.apache.calcite.rel.`type`.RelDataType
    +import org.apache.calcite.rel.{RelWriter, RelNode, SingleRel}
    +import org.apache.flink.api.common.operators.base.PartitionOperatorBase.PartitionMethod
    +import org.apache.flink.api.java.DataSet
    +import org.apache.flink.api.table.Row
    +
    +/**
    +  * Flink RelNode which matches along with PartitionOperator.
    +  */
    +class DataSetExchange(
    +                       cluster: RelOptCluster,
    --- End diff --
    
    There should only be 4 spaces of indentation here.



[GitHub] flink pull request: [FLINK-3225] Implemented optimization of Table...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1559#discussion_r51245443
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/java/table/JavaBatchTranslator.scala ---
    @@ -41,21 +44,13 @@ class JavaBatchTranslator extends PlanTranslator {
     
         // create table representation from DataSet
         val dataSetTable = new DataSetTable[A](
    -    repr.asInstanceOf[JavaDataSet[A]],
    -    fieldNames
    +      repr.asInstanceOf[JavaDataSet[A]],
    +      fieldNames
         )
    -
    -    // register table in Cascading schema
    -    val schema = Frameworks.createRootSchema(true)
         val tableName = repr.hashCode().toString
    -    schema.add(tableName, dataSetTable)
     
    -    // initialize RelBuilder
    -    val frameworkConfig = Frameworks
    -      .newConfigBuilder
    -      .defaultSchema(schema)
    -      .build
    -    val relBuilder = RelBuilder.create(frameworkConfig)
    +    TranslationContext.addDataSet(tableName, dataSetTable)
    --- End diff --
    
    I think we need to hold the table catalog and the RelBuilder in a singleton. The Scala Table API does not require a TableEnvironment (and I think it should stay like this for seamless integration), so there is no other central place to register tables (and the RelBuilder must be the same for all RelNodes of a query). You are right about the hash code, though. I'll use an AtomicCounter for the name.
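
As a rough illustration of such a singleton (names and method signatures are hypothetical, not the actual PR code), holding the root schema, the shared RelBuilder, and a counter for unique table names might look like this:

    import java.util.concurrent.atomic.AtomicInteger

    import org.apache.calcite.schema.{SchemaPlus, Table}
    import org.apache.calcite.tools.{FrameworkConfig, Frameworks, RelBuilder}

    // Hypothetical sketch of a singleton translation context: one root schema,
    // one RelBuilder shared by all RelNodes of a query, and an atomic counter
    // that guarantees unique table names.
    object TranslationContextSketch {

      private val rootSchema: SchemaPlus = Frameworks.createRootSchema(true)

      private val config: FrameworkConfig = Frameworks
        .newConfigBuilder
        .defaultSchema(rootSchema)
        .build

      val relBuilder: RelBuilder = RelBuilder.create(config)

      private val nameCounter = new AtomicInteger(0)

      // Registers a table under a generated unique name and returns that name.
      def addDataSet(table: Table): String = {
        val name = "_DataSetTable_" + nameCounter.getAndIncrement()
        rootSchema.add(name, table)
        name
      }
    }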

