Posted to issues@flink.apache.org by beyond1920 <gi...@git.apache.org> on 2017/01/04 09:11:46 UTC

[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...

GitHub user beyond1920 opened a pull request:

    https://github.com/apache/flink/pull/3058

    [FLINK-5394] [Table API & SQL]the estimateRowCount method of DataSetCalc didn't work

    This PR fixes the bug described in https://issues.apache.org/jira/browse/FLINK-5394.
    The main changes include:
    1. add FlinkRelMdRowCount and FlinkDefaultRelMetadataProvider to override getRowCount of some Flink RelNodes
    2. add getRowCount method in DataSetSort to provide a more accurate estimate

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/alibaba/flink flink-5394

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3058.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3058
    
----
commit 8099920fb8759ed1068e7b8153816a7b63089e45
Author: beyond1920 <be...@126.com>
Date:   2016-12-29T07:52:17Z

    the estimateRowCount method of DataSetCalc didn't work now, fix it

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount method...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/3058
  
    I have no objections. I will merge this.



[GitHub] flink issue #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount method...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3058
  
    Cool, thanks @twalthr



[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3058#discussion_r95372931
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala ---
    @@ -71,6 +72,21 @@ class DataSetSort(
         )
       }
     
    +  override def estimateRowCount(metadata: RelMetadataQuery): Double = {
    +    val inputRowCnt = metadata.getRowCount(this.getInput)
    +    if (inputRowCnt == null) {
    +      inputRowCnt
    +    } else {
    +      val rowCount = Math.max(inputRowCnt - limitStart, 0D)
    --- End diff --
    
    Returning a cardinality estimate of `0` is not a good idea because all remaining operations might appear to have no cost at all. It is better to be conservative and return `1`, which is still low but does not invalidate any subsequent costs.



[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...

Posted by beyond1920 <gi...@git.apache.org>.
Github user beyond1920 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3058#discussion_r96112195
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala ---
    @@ -71,6 +72,21 @@ class DataSetSort(
         )
       }
     
    +  override def estimateRowCount(metadata: RelMetadataQuery): Double = {
    +    val inputRowCnt = metadata.getRowCount(this.getInput)
    +    if (inputRowCnt == null) {
    +      inputRowCnt
    +    } else {
    +      val rowCount = Math.max(inputRowCnt - limitStart, 0D)
    --- End diff --
    
    Yes, that makes sense. I have already fixed it. Thanks, @fhueske.



[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...

Posted by beyond1920 <gi...@git.apache.org>.
Github user beyond1920 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3058#discussion_r95967156
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala ---
    @@ -71,6 +72,21 @@ class DataSetSort(
         )
       }
     
    +  override def estimateRowCount(metadata: RelMetadataQuery): Double = {
    +    val inputRowCnt = metadata.getRowCount(this.getInput)
    +    if (inputRowCnt == null) {
    +      inputRowCnt
    +    } else {
    +      val rowCount = Math.max(inputRowCnt - limitStart, 0D)
    --- End diff --
    
    Hmm, the estimate is 0 only if inputRowCount <= start and (fetch is null or fetchValue <= 0). I think the estimate is reasonable in that case. Besides, why return 1 instead of 0.1, 0.01, or some other value?



[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3058#discussion_r95968750
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala ---
    @@ -71,6 +72,21 @@ class DataSetSort(
         )
       }
     
    +  override def estimateRowCount(metadata: RelMetadataQuery): Double = {
    +    val inputRowCnt = metadata.getRowCount(this.getInput)
    +    if (inputRowCnt == null) {
    +      inputRowCnt
    +    } else {
    +      val rowCount = Math.max(inputRowCnt - limitStart, 0D)
    --- End diff --
    
    inputRowCount might also just be an estimate and not guaranteed to be precise. Returning 1 is more robust, because it does not result in no-cost operators downstream.
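
The point about no-cost operators can be illustrated with a minimal, stand-alone sketch. It is not the code merged in this pull request: the object `RowCountSketch`, the use of `Option[Double]` in place of a nullable row count, and the `downstreamCost` helper are all assumptions made for illustration only.

```scala
// Hypothetical sketch, not the merged Flink code.
object RowCountSketch {

  // Rows remaining after skipping `limitStart` rows, floored at 1 so that
  // downstream cost estimates never collapse to zero.
  // `None` stands in for an unknown input row count (assumption).
  def estimateRowCount(inputRowCnt: Option[Double], limitStart: Long): Option[Double] =
    inputRowCnt.map(cnt => math.max(cnt - limitStart, 1.0))

  // Toy cost model (assumption): cost grows linearly with the row count.
  def downstreamCost(rowCount: Double, costPerRow: Double): Double =
    rowCount * costPerRow
}
```

With a floor of `0`, an input estimate of 5 rows and a start offset of 10 would give a row count of 0, and any cost that scales with the row count would also be 0. With a floor of `1`, the same input still yields a small but non-zero cost, which matters because the input row count may itself be an imprecise estimate.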



[GitHub] flink issue #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount method...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/3058
  
    I will also look at it tomorrow.



[GitHub] flink pull request #3058: [FLINK-5394] [Table API & SQL]the estimateRowCount...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/3058

