You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by Shiti <gi...@git.apache.org> on 2015/06/14 21:05:01 UTC

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

GitHub user Shiti opened a pull request:

    https://github.com/apache/flink/pull/834

    [FLINK-2210] Table API support for aggregation on columns with null 

    When a value is null, the AggregationFunction's aggregate method is not called.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Shiti/flink FLINK-2210

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/834.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #834
    
----
commit f37d4c04ee585c91e6f865eb58b893c863cf4df9
Author: Shiti <ss...@gmail.com>
Date:   2015-06-14T18:59:02Z

    [FLINK-2210] Table API support for aggregation on columns with null values

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112114140
  
    For instance, users could wonder that a DataSet containing null values returns a different result for a count aggregation than for a call to the `count()` method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112063713
  
    Thanks for your contribution @Shiti, LGTM. +1 for merging.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112351299
  
    Looks good. Thanks for addressing my concerns :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112116663
  
    Yes, I see your point. Also, right now an expression like "a.count, a.sum" (where a only contains 0 or 1) returns different results if field a is null for some rows. This could be fixed by only counting fields that are non-null.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112566051
  
    Thanks for your work. :smile: 
    
    Could you please close the PR manually, I forgot to add a tag for closing it in the commit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112316099
  
    Very good, I would have forgotten about the test case with all-null values. :smiley: 
    
    I only had a minor comment about Boolean.compare()


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112127936
  
    OK, I had some offline discussion with @mxm and @fhueske. We thought that we should have this behaviour: https://en.wikipedia.org/wiki/Null_(SQL)#Aggregate_functions.
    
    The COUNT(a) should be correct with this PR since it is implemented as SUM(1), so this counts also those elements where a = null. The only problem is AVG(), which right now would provide incorrect results since AVG(a) is generated as SUM(a)/SUM(1). The SUM(1) here needs to be changed to something that only counts when a != null.
    
    Then of course we would need tests for the correct behaviour of these aggregations in the presence of null values.
    
    What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by Shiti <gi...@git.apache.org>.
Github user Shiti commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112276475
  
    @aljoscha is there a way we can define functions for expressions? I tried replacing Literal(1) with IsNotNull but I don't know if its possible to define a function that converts Boolean to Int (1 or 0). 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/834#discussion_r32495598
  
    --- Diff: flink-staging/flink-table/src/main/scala/org/apache/flink/api/table/codegen/ExpressionCodeGenerator.scala ---
    @@ -489,6 +489,25 @@ abstract class ExpressionCodeGenerator[R](
                 """.stripMargin
             }
     
    +      case NumericIsNotNull(child) =>
    +        val childCode = generateExpression(child)
    +        if (nullCheck) {
    +          childCode.code +
    +            s"""
    +               |boolean $nullTerm = ${childCode.nullTerm};
    +               |if ($nullTerm) {
    +               |  0;
    +               |} else {
    +               |  $resultTpe $resultTerm = Boolean.compare((${childCode.resultTerm} != null),false);
    +               |}
    +            """.stripMargin
    +        } else {
    +          childCode.code +
    +            s"""
    +               |$resultTpe $resultTerm = Boolean.compare((${childCode.resultTerm} != null),false);
    --- End diff --
    
    See above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by Shiti <gi...@git.apache.org>.
Github user Shiti commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112319099
  
    @aljoscha I updated the code


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112331072
  
    Cool, this looks very nice.
    
    @fhueske @mxm Any objections to me merging this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/834#discussion_r32495592
  
    --- Diff: flink-staging/flink-table/src/main/scala/org/apache/flink/api/table/codegen/ExpressionCodeGenerator.scala ---
    @@ -489,6 +489,25 @@ abstract class ExpressionCodeGenerator[R](
                 """.stripMargin
             }
     
    +      case NumericIsNotNull(child) =>
    +        val childCode = generateExpression(child)
    +        if (nullCheck) {
    +          childCode.code +
    +            s"""
    +               |boolean $nullTerm = ${childCode.nullTerm};
    +               |if ($nullTerm) {
    +               |  0;
    +               |} else {
    +               |  $resultTpe $resultTerm = Boolean.compare((${childCode.resultTerm} != null),false);
    --- End diff --
    
    Boolean.compare() could be problematic because it is only guaranteed to return a value smaller than 0 or larger then 0 if the two values are not equal.
    
    What about?
    ```
    $resultTpe $resultTerm = ${childCode.resultTerm} != null ? 1 : 0;


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by Shiti <gi...@git.apache.org>.
Github user Shiti commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112635882
  
    merged b59c81bc41f0fc4ade5359dfdf42549a76d412fa


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112104910
  
    I have a funny feeling that this might backfire. Is the null value policy now to simply ignore them? :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112295698
  
    I think we need to have a special AST node along with code generation for it. This means:
    
     - Addition of something like NumericIsNotNull in comparisons.scala
     - Code generation for this new AST node in ExpressionCodeGenerator.scala.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by Shiti <gi...@git.apache.org>.
Github user Shiti closed the pull request at:

    https://github.com/apache/flink/pull/834


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112111131
  
    But right now the jobs fail with NPEs, that's also not very nice. :smile: 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2210] Table API support for aggregation...

Posted by Shiti <gi...@git.apache.org>.
Github user Shiti commented on the pull request:

    https://github.com/apache/flink/pull/834#issuecomment-112312576
  
    @aljoscha I have made the changes


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---