You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by gallenvara <gi...@git.apache.org> on 2016/06/07 02:07:22 UTC

[GitHub] flink pull request #2078: [FLINK-2985] Allow different field names for union...

GitHub user gallenvara opened a pull request:

    https://github.com/apache/flink/pull/2078

    [FLINK-2985] Allow different field names for unionAll() in Table API

    Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
    If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
    In addition to going through the list, please provide a meaningful description of your changes.
    
    - [X] General
      - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
      - The pull request addresses only one issue
      - Each commit in the PR has a meaningful commit message (including the JIRA id)
    
    - [X] Tests & Build
      - Functionality added by the pull request is covered by tests
      - `mvn clean verify` has been executed successfully locally or a Travis build has passed
    
    It is not necessary that the corresponding columns in each SELECT statement have the same name, but they do need to be the same data types. This PR supports allowing different field names for union/unionAll in Table API.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gallenvara/flink flink-2985

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2078.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2078
    
----
commit e84d836efbb290fb79d14e3d3c6c9e13db567b72
Author: gallenvara <ga...@126.com>
Date:   2016-06-06T08:30:41Z

    Allow different field names for union in Table API.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2078: [FLINK-2985] Allow different field names for unionAll() i...

Posted by gallenvara <gi...@git.apache.org>.
Github user gallenvara commented on the issue:

    https://github.com/apache/flink/pull/2078
  
    Hi @wuchong , thanks a lot for your advice. With default fieldnames may cause same content of `input1` and `input2` in `CodeGenerator`. Comparing the `inputTerm` is a easy and effective way to avoid it. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2078: [FLINK-2985] Allow different field names for union...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2078#discussion_r66476461
  
    --- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/batch/table/UnionITCase.scala ---
    @@ -69,6 +69,36 @@ class UnionITCase(
       }
     
       @Test
    +  def testUnionAllDifferentFieldNames(): Unit = {
    --- End diff --
    
    We are trying to reduce the number of ITCases in of the Table API and only add new ones if necessary to keep the build time low.
    
    I would change the existing tests to use different field names instead of adding new tests here. In addition the tests should try to select a field to check that the field names of the first input are forwarded.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2078: [FLINK-2985] Allow different field names for unionAll() i...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/2078
  
    Hi @gallenvara, this issue cannot be easily solved. The `TypeInfo` check in `UnionOperator` cannot be removed, because it prevents that incompatible serializers are used at runtime. However, the equality check of the `RowTypeInfo` is a bit too strict. `RowTypeInfo` is based on `CaseClassInfo` which check for field types and names (which is fine for case classes). However, for `Row` we only need to check for types. Actually, I think we can remove the `fieldNames` parameter from the `RowTypeInfo` constructor and always create default field names ("f0", "f1", ...) as done in some of the other constructors.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2078: [FLINK-2985] Allow different field names for union...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/2078


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2078: [FLINK-2985] Allow different field names for unionAll() i...

Posted by gallenvara <gi...@git.apache.org>.
Github user gallenvara commented on the issue:

    https://github.com/apache/flink/pull/2078
  
    Hi, @fhueske , sorry for the late update for this PR. Codes have been modified.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2078: [FLINK-2985] Allow different field names for unionAll() i...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/2078
  
    Thanks for the contribution @gallenvara. I will merge it now...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2078: [FLINK-2985] Allow different field names for unionAll() i...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/2078
  
    Hi @gallenvara , I debug the  `IndexOutOfBoundsException` exception of  `testJoinWithDisjunctivePred`, and find this line [L526 in CodeGenerator](https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/codegen/CodeGenerator.scala#L526).  This is the reason for the test failure.   Because  `==` in scala is `equals`\uff0cwhen we visit inputRef of 'd, the input1 and input2 have the same field types ([Int, Long, String]). Here we will get the wrong index(3) which cause IOOB exception, but we want to get the index(0). 
    
    We just need to modify L526 to `val index = if (input._2 == input1Term) {` will fix this problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2078: [FLINK-2985] Allow different field names for unionAll() i...

Posted by gallenvara <gi...@git.apache.org>.
Github user gallenvara commented on the issue:

    https://github.com/apache/flink/pull/2078
  
    @fhueske the failure of CI build is relevant with `testJoinWithDisjunctivePred` in `JoinITCase`. The case is `val joinT = ds1.join(ds2).filter('a === 'd && ('b === 'e || 'b === 'e - 10)).select('c, 'g)`. I still don't quite understand why this case running failed after some debug work. And when i changed the filter condition with `('a === 'd && ('b === 'e || 'b === 'h - 10))` (or replace with `'d` `'f`), it can work sucessfully without exception. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2078: [FLINK-2985] Allow different field names for unionAll() i...

Posted by gallenvara <gi...@git.apache.org>.
Github user gallenvara commented on the issue:

    https://github.com/apache/flink/pull/2078
  
    @twalthr  @fhueske  can you help with reviewing this PR? Thanks! :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2078: [FLINK-2985] Allow different field names for unionAll() i...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/2078
  
    Yes, you are right. Now this PR look good to me. \U0001f44d 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2078: [FLINK-2985] Allow different field names for union...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2078#discussion_r66469727
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/UnionOperator.java ---
    @@ -43,11 +42,6 @@
     	public UnionOperator(DataSet<T> input1, DataSet<T> input2, String unionLocationName) {
     		super(input1, input2, input1.getType());
     		
    -		if (!input1.getType().equals(input2.getType())) {
    --- End diff --
    
    This check is crucial for the DataSet API and must remain in place.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2078: [FLINK-2985] Allow different field names for unionAll() i...

Posted by gallenvara <gi...@git.apache.org>.
Github user gallenvara commented on the issue:

    https://github.com/apache/flink/pull/2078
  
    Conflict resolved.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2078: [FLINK-2985] Allow different field names for union...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2078#discussion_r69402729
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/typeutils/RowTypeInfo.scala ---
    @@ -25,38 +25,25 @@ import org.apache.flink.api.scala.typeutils.CaseClassTypeInfo
     
     import scala.collection.mutable.ArrayBuffer
     import org.apache.flink.api.common.typeutils.TypeSerializer
    -import org.apache.flink.api.table.{Row, TableException}
    +import org.apache.flink.api.table.Row
     
     /**
      * TypeInformation for [[Row]].
      */
    -class RowTypeInfo(fieldTypes: Seq[TypeInformation[_]], fieldNames: Seq[String])
    +class RowTypeInfo(fieldTypes: Seq[TypeInformation[_]])
       extends CaseClassTypeInfo[Row](
         classOf[Row],
         Array(),
         fieldTypes,
    -    fieldNames)
    +    Nil)
    --- End diff --
    
    I would like to replace `Nil` with `for (i <- fieldTypes.indices) yield "f" + i`, so that we can keep fieldNames as `val`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---