You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by twalthr <gi...@git.apache.org> on 2014/11/14 22:30:08 UTC

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

GitHub user twalthr opened a pull request:

    https://github.com/apache/incubator-flink/pull/203

    [FLINK-1245] Introduce TypeHints for Java API operators

    This PR contains:
    
    - .returns(String), .returns(TypeInformation), .returns(Class) for all Java API UdfOperators
    - a new UnknownTypeInfo with a corresponding new UnknownTypeSerializer which uses Kryo in a special configuration
    - the TypeExtractor does not throw Exceptions anymore if it recognizes type erasure, 
      instead it prints log warnings and returns a UnknownTypeInfo
    - Java 8 Lambdas can now be compiled and run with Oracle JDK for development (not efficient due to UnknownTypeInfo)
    - since TypeInfoParser is now "Public API" I reworked it such it also support Pojos
    - Unit Tests test TypeHints, Serializers, TypeExtractor and Lambdas

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/twalthr/incubator-flink TypeInfoRework

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-flink/pull/203.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #203
    
----
commit 8574f9c6c8f518ef54d363d07e6d53462de78d5d
Author: twalthr <in...@twalthr.com>
Date:   2014-11-07T15:18:23Z

    [FLINK-1245] Introduce TypeHints for Java API operators

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/203#discussion_r21818932
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java ---
    @@ -98,14 +99,18 @@ protected DataSet(ExecutionEnvironment context, TypeInformation<T> type) {
     			throw new NullPointerException("context is null");
     		}
     
    -		if (type == null) {
    -			throw new NullPointerException("type is null");
    -		}
    -		
     		this.context = context;
     		this.type = type;
     	}
     
    +	protected void setType(TypeInformation<T> type) {
    --- End diff --
    
    Okay, that makes sense.
    
    How about we add a flag to make sure that the `fillInTypeInfo()` cannot be called after the `getType()` method was called? That way we make sure that we do not make things inconsistent (by returning some type info at some point, and another one later). This should not happen anyways, but I would feel much better with this sanity check in place - after all, this can cause very hard to debug issues, if used inconsistently.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-65423242
  
    It seems that Travis doesn't want to test my code, I don't know why.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/203#discussion_r21416441
  
    --- Diff: flink-java8/src/test/java/org/apache/flink/api/java/type/lambdas/LambdaExtractionTest.java ---
    @@ -129,10 +129,12 @@ public void testMapLambda() {
     		MapFunction<Tuple2<Tuple1<Integer>, Boolean>, Tuple2<Tuple1<Integer>, String>> f = (i) -> null;
     		
     		TypeInformation<?> ti = TypeExtractor.getMapReturnTypes(f, TypeInfoParser.parse("Tuple2<Tuple1<Integer>, Boolean>"));
    -		Assert.assertTrue(ti.isTupleType());
    -		Assert.assertEquals(2, ti.getArity());
    -		Assert.assertTrue(((TupleTypeInfo<?>) ti).getTypeAt(0).isTupleType());
    -		Assert.assertEquals(((TupleTypeInfo<?>) ti).getTypeAt(1), BasicTypeInfo.STRING_TYPE_INFO);
    +        if (ti != null) {
    --- End diff --
    
    spaces vs tabs?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr closed the pull request at:

    https://github.com/apache/incubator-flink/pull/203


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/203#discussion_r21743690
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java ---
    @@ -98,14 +99,18 @@ protected DataSet(ExecutionEnvironment context, TypeInformation<T> type) {
     			throw new NullPointerException("context is null");
     		}
     
    -		if (type == null) {
    -			throw new NullPointerException("type is null");
    -		}
    -		
     		this.context = context;
     		this.type = type;
     	}
     
    +	protected void setType(TypeInformation<T> type) {
    --- End diff --
    
    I would allow to set the type info twice, because of 2 reasons: 
    - If the type extractor extracts something wrong (what could happen due to the complexity of POJOs), the user can use type hints as a workaround and overwrite the wrong type extraction result.
    - If the user uses lambdas in IntelliJ and gives TypeHints, he will get a IllegalStateExpection when building the project with maven.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-67185989
  
    You are correct, the typeInfo should never be null. I do some checking for that only to be 100 percent sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/203#discussion_r21740006
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java ---
    @@ -125,6 +130,11 @@ public ExecutionEnvironment getExecutionEnvironment() {
     	 * @see TypeInformation
     	 */
     	public TypeInformation<T> getType() {
    +		if (type == null) {
    +			throw new InvalidTypesException("The return type could not be determined automatically. "
    --- End diff --
    
    I think we need to be a bit better with the error message. This basically says "didn't work, go figure it out". Directing to the log is not quite as good as giving the right error message a the right place.
    
    In a prior version, the code had a "MissingTypeInfo". I think that was good because that missing type info can hold:
      - The operator (name) where the return type was not determined
      - And the reason for the failure (the exception from the type extractor)
    
    Instead of directing the user to the log, we can throw an exception and in that exception describe exactly which operation it was, what the cause was, and we can chain the original exception.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/203#discussion_r21416482
  
    --- Diff: flink-java8/src/test/java/org/apache/flink/api/java/type/lambdas/LambdaExtractionTest.java ---
    @@ -129,10 +129,12 @@ public void testMapLambda() {
     		MapFunction<Tuple2<Tuple1<Integer>, Boolean>, Tuple2<Tuple1<Integer>, String>> f = (i) -> null;
     		
     		TypeInformation<?> ti = TypeExtractor.getMapReturnTypes(f, TypeInfoParser.parse("Tuple2<Tuple1<Integer>, Boolean>"));
    -		Assert.assertTrue(ti.isTupleType());
    -		Assert.assertEquals(2, ti.getArity());
    -		Assert.assertTrue(((TupleTypeInfo<?>) ti).getTypeAt(0).isTupleType());
    -		Assert.assertEquals(((TupleTypeInfo<?>) ti).getTypeAt(1), BasicTypeInfo.STRING_TYPE_INFO);
    +        if (ti != null) {
    --- End diff --
    
    Sorry, I'm always have to switch between IntelliJ and Eclipse when I'm using Lambdas. I should configure IntelliJ to use Tabs. Will correct that...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-64384654
  
    Why can't we use the generic type serializer for the unknown type info?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-64880344
  
    Do we really need a MissingTypeInfo anymore? The type extractor could simply return `null`. A MissingTypeInfo type has no functionality anyway (no serializer etc.).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/203#discussion_r20855874
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvOutputFormat.java ---
    @@ -217,7 +218,7 @@ public String toString() {
     	 */
     	@Override
     	public void setInputType(TypeInformation<?> type) {
    -		if (!type.isTupleType()) {
    +		if (!type.isTupleType() && !(type instanceof UnknownTypeInfo)) {
    --- End diff --
    
    Thats one of the points that need to be discussed. type.isTupleType() can return false, but since it is an UnkownType it can be a TupleType and thus the job runs successfully anyways.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-64381358
  
    Would be great if someone could review this PR soon :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/203#discussion_r21740407
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java ---
    @@ -98,14 +99,18 @@ protected DataSet(ExecutionEnvironment context, TypeInformation<T> type) {
     			throw new NullPointerException("context is null");
     		}
     
    -		if (type == null) {
    -			throw new NullPointerException("type is null");
    -		}
    -		
     		this.context = context;
     		this.type = type;
     	}
     
    +	protected void setType(TypeInformation<T> type) {
    --- End diff --
    
    May be good to make this a little safer here, now that the type info is no longer final.
    We could call this `fillInTypeInfo()` and check whether the type info is currently null, (or a MissingTypeInfo) and only then allow to change the field. Otherwise throw an IllegalStateException.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-64904984
  
    I have updated the PR. I really hope that it got accepted now :)
    I also added the new EnumType to the TypeInfoParser.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-64777998
  
    So in essence I should return to my first proposal :D I will update this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-67171707
  
    I think it is right that the `fillInTypeInformation()` call fails in the example you mentioned above. If the semantic properties are extracted on a different type info than the one that is filled in, we assume wrong semantic properties in the optimizer.
    
    We need to fix this by deferring the property extraction.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-67218263
  
    I added a bit on top of it and opened PR #270 for the changes.
    
    @twalthr Can you have a look whether it makes sense to you?
    
    The main change is that I removed the `updateTypeDependentProperties()`, as that seemed a bit fragile to me. Instead, semantic properties are extracted lazy now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-67066182
  
    I have implemented your feedback. However, if the TypeExtractor classifies an object as GenericTypeInfo and we e.g. use a FlatMap, it not possible to change that type again. Since the extractSemanticProperties() calls getType() and thus flags the type as used.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-67173983
  
    The error messages currently uses `function.toString()` to identify the functions. I will change that to use the call site location, as for other messages. It is more informative.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-64698910
  
    I like the direction of this pull request. Here are some thought what we could improve:
    
      - Keeping the UnknownTypeInfo all the way and actually trying to work with it will shadow problems. At much later points, when someone tries to use a POJO field on the type info, an error will occur, at a point that is disconnected from the source of the problem.
    
      - How about we rename the `UnknownTypeInfo` to `MissingTypeInfo` and use it only as a place-holder for the `returns()` hint method, which exchanges that type info by the proper one.
    
      - We could make sure that type extraction failures are recognized early, by having the `getType()` method on the DataSet throw an error if only a `MissingTypeInfo` exists.
    
      - Getting Generic Lambdas to run with this may not be a good idea. This goes against my original suggestion to keep this for the purpose of local development with OracleJDK, but in hind-sight, I think things will get quite intransparent to reason about and hard to maintain/debug/fix.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-67042381
  
    I'm implementing your comments right now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/203#discussion_r20855741
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvOutputFormat.java ---
    @@ -217,7 +218,7 @@ public String toString() {
     	 */
     	@Override
     	public void setInputType(TypeInformation<?> type) {
    -		if (!type.isTupleType()) {
    +		if (!type.isTupleType() && !(type instanceof UnknownTypeInfo)) {
    --- End diff --
    
    Since UnknownTypeInfo is not a Tuple type, why do we need the additional check here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-67170934
  
    Looks good to me. I am merging this now...
    
    I have one more questions (thinking about adding a sanity check during the merge): Can the typeInfo in the data set ever by null, or is it always either a proper `TypeInformation`, or a `MissingTypeInfo` ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-64385776
  
    All generic type serializers are mutable types, but if we only have Object as type information we cannot create instances of it. Additionally, e.g. in case of a Tuple we may have the information that we have a Tuple2 but without generic information. The UnkownTypeSerializer supports both cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-1245] Introduce TypeHints for...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/incubator-flink/pull/203#issuecomment-65896131
  
    Except for the indentation with spaces in some tests, the change looks good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---