Posted to user@flink.apache.org by Stephan Ewen <se...@apache.org> on 2015/01/09 11:44:02 UTC

Type Hints in the Java API

Hi everyone!

We recently introduced type hints for the Java API. Since that is a pretty
useful feature, I wanted to quickly explain what it is.

Kudos to Timo Walther, who did a large part of this work.


*Background*

Flink tries to figure out as much information about what types enter and
leave user functions as possible.

 - For the POJO API (where one refers to field names), we need that
information to make checks (for typos and type compatibility) before the
job is executed.

 - For the upcoming logical programs (see roadmap draft) we need this to
know the "schema" of functions.

 - The more we know, the better serialization and data layout schemes the
compiler/optimizer can develop. That is quite important for the memory
usage paradigm in Flink (work on serialized data inside/outside the heap
and make serialization very cheap).

 - Finally, it also spares users from having to worry about serialization
frameworks and from having to register types with those frameworks.


*Problem*

Scala is an easy case, because it preserves generic type information
(ClassTags / Type Manifests), but Java erases generic type info in most
cases.

We do reflection analysis on the user function classes to get the generic
types. This logic also contains some simple type inference in case the
functions have type variables (such as a MapFunction<T, Tuple2<T, Long>>).

We cannot reliably figure out the data types of functions in all cases in
Java. Some issues remain with generic lambdas (we are trying to solve this
with the Java community, see below) and with generic type variables that we
cannot infer.
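
As a rough illustration of the limitation described above, here is a small
sketch that uses only plain JDK reflection. It is not Flink's actual type
extraction logic, and the names ErasureDemo, Parse, Identity and describe
are made up for this example. It shows that type arguments fixed in a class
declaration are visible to reflection, that a type variable only shows up as
its name, and that a lambda's class carries no generic signature at all.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.Function;

class ErasureDemo {

    // Type arguments are fixed in the declaration, so the compiler records
    // them in the class file and reflection can read them back.
    static class Parse implements Function<String, Integer> {
        @Override
        public Integer apply(String s) {
            return Integer.parseInt(s);
        }
    }

    // Here the type arguments are the type variable T. Reflection on the
    // class only sees "T", whatever T is bound to at the use site.
    static class Identity<T> implements Function<T, T> {
        @Override
        public T apply(T value) {
            return value;
        }
    }

    // Returns the type arguments of Function as declared by the given class,
    // or "unknown" if the class declares no parameterized Function interface.
    static String describe(Class<?> clazz) {
        for (Type t : clazz.getGenericInterfaces()) {
            if (t instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) t;
                if (p.getRawType() == Function.class) {
                    Type[] args = p.getActualTypeArguments();
                    return args[0].getTypeName() + ", " + args[1].getTypeName();
                }
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Concrete types are recoverable.
        System.out.println(describe(Parse.class));
        // Only the type variable name is recoverable, not Long.
        System.out.println(describe(new Identity<Long>().getClass()));
        // The class generated for a lambda has no generic signature.
        Function<String, Integer> lambda = s -> s.length();
        System.out.println(describe(lambda.getClass()));
    }
}
```

The second and third cases are the ones where some extra information from
the user is needed.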


*Solution: Type Hints*

To make these cases work easily, a recent addition to the 0.9-SNAPSHOT
master introduced type hints. They allow you to tell the system the types
that it cannot infer.

You can write code like

DataSet<SomeType> result = dataSet
        .map(new MyGenericNonInferrableFunction<Long, SomeType>())
        .returns(SomeType.class);


To make specification of generic types easier, it also comes with a parser
for simple string representations of generic types:

  .returns("Tuple2<Integer, my.SomeType>")


We suggest using this instead of the "ResultTypeQueryable" workaround that
has been used in some cases.
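
To make the idea behind a hint concrete, here is a minimal sketch in plain
Java. Everything in it is hypothetical: HintDemo, Op, map and outTypeName are
stand-ins invented for this example and are not Flink classes or methods.
Only the call shape returns(SomeType.class) is taken from the snippet above.
The point is simply that the caller hands over a Class object, which survives
erasure, so the system no longer has to infer the output type.

```java
import java.util.function.Function;

class HintDemo {

    // Hypothetical stand-in for an operator; NOT Flink's actual class.
    static final class Op<IN, OUT> {
        private final Function<IN, OUT> function;
        private Class<OUT> outType; // stays null unless a hint is given

        Op(Function<IN, OUT> function) {
            this.function = function;
        }

        // The hint: the caller states the output class explicitly.
        Op<IN, OUT> returns(Class<OUT> hint) {
            this.outType = hint;
            return this;
        }

        // What this toy "system" knows about the output type.
        String outTypeName() {
            return outType == null ? "unknown" : outType.getName();
        }

        OUT apply(IN value) {
            return function.apply(value);
        }
    }

    // Hypothetical helper that wraps a function without knowing its types
    // at runtime, since IN and OUT are erased.
    static <IN, OUT> Op<IN, OUT> map(Function<IN, OUT> function) {
        return new Op<>(function);
    }

    public static void main(String[] args) {
        Op<String, Integer> op = map((String s) -> s.length());
        // Without a hint the output type is not known at runtime.
        System.out.println(op.outTypeName());
        // With the hint it is.
        System.out.println(op.returns(Integer.class).outTypeName());
    }
}
```

A Class object cannot express nested generics such as Tuple2<Integer,
my.SomeType>, which is the gap the string form of the hint is meant to cover.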


*Improving Type Information in Java*

One Flink committer (Timo Walther) has actually become active in the
Eclipse JDT compiler community and in the OpenJDK community to try and
improve the way type information is available for lambdas.


Greetings,
Stephan

Re: Type Hints in the Java API

Posted by Timo Walther <tw...@apache.org>.

Thanks for the doc!

I will add a complete list of all currently supported types (including
Enums, Writables etc. not mentioned in the doc) and types that will be
supported in the near future (e.g. multi-dimensional arrays).

On 09.01.2015 17:37, Stephan Ewen wrote:
> Here is a doc that details type handling/extraction for both the Java API
> and the Scala API, including the type hints.
>
> https://github.com/apache/flink/blob/master/docs/internal_types_serialization.md
>
> Enjoy :-)
>
> On Fri, Jan 9, 2015 at 12:26 PM, Vasiliki Kalavri <vasilikikalavri@gmail.com
>> wrote:
>> Hi,
>>
>> thanks for the nice explanation and the great work!
>> This will simplify our Graph API-lives a lot ^^
>>
>> Cheers,
>> V.
>>
>> On 9 January 2015 at 11:59, Stephan Ewen <se...@apache.org> wrote:
>>
>>> I am adding a derivative of that text to the docs right now.
>>>
>>>
>>>
>>> On Fri, Jan 9, 2015 at 11:54 AM, Robert Metzger <rm...@apache.org>
>>> wrote:
>>>
>>>> Thank you!
>>>>
>>>> It would be amazing if you or somebody else could copy paste this into
>>> our
>>>> documentation.

Re: Type Hints in the Java API

Posted by Stephan Ewen <se...@apache.org>.

Here is a doc that details type handling/extraction for both the Java API
and the Scala API, including the type hints.

https://github.com/apache/flink/blob/master/docs/internal_types_serialization.md

Enjoy :-)

On Fri, Jan 9, 2015 at 12:26 PM, Vasiliki Kalavri <vasilikikalavri@gmail.com
> wrote:

> Hi,
>
> thanks for the nice explanation and the great work!
> This will simplify our Graph API-lives a lot ^^
>
> Cheers,
> V.
>
> On 9 January 2015 at 11:59, Stephan Ewen <se...@apache.org> wrote:
>
> > I am adding a derivative of that text to the docs right now.
> >
> >
> >
> > On Fri, Jan 9, 2015 at 11:54 AM, Robert Metzger <rm...@apache.org>
> > wrote:
> >
> > > Thank you!
> > >
> > > It would be amazing if you or somebody else could copy paste this into
> > our
> > > documentation.

Re: Type Hints in the Java API

Posted by Vasiliki Kalavri <va...@gmail.com>.

Hi,

thanks for the nice explanation and the great work!
This will simplify our Graph API-lives a lot ^^

Cheers,
V.

On 9 January 2015 at 11:59, Stephan Ewen <se...@apache.org> wrote:

> I am adding a derivative of that text to the docs right now.
>
>
>
> On Fri, Jan 9, 2015 at 11:54 AM, Robert Metzger <rm...@apache.org>
> wrote:
>
> > Thank you!
> >
> > It would be amazing if you or somebody else could copy paste this into
> our
> > documentation.

Re: Type Hints in the Java API

Posted by Stephan Ewen <se...@apache.org>.

I am adding a derivative of that text to the docs right now.



On Fri, Jan 9, 2015 at 11:54 AM, Robert Metzger <rm...@apache.org> wrote:

> Thank you!
>
> It would be amazing if you or somebody else could copy paste this into our
> documentation.

Re: Type Hints in the Java API

Posted by Robert Metzger <rm...@apache.org>.

Thank you!

It would be amazing if you or somebody else could copy paste this into our
documentation.
