Posted to dev@flink.apache.org by "Matthias J. Sax" <mj...@informatik.hu-berlin.de> on 2015/07/31 21:37:04 UTC

Tuple

Hi,

is there any specific reason why Tuple.getTupleClass(int arity) does
not support arity zero? There is a class Tuple0, but it cannot be
generated by Tuple.getTupleClass(...). Is it a missing feature? (I would
like to have it.)

-Matthias
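
To make the request concrete, here is a minimal sketch of an arity-to-class lookup that accepts arity zero. Everything in it is an assumption made for illustration: SketchTuple, SketchTuple0..2 and TupleClassLookup are hypothetical stand-ins, not Flink's real Tuple classes, and the actual Tuple.getTupleClass(...) may be implemented quite differently.

```java
/** Hypothetical stand-ins for tuple classes; these are NOT Flink's real types. */
abstract class SketchTuple {
    abstract int getArity();
}

final class SketchTuple0 extends SketchTuple {
    @Override
    int getArity() { return 0; }
}

final class SketchTuple1 extends SketchTuple {
    Object f0;

    @Override
    int getArity() { return 1; }
}

final class SketchTuple2 extends SketchTuple {
    Object f0;
    Object f1;

    @Override
    int getArity() { return 2; }
}

final class TupleClassLookup {
    // Index i holds the class for arity i. Entry 0 is the case asked about above.
    private static final Class<?>[] CLASSES = {
        SketchTuple0.class, SketchTuple1.class, SketchTuple2.class
    };

    private TupleClassLookup() {}

    /** Returns the stand-in tuple class for the given arity, including arity 0. */
    static Class<?> getTupleClass(int arity) {
        if (arity < 0 || arity >= CLASSES.length) {
            throw new IllegalArgumentException(
                "The tuple arity must be in [0, " + (CLASSES.length - 1) + "].");
        }
        return CLASSES[arity];
    }

    public static void main(String[] args) {
        System.out.println(getTupleClass(0).getSimpleName());
        System.out.println(getTupleClass(2).getSimpleName());
    }
}
```

The only design point is that the lower bound of the range check is 0 rather than 1, so the empty tuple goes through the same lookup as every other arity.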


Re: Tuple

Posted by Chesnay Schepler <ch...@fu-berlin.de>.
Yes, if it is present in the core Flink files, it must work just like any
other tuple in Flink.

Removing it is not an option, though, but moving it is. The Python API uses it
(that's the reason Tuple0 was added in the first place).

On 01.08.2015 13:04, Matthias J. Sax wrote:
> I see.
>
> I think that it might be useful to have Tuple0, because in rare cases
> you only want to "notify" a downstream operator (talking about
> streaming) that something happened but there is no actual data to be
> processed. Furthermore, if Flink cannot deal with Tuple0, it should be
> removed completely for consistency IMHO.
>
> I will open a JIRA for it.
>
> -Matthias
>
> On 07/31/2015 10:44 PM, Chesnay Schepler wrote:
>> also, I'm not sure if I ever sent a Tuple0 through a program, it could
>> be that the system freaks out.
>>
>> On 31.07.2015 22:40, Chesnay Schepler wrote:
>>> there's no specific reason. it was added fairly recently by me (mid
>>> April), and you're most likely the second person to use it.
>>>
>>> i didn't integrate it into all our tuple-related stuff because, well, i
>>> never thought anyone would actually need it, so i saved myself the
>>> trouble.


Re: Tuple

Posted by Stephan Ewen <se...@apache.org>.
The idea of the dedicated project was to make the tuples usable in other
programs that may interact with Flink but don't want the full
dependencies.

I share the concern about too many small projects...

On Mon, Aug 3, 2015 at 1:01 AM, Matthias J. Sax <
mjsax@informatik.hu-berlin.de> wrote:

> Thanks for the advice about Tuple0.
>
> I personally don't see any advantage in having a "flink-tuple" project. Am
> I missing anything about it? Furthermore, I am not sure if it is a good
> idea to have too many small projects.
>
>
> On 08/03/2015 12:48 AM, Stephan Ewen wrote:
> > Tuple0 would need special serialization and comparator logic. If that is
> > given, I see no reason not to support it.
> >
> > There is, BTW, the request to create a dedicated "flink-tuple" project
> > that only contains the tuple classes. Any opinions on that?
> >
> > On Mon, Aug 3, 2015 at 12:45 AM, Matthias J. Sax <
> > mjsax@informatik.hu-berlin.de> wrote:
> >
> >> Thanks for the explanation!
> >>
> >> As I mentioned before, Tuple0 might also be helpful for streaming. And I
> >> guess I will need it for the Storm compatibility layer, too. (I need to
> >> double check, but Storm supports zero-attribute tuples, too.)
> >>
> >> With regard to the information I collected during the discussion, I vote
> >> for keeping Tuple0 in Flink core and fixing the serialization problem.
> >> Should we have another JIRA for this? Or should I extend the existing
> >> JIRA? (https://issues.apache.org/jira/browse/FLINK-2457)
> >>
> >> -Matthias
> >>
> >>
> >> On 08/03/2015 12:22 AM, Chesnay Schepler wrote:
> >>> First of all, it was a really good idea to start a discussion about
> >>> this.
> >>>
> >>> So the general idea behind Tuple0 was this:
> >>>
> >>> The Python API maps python tuples to flink tuples. Python can have
> >>> empty tuples, so i thought "well duh, let's make a Tuple0 class!". What
> >>> i did not wanna do is create some non-Tuple object to represent empty
> >>> tuples; I'd rather have them treated the same, because it's less work
> >>> and creates simpler code.
> >>>
> >>> When transferring the plan to java, certain parameters for operations
> >>> are tuples, which can be empty as well.
> >>> This is where the Tuple0 class is really useful, because these empty
> >>> tuples go through the same logic as other tuples.
> >>> This is also why i want to keep the class, at least in the python
> >>> project, for now.
> >>>
> >>> For the actual program execution, I need a new solution. Funny story,
> >>> while writing this reply i noticed that the Python API can't handle
> >>> Tuple0 at runtime either. ha...ha... -.-
> >>>
> >>> Guess I now know what I'm working on next.
> >>>
> >>> On 02.08.2015 21:24, Matthias J. Sax wrote:
> >>>> Can you elaborate on how and why Python uses Tuple0? If it cannot be
> >>>> serialized like regular tuples, what is its use in Python? Right now
> >>>> it seems as if there is no special serialization code for Tuple0.
> >>>>
> >>>> I just want to understand the topic in detail.
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>> On 08/01/2015 03:38 PM, Stephan Ewen wrote:
> >>>>> I think a Tuple0 cannot be implemented like the current tuples, at
> >>>>> least with respect to runtime serialization.
> >>>>>
> >>>>> The system makes the assumption that it makes progress in consuming
> >>>>> bytes when deserializing values. If a Tuple0 never consumes data from
> >>>>> the byte stream, this assumption is broken. It would need at least one
> >>>>> marker byte. Then it effectively is a Tuple1<Byte> disguising itself
> >>>>> as a Tuple0.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sat, Aug 1, 2015 at 1:38 PM, Matthias J. Sax <
> >>>>> mjsax@informatik.hu-berlin.de> wrote:
> >>>>>
> >>>>>> I just double checked. Scala does not have type Tuple0. IMHO, it
> >>>>>> would be best to remove Tuple0 for consistency. Having Tuple types is
> >>>>>> for consistency with Scala in the first place, right? Please give
> >>>>>> feedback.
> >>>>>>
> >>>>>> -Matthias

Re: Tuple

Posted by Aljoscha Krettek <al...@apache.org>.
I think in the Streaming Case it works because every Serializer ends up
being wrapped up in a StreamRecordSerializer. When the
StreamRecordSerializer serializes/deserializes stuff it should be ok that
the Tuple0 doesn't actually serialize/deserialize anything.
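
The serialization concern raised in this thread (a receiver cannot count records that occupy zero bytes, and a marker byte would fix that) can be illustrated with a small sketch. It is only an illustration under stated assumptions: EmptyTupleWireSketch, MARKER, writeEmptyTuples and countRecords are hypothetical names, and none of this is Flink's actual serializer code or a claim about what StreamRecordSerializer writes.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch; not Flink's TypeSerializer or StreamRecordSerializer. */
final class EmptyTupleWireSketch {
    // Assumed one-byte marker written per empty tuple.
    static final byte MARKER = 0;

    private EmptyTupleWireSketch() {}

    /** Writes n empty tuples, optionally emitting one marker byte for each. */
    static byte[] writeEmptyTuples(int n, boolean withMarker) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < n; i++) {
            if (withMarker) {
                out.write(MARKER);
            }
            // Without the marker there is nothing to write: the tuple has no fields.
        }
        return out.toByteArray();
    }

    /** Counts records by consuming exactly one marker byte per record. */
    static int countRecords(byte[] data) {
        int position = 0;
        int count = 0;
        while (position < data.length) {
            if (data[position] != MARKER) {
                throw new IllegalStateException("Unexpected byte at position " + position);
            }
            position++; // progress is made in the byte stream for every record
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Three empty tuples without a marker leave the buffer empty,
        // so the reader sees no records at all.
        System.out.println(countRecords(writeEmptyTuples(3, false))); // prints 0
        // With a marker byte per tuple, the reader recovers the count.
        System.out.println(countRecords(writeEmptyTuples(3, true)));  // prints 3
    }
}
```

If a wrapping serializer already writes at least one byte per record, the inner empty-tuple serializer can write nothing and the count is still recoverable. Whether StreamRecordSerializer behaves that way is the assumption behind the explanation above, not something this sketch verifies.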

On Tue, 4 Aug 2015 at 13:27 Chesnay Schepler <c....@web.de> wrote:

> so I'm not too much into the streaming API, but as i see it this program
> creates an infinite number of tuples and then counts them, right?
>
> The problem with serialization as i understand it is that the receiver
> can't tell how many Tuple0 are sent, since you never actually read any
> data when deserializing a tuple. it's even more likely that it's not
> even attempted.
>
> As such, I'd be curious to see what happens when you create a batch job
> with a limited number of starting tuples.
>
> On 04.08.2015 13:08, Matthias J. Sax wrote:
> > Hi,
> >
> > I just opened a PR for this. https://github.com/apache/flink/pull/983
> >
> > However, I was not able to "reproduce" serialization issues... I tested
> > Tuple0 (see enclosed code) in a cluster, and the program worked. Am I
> > missing anything?
> >
> > -Matthias

Re: Tuple

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Yes, that is what the program does. However, streaming is not lazy, so
deserialization should have happened.

I will try a batch job later today.

On 08/04/2015 01:27 PM, Chesnay Schepler wrote:
> so I'm not too much into the streaming API, but as i see it this program
> creates an infinite number of tuples and then counts them, right?
>
> The problem with serialization as i understand it is that the receiver
> can't tell how many Tuple0 are sent, since you never actually read any
> data when deserializing a tuple. it's even more likely that it's not
> even attempted.
>
> As such, I'd be curious to see what happens when you create a batch job
> with a limited number of starting tuples.


Re: Tuple

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
I set the parallelism of the map to 4 (and I double checked that the 4 mappers
are running on different machines). Furthermore, the fromElements() source
has a parallelism of 1. Thus, some data is going over the network for sure.


On 08/04/2015 02:31 PM, Chesnay Schepler wrote:
> i think this job would be chained completely and never do any
> serialization.
> 
> On 04.08.2015 14:25, Matthias J. Sax wrote:
>> Works for batch job, too. See enclosed.


Re: Tuple

Posted by Chesnay Schepler <ch...@fu-berlin.de>.
i think this job would be chained completely and never do any serialization.

On 04.08.2015 14:25, Matthias J. Sax wrote:
> Works for batch job, too. See enclosed.
>
> On 08/04/2015 01:34 PM, Matthias J. Sax wrote:
>> Yes, that is was the program does. However, streaming is not lazy so
>> deserialization should have happened.
>>
>> I will try a batch job, later today.
>>
>> On 08/04/2015 01:27 PM, Chesnay Schepler wrote:
>>> so I'm not to much into the streaming API, but as i see it this program
>>> creates an infinite number of tuples and then counts them, right?
>>>
>>> The problem with serialization as i understand it is that the receiver
>>> can't tell how many Tuple0 are sent, since you never actually read any
>>> data when deserializing a tuple. it's even more likely that it's not
>>> even attempted.
>>>
>>> As such, I'd be curious to see what happens when you create a batch job
>>> that with a limited number of starting tuples.


Re: Tuple

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Works for batch job, too. See enclosed.

On 08/04/2015 01:34 PM, Matthias J. Sax wrote:
> Yes, that is what the program does. However, streaming is not lazy, so
> deserialization should have happened.
> 
> I will try a batch job, later today.

Re: Tuple

Posted by Chesnay Schepler <c....@web.de>.
So I'm not too much into the streaming API, but as I see it this program
creates an infinite number of tuples and then counts them, right?

The problem with serialization, as I understand it, is that the receiver
can't tell how many Tuple0 are sent, since you never actually read any
data when deserializing a tuple. It's even more likely that
deserialization is not even attempted.
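
A minimal sketch of that concern, using plain java.io streams rather than
Flink's own serializers. The class and method names are hypothetical and
only illustrate a record type that writes and reads zero bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; it does not use Flink's serializer interfaces.
class EmptyRecordDemo {

    // Serializing a record without fields writes nothing to the stream.
    static void writeEmptyRecord(DataOutputStream out) {
        // intentionally empty: there is no field to write
    }

    // Deserializing such a record reads nothing from the stream either.
    static void readEmptyRecord(DataInputStream in) {
        // intentionally empty: there is no field to read
    }

    // Writes `count` empty records and returns the bytes that were produced.
    static byte[] writeRecords(int count) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            for (int i = 0; i < count; i++) {
                writeEmptyRecord(out);
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts records by deserializing until the byte stream is exhausted.
    static int countRecords(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            int count = 0;
            while (in.available() > 0) {
                int before = in.available();
                readEmptyRecord(in);
                // A reader that never consumes bytes would loop forever here.
                if (in.available() == before) {
                    throw new IllegalStateException("no progress while deserializing");
                }
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Five records were written, but the stream is empty, so none are found.
        System.out.println(countRecords(writeRecords(5))); // prints 0
    }
}
```

Whether Flink's runtime actually counts records this way is not something
this sketch shows; it only illustrates why a reader driven by the byte
stream alone cannot recover the number of zero-byte records.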

As such, I'd be curious to see what happens when you create a batch job
with a limited number of starting tuples.

On 04.08.2015 13:08, Matthias J. Sax wrote:
> Hi,
>
> I just opened a PR for this. https://github.com/apache/flink/pull/983
>
> However, I was not able to "reproduce" serialization issues... I tested
> Tuple0 (see enclosed code) in a cluster, and the program worked. Am I
> missing anything?
>
> -Matthias


Re: Tuple

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Hi,

I just opened a PR for this. https://github.com/apache/flink/pull/983

However, I was not able to "reproduce" serialization issues... I tested
Tuple0 (see enclosed code) in a cluster, and the program worked. Am I
missing anything?

-Matthias



On 08/03/2015 01:01 AM, Matthias J. Sax wrote:
> Thanks for the advice about Tuple0.
> 
> I personally don't see any advantage in having a "flink-tuple" project. Am
> I missing anything about it? Furthermore, I am not sure if it is a good
> idea to have too many small projects.

Re: Tuple

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Thanks for the advice about Tuple0.

I personally don't see any advantage in having a "flink-tuple" project. Am
I missing anything about it? Furthermore, I am not sure if it is a good
idea to have too many small projects.


On 08/03/2015 12:48 AM, Stephan Ewen wrote:
> Tuple0 would need special serialization and comparator logic. If that is
> provided, I see no reason not to support it.
> 
> There is, BTW, a request to create a dedicated "flink-tuple" project that
> only contains the tuple classes. Any opinions on that?


Re: Tuple

Posted by Stephan Ewen <se...@apache.org>.
Tuple0 would need special serialization and comparator logic. If that is
provided, I see no reason not to support it.
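
One way such special logic could look, as a rough sketch only: write a
single marker byte per empty tuple so that deserialization always consumes
data, and use a comparator that treats all empty tuples as equal. The
class, the marker value and the method names below are assumptions made
for illustration; they are not Flink's serializer or comparator API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Comparator;

// Hypothetical sketch; names and the marker value are illustrative only.
class MarkerByteSketch {

    // Stand-in for an empty tuple type; a single shared instance suffices.
    static final class EmptyTuple {
        static final EmptyTuple INSTANCE = new EmptyTuple();
        private EmptyTuple() {}
    }

    // Arbitrary marker value, chosen for this sketch.
    static final byte MARKER = 42;

    // All empty tuples are equal, so comparison always returns 0.
    static final Comparator<EmptyTuple> COMPARATOR = (a, b) -> 0;

    // Writes exactly one byte per tuple.
    static void serialize(EmptyTuple value, DataOutputStream out) throws IOException {
        out.writeByte(MARKER);
    }

    // Reads exactly one byte per tuple, so the reader always makes progress.
    static EmptyTuple deserialize(DataInputStream in) throws IOException {
        byte b = in.readByte();
        if (b != MARKER) {
            throw new IOException("Unexpected marker byte: " + b);
        }
        return EmptyTuple.INSTANCE;
    }

    // Serializes `count` tuples and returns how many could be read back.
    static int roundTrip(int count) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            for (int i = 0; i < count; i++) {
                serialize(EmptyTuple.INSTANCE, out);
            }
            out.flush();

            DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
            int read = 0;
            while (in.available() > 0) {
                deserialize(in);
                read++;
            }
            return read;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(5)); // prints 5
    }
}
```

The cost of this design is one byte per record, which makes the empty
tuple behave like a one-field tuple on the wire.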

There is, BTW, a request to create a dedicated "flink-tuple" project that
only contains the tuple classes. Any opinions on that?

On Mon, Aug 3, 2015 at 12:45 AM, Matthias J. Sax <
mjsax@informatik.hu-berlin.de> wrote:

> Thanks for the explanation!
>
> As I mentioned before, Tuple0 might also be helpful for streaming. And I
> guess I will need it for the Storm compatibility layer, too. (I need to
> double check, but Storm supports zero-attribute tuples, too.)
>
> With regard to the information I collected during the discussion, I vote
> for keeping Tuple0 in Flink core and fixing the serialization problem.
> Should we have another JIRA for this? Or should I extend the existing
> JIRA? (https://issues.apache.org/jira/browse/FLINK-2457)
>
> -Matthias

Re: Tuple

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Thanks for the explanation!

As I mentioned before, Tuple0 might also be helpful for streaming. And I
guess I will need it for the Storm compatibility layer, too. (I need to
double check, but I think Storm supports zero-attribute tuples as well.)

Based on the information I collected during the discussion, I vote
for keeping Tuple0 in Flink core and fixing the serialization problem.
Should we have another JIRA for this? Or should I extend the existing
JIRA? (https://issues.apache.org/jira/browse/FLINK-2457)

-Matthias


On 08/03/2015 12:22 AM, Chesnay Schepler wrote:
> First of all, it was a really good idea to start a discussion about this.
> 
> So the general idea behind Tuple0 was this:
> 
> The Python API maps Python tuples to Flink tuples. Python can have empty
> tuples, so I thought "well duh, let's make a Tuple0 class!". What I did
> not want to do is create some non-Tuple object to represent empty tuples;
> I'd rather have them treated the same, because it's less work and
> creates simpler code.
> 
> When transferring the plan to Java, certain parameters for operations
> are tuples, which can be empty as well.
> This is where the Tuple0 class is really useful, because these empty
> tuples go through the same logic as other tuples.
> This is also why I want to keep the class, at least in the Python
> project, for now.
> 
> For the actual program execution, I need a new solution. Funny story,
> while writing this reply I noticed that the Python API can't handle
> Tuple0 at runtime either. ha...ha... -.-
> 
> Guess I now know what I'm working on next.
> 
> On 02.08.2015 21:24, Matthias J. Sax wrote:
>> Can you elaborate how and why Python used Tuple0? If it cannot be
>> serialized similar to regular Tuples, what is the usage in Python? Right
>> now it seems, as there is no special serialization code for Tuple0.
>>
>> I just want to understand the topic in detail.
>>
>> -Matthias
>>
>>
>> On 08/01/2015 03:38 PM, Stephan Ewen wrote:
>>> I think a Tuple0 cannot be implemented like the current tuples, at least
>>> with respect to runtime serialization.
>>>
>>> The system makes the assumption that it makes progress in consuming
>>> bytes
>>> when deserializing values. If a Tuple= never consumes data from the byte
>>> stream, this assumption is broken. It would need at least one marker
>>> byte.
>>> Then it effectively is a Tuple1<Byte> disgusing itself as a tuple0.
>>>
>>>
>>>
>>> On Sat, Aug 1, 2015 at 1:38 PM, Matthias J. Sax <
>>> mjsax@informatik.hu-berlin.de> wrote:
>>>
>>>> I just double checked. Scala does not have type Tuple0. IMHO, it would
>>>> be best to remove Tuple0 for consistency. Having Tuple types is for
>>>> consistency reason with Scala in the first place, right? Please give
>>>> feedback.
>>>>
>>>> -Matthias
>>>>
>>>>
>>>> On 08/01/2015 01:04 PM, Matthias J. Sax wrote:
>>>>> I see.
>>>>>
>>>>> I think that it might be useful to have Tuple0, because in rare cases,
>>>>> you only want to "notify" a downstream operators (taking about
>>>>> streaming) that something happened but there is no actual data to be
>>>>> processed. Furthermore, if Flink cannot deal with Tuple0 it should be
>>>>> removed completely for consistency IMHO.
>>>>>
>>>>> I will open a JIRA for it.
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 07/31/2015 10:44 PM, Chesnay Schepler wrote:
>>>>>> also, I'm not sure if I ever sent a Tuple0 through a program, it
>>>>>> could
>>>>>> be that the system freaks out.
>>>>>>
>>>>>> On 31.07.2015 22:40, Chesnay Schepler wrote:
>>>>>>> there's no specific reason. it was added fairly recently by me
>>>>>>> (mid of
>>>>>>> april), and you're most likely the second person to use it.
>>>>>>>
>>>>>>> i didn't integrate into all our tuple related stuff because, well, i
>>>>>>> never thought anyone would actually need it, so i saved myself the
>>>>>>> trouble.
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> is there any specific reason, why Tuple.getTupleClass(int arity)
>>>>>>>> does
>>>>>>>> not support arity zero? There is a class Tuple0, but it cannot be
>>>>>>>> generator by Tuple.getTupleClass(...). Is it a missing feature (I
>>>> would
>>>>>>>> like to have it).
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>
>>>>
> 


Re: Tuple

Posted by Chesnay Schepler <ch...@fu-berlin.de>.
First of all, it was a really good idea to start a discussion about this.

So the general idea behind Tuple0 was this:

The Python API maps Python tuples to Flink tuples. Python can have empty
tuples, so I thought "well duh, let's make a Tuple0 class!". What I did
not want to do is create some non-Tuple object to represent empty tuples;
I'd rather have them treated the same, because it's less work and
creates simpler code.

When transferring the plan to Java, certain parameters for operations
are tuples, which can be empty as well.
This is where the Tuple0 class is really useful, because these empty
tuples go through the same logic as other tuples.
This is also why I want to keep the class, at least in the Python
project, for now.

For the actual program execution, I need a new solution. Funny story,
while writing this reply I noticed that the Python API can't handle
Tuple0 at runtime either. ha...ha... -.-

Guess I now know what I'm working on next.

On 02.08.2015 21:24, Matthias J. Sax wrote:
> Can you elaborate on how and why Python uses Tuple0? If it cannot be
> serialized like regular Tuples, what is it used for in Python? Right
> now it seems as if there is no special serialization code for Tuple0.
>
> I just want to understand the topic in detail.
>
> -Matthias
>
>


Re: Tuple

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Can you elaborate on how and why Python uses Tuple0? If it cannot be
serialized like regular Tuples, what is it used for in Python? Right
now it seems as if there is no special serialization code for Tuple0.

I just want to understand the topic in detail.

-Matthias


On 08/01/2015 03:38 PM, Stephan Ewen wrote:
> I think a Tuple0 cannot be implemented like the current tuples, at least
> with respect to runtime serialization.
> 
> The system assumes that it makes progress in consuming bytes
> when deserializing values. If a Tuple0 never consumes data from the byte
> stream, this assumption is broken. It would need at least one marker byte.
> Then it effectively is a Tuple1<Byte> disguising itself as a Tuple0.
> 
> 
> 
> 


Re: Tuple

Posted by Stephan Ewen <se...@apache.org>.
I think a Tuple0 cannot be implemented like the current tuples, at least
with respect to runtime serialization.

The system assumes that it makes progress in consuming bytes
when deserializing values. If a Tuple0 never consumes data from the byte
stream, this assumption is broken. It would need at least one marker byte.
Then it effectively is a Tuple1<Byte> disguising itself as a Tuple0.
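
To make the progress assumption above concrete, here is a minimal sketch in
plain Java. It does not use Flink's serializer interfaces; the class
EmptyTupleCodec, its MARKER constant, and its methods are hypothetical names
made up for illustration. It only shows the idea of writing one marker byte
per empty record so that a reader consumes something for every record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for illustration; not Flink's Tuple0 or its serializer.
final class EmptyTupleCodec {
    // Arbitrary marker value; an assumption of this sketch.
    static final int MARKER = 0;

    // Serializes n empty records, one marker byte each, so the stream
    // grows with every record even though the records carry no fields.
    static byte[] writeAll(int n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < n; i++) {
            out.write(MARKER);
        }
        return out.toByteArray();
    }

    // Reads records until the input is exhausted. Each record consumes
    // exactly one byte, so the loop always makes progress and terminates.
    static int countRecords(byte[] bytes) {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        int count = 0;
        while (in.available() > 0) {
            int b = in.read();
            if (b != MARKER) {
                throw new IllegalStateException("unexpected marker byte: " + b);
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] bytes = writeAll(3);
        System.out.println(bytes.length + " bytes, " + countRecords(bytes) + " records");
    }
}
```

Without the marker byte, writeAll would produce an empty array for any n, and
a reader could not tell how many empty records were sent. That is the sense in
which such a type behaves like a one-field tuple on the wire.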



On Sat, Aug 1, 2015 at 1:38 PM, Matthias J. Sax <
mjsax@informatik.hu-berlin.de> wrote:

> I just double checked. Scala does not have a Tuple0 type. IMHO, it would
> be best to remove Tuple0 for consistency. Having Tuple types is for
> consistency with Scala in the first place, right? Please give
> feedback.
>
> -Matthias
>
>
>
>

Re: Tuple

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
I just double checked. Scala does not have a Tuple0 type. IMHO, it would
be best to remove Tuple0 for consistency. Having Tuple types is for
consistency with Scala in the first place, right? Please give
feedback.

-Matthias


On 08/01/2015 01:04 PM, Matthias J. Sax wrote:
> I see.
> 
> I think that it might be useful to have Tuple0, because in rare cases
> you only want to "notify" a downstream operator (talking about
> streaming) that something happened but there is no actual data to be
> processed. Furthermore, if Flink cannot deal with Tuple0, it should be
> removed completely for consistency IMHO.
> 
> I will open a JIRA for it.
> 
> -Matthias
> 
> 


Re: Tuple

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
I see.

I think that it might be useful to have Tuple0, because in rare cases
you only want to "notify" a downstream operator (talking about
streaming) that something happened but there is no actual data to be
processed. Furthermore, if Flink cannot deal with Tuple0, it should be
removed completely for consistency IMHO.

I will open a JIRA for it.

-Matthias

On 07/31/2015 10:44 PM, Chesnay Schepler wrote:
> also, I'm not sure if I ever sent a Tuple0 through a program, it could
> be that the system freaks out.
> 
> 


Re: Tuple

Posted by Chesnay Schepler <ch...@fu-berlin.de>.
Also, I'm not sure if I ever sent a Tuple0 through a program; it could
be that the system freaks out.

On 31.07.2015 22:40, Chesnay Schepler wrote:
> There's no specific reason. It was added fairly recently by me
> (mid-April), and you're most likely the second person to use it.
>
> I didn't integrate it into all our tuple-related stuff because, well, I
> never thought anyone would actually need it, so I saved myself the
> trouble.
>
>
>


Re: Tuple

Posted by Chesnay Schepler <ch...@fu-berlin.de>.
There's no specific reason. It was added fairly recently by me
(mid-April), and you're most likely the second person to use it.

I didn't integrate it into all our tuple-related stuff because, well, I
never thought anyone would actually need it, so I saved myself the trouble.

> Hi,
>
> is there any specific reason why Tuple.getTupleClass(int arity) does
> not support arity zero? There is a class Tuple0, but it cannot be
> generated by Tuple.getTupleClass(...). Is it a missing feature? (I would
> like to have it.)
>
> -Matthias
>
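
For readers following the question above, here is a minimal sketch of what an
arity-based class lookup that also accepts arity zero could look like. This is
not Flink's actual implementation of Tuple.getTupleClass; the class
TupleClassLookup and its nested Tuple0, Tuple1, and Tuple2 are hypothetical
stand-ins, and the real method may be structured differently.

```java
// Hypothetical stand-ins for illustration; not Flink's Tuple classes.
final class TupleClassLookup {
    static class Tuple0 {}
    static class Tuple1 {}
    static class Tuple2 {}

    // Index i holds the class for arity i, so supporting arity zero
    // simply means having an entry at index 0.
    private static final Class<?>[] CLASSES = {
        Tuple0.class, Tuple1.class, Tuple2.class
    };

    // Returns the class for the given arity, rejecting values outside
    // the supported range instead of failing with an index error.
    static Class<?> getTupleClass(int arity) {
        if (arity < 0 || arity >= CLASSES.length) {
            throw new IllegalArgumentException(
                "arity must be between 0 and " + (CLASSES.length - 1));
        }
        return CLASSES[arity];
    }

    public static void main(String[] args) {
        System.out.println(getTupleClass(0).getSimpleName());
    }
}
```

With a table like this, an empty tuple goes through the same lookup as any
other arity, which matches the "same logic as other tuples" argument made in
the thread.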