Posted to dev@flink.apache.org by Robert Metzger <rm...@apache.org> on 2015/06/02 11:35:39 UTC

Re: Changed the behavior of "DataSet.print()"

I would like to reach consensus on this before the 0.9 release.

So far we have the following ideas:

writeToWorkerStdOut(prefix)
printOnTaskManager(prefix) (+1)
logOnTaskManager(prefix)

I'm against logOnTM because we are not logging the output, we are writing
or printing it.


*I would vote for deprecating "print(prefix)" and adding
"writeToWorkerStdOut(prefix)"*



On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <ch...@icloud.com> wrote:

> I agree that avoiding a name that starts with “print” is better.
>
> Regards,
> Chiwan Park
>
> > On May 28, 2015, at 11:35 PM, Maximilian Michels <mx...@apache.org> wrote:
> >
> > +1 for printOnTaskManager()
> >
> > On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de> wrote:
> >
> >> Thanks for your quick responses!
> >>
> >> I also think that renaming the old print method should do the trick. As a
> >> contribution to your brainstorming for a name, I propose logOnTaskManager()
> >> ;)
> >>
> >> Cheers,
> >> Sebastian
> >>
> >> -----Original Message-----
> >> From: Fabian Hueske [mailto:fhueske@gmail.com]
> >> Sent: Thursday, May 28, 2015 14:34
> >> To: dev@flink.apache.org
> >> Subject: Re: Changed the behavior of "DataSet.print()"
> >>
> >> As I said, the common print prefix might indicate eager execution.
> >>
> >> I know that writeToTaskManagerStdOut() is quite bulky, but we should make
> >> the difference in the behavior very clear, IMO.
> >>
> >> 2015-05-28 14:29 GMT+02:00 Stephan Ewen <se...@apache.org>:
> >>
> >>> Actually, there is a method "print(String prefix)" which still goes to
> >>> the sysout of where the job is executed.
> >>>
> >>> Let's give that one the name "printOnTaskManager()" and then we should
> >>> have it...
> >>>
> >>> On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <fh...@gmail.com> wrote:
> >>>
> >>>> I would avoid calling it printXYZ, since print()'s behavior changed
> >>>> to eager execution.
> >>>>
> >>>> 2015-05-28 14:10 GMT+02:00 Robert Metzger <rm...@apache.org>:
> >>>>
> >>>>> Okay, you are right, local is actually confusing.
> >>>>> I'm against introducing "worker" as a term in the API. It's still
> >>>>> called "TaskManager". Maybe "printOnTaskManager()"?
> >>>>>
> >>>>> On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <fh...@gmail.com> wrote:
> >>>>>
> >>>>>> +1 for both.
> >>>>>>
> >>>>>> printLocal() might not be the best name, because "local" is not
> >>>>>> well defined and could also be understood as the local machine
> >>>>>> of the user.
> >>>>>> How about naming the method completely differently (writeToWorkerStdOut()?)
> >>>>>> to make sure users are not confused about eager and lazy execution?
> >>>>>>
> >>>>>>
> >>>>>> 2015-05-28 13:44 GMT+02:00 Robert Metzger <rm...@apache.org>:
> >>>>>>
> >>>>>>> Hi Sebastian,
> >>>>>>>
> >>>>>>> thank you for the feedback. I agree that both variants have a
> >>>>>>> right to exist.
> >>>>>>>
> >>>>>>> I would vote for adding another method to the DataSet called "printLocal()"
> >>>>>>> that has the old behavior.
> >>>>>>>
> >>>>>>> On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de> wrote:
> >>>>>>>
> >>>>>>>> Hi everyone,
> >>>>>>>>
> >>>>>>>> I am a bit worried about the recent change to the print() method. I can
> >>>>>>>> understand the rationale that obtaining the stdout from all
> >>>>>>>> the TaskManagers is cumbersome (although, for local
> >>>>>>>> debugging, the old print() was fine).
> >>>>>>>> However, a major problem I see with the new print() is that now you can
> >>>>>>>> only have one print() per plan, as the plan is directly executed as soon as
> >>>>>>>> print() is invoked. If you regard print() as a debugging means, this is a
> >>>>>>>> severe restriction.
> >>>>>>>> I see use cases for both print() implementations, but I would at least
> >>>>>>>> provide some kind of backwards compatibility, be it a parameter or a
> >>>>>>>> legacyPrint() method or anything else. As I assume print() to be very
> >>>>>>>> frequently used, a lot of existing programs would benefit from this and
> >>>>>>>> might otherwise not be directly portable to newer Flink versions. What do
> >>>>>>>> you think?
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Sebastian
> >>>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Robert Metzger [mailto:rmetzger@apache.org]
> >>>>>>>> Sent: Tuesday, May 26, 2015 11:12
> >>>>>>>> To: dev@flink.apache.org
> >>>>>>>> Subject: Re: Changed the behavior of "DataSet.print()"
> >>>>>>>>
> >>>>>>>> I've filed a JIRA to update the documentation:
> >>>>>>>> https://issues.apache.org/jira/browse/FLINK-2092
> >>>>>>>>
> >>>>>>>> On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen <sewen@apache.org> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all!
> >>>>>>>>>
> >>>>>>>>> We merged a patch yesterday that changed the API behavior of the
> >>>>>>>>> "DataSet.print()" function.
> >>>>>>>>>
> >>>>>>>>> "print()" now prints to stdout on the client process, rather than the
> >>>>>>>>> TaskManager process, as before. This is much nicer for debugging and
> >>>>>>>>> exploring data sets.
> >>>>>>>>>
> >>>>>>>>> One implication of this is that print() is now an eager method (like
> >>>>>>>>> collect() or count()). That means that calling "print()" immediately
> >>>>>>>>> triggers the execution, and no "env.execute()" is required any more.
> >>>>>>>>>
> >>>>>>>>> Greetings,
> >>>>>>>>> Stephan

Re: Changed the behavior of "DataSet.print()"

Posted by Robert Metzger <rm...@apache.org>.
Resolved in https://issues.apache.org/jira/browse/FLINK-2070.

I'll update the documentation.
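
For anyone skimming the thread, the eager-versus-lazy distinction behind the naming vote can be sketched with a toy model. This is only an illustration under assumptions: ToyEnv and ToyDataSet are hypothetical stand-ins and not Flink classes, the lazy variant uses the majority-voted name printOnTaskManager(prefix), and the "prefix> element" output format is arbitrary. The toy print() also runs only its own data set, which is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: ToyEnv and ToyDataSet are hypothetical names, not Flink classes.
final class ToyEnv {
    final List<Runnable> deferredSinks = new ArrayList<>(); // lazy sinks waiting for execute()
    final List<String> clientStdOut = new ArrayList<>();    // stands in for the client's stdout
    final List<String> workerStdOut = new ArrayList<>();    // stands in for a TaskManager's stdout
    int executions = 0;                                      // number of "jobs" that were run

    ToyDataSet fromElements(String... elements) {
        return new ToyDataSet(this, Arrays.asList(elements));
    }

    // Lazy sinks run only here, all together, as a single job.
    void execute() {
        if (deferredSinks.isEmpty()) {
            throw new IllegalStateException("no lazy sinks registered");
        }
        executions++;
        for (Runnable sink : deferredSinks) {
            sink.run();
        }
        deferredSinks.clear();
    }
}

final class ToyDataSet {
    private final ToyEnv env;
    private final List<String> data;

    ToyDataSet(ToyEnv env, List<String> data) {
        this.env = env;
        this.data = data;
    }

    // Eager: each call runs a job right away and the output lands on the client.
    void print() {
        env.executions++;
        env.clientStdOut.addAll(data);
    }

    // Lazy: only registers a sink; nothing is written until env.execute().
    void printOnTaskManager(String prefix) {
        env.deferredSinks.add(() -> {
            for (String element : data) {
                env.workerStdOut.add(prefix + "> " + element);
            }
        });
    }
}

final class ToyDemo {
    public static void main(String[] args) {
        ToyEnv env = new ToyEnv();
        ToyDataSet ds = env.fromElements("a", "b");

        ds.print();
        ds.print(); // a second eager print means a second job
        System.out.println("jobs after two eager prints: " + env.executions);

        ds.printOnTaskManager("left");
        ds.printOnTaskManager("right");
        env.execute(); // both lazy sinks share one job
        System.out.println("jobs after one execute(): " + env.executions);
        System.out.println(env.workerStdOut);
    }
}
```

In this toy, two eager prints cost two jobs while two lazy sinks share one, which is the restriction Sebastian raised and the reason a name that does not suggest eager execution was debated.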

On Thu, Jun 4, 2015 at 12:22 AM, Stephan Ewen <se...@apache.org> wrote:

> I'll prepare a fix...
>
> On Wed, Jun 3, 2015 at 10:24 PM, Stephan Ewen <se...@apache.org> wrote:
>
> > +1 for printOnTaskManager(prefix)
> >
> > +1 for deprecating the print(prefix) method.
> >
> > On Tue, Jun 2, 2015 at 5:24 PM, Aljoscha Krettek <al...@apache.org> wrote:
> >
> >> By the way, we also should rename the corresponding Streaming API
> >> method accordingly.
> >>
> >> On Tue, Jun 2, 2015 at 3:24 PM, Maximilian Michels <mx...@apache.org> wrote:
> >> > +1 for printOnTaskManager(prefix)
> >> >
> >> > On Tue, Jun 2, 2015 at 1:54 PM, Kostas Tzoumas <kt...@apache.org> wrote:
> >> >
> >> >> +1 for printOnTaskManager(prefix)
> >> >>
> >> >> On Tue, Jun 2, 2015 at 1:35 PM, Till Rohrmann <tr...@apache.org> wrote:
> >> >>
> >> >> > +1 for printOnTaskManager(prefix)
> >> >> >
> >> >> > On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske <fh...@gmail.com> wrote:
> >> >> >
> >> >> > > +1 for writeToWorkerStdOut(prefix)
> >> >> > > On Jun 2, 2015 11:42, "Aljoscha Krettek" <al...@apache.org> wrote:
> >> >> > >
> >> >> > > > +1 for printOnTaskManager(prefix)

Re: Changed the behavior of "DataSet.print()"

Posted by Stephan Ewen <se...@apache.org>.
I'll prepare a fix...


Re: Changed the behavior of "DataSet.print()"

Posted by Stephan Ewen <se...@apache.org>.
+1 for printOnTaskManager(prefix)

+1 for deprecating the print(prefix) method.

On Tue, Jun 2, 2015 at 5:24 PM, Aljoscha Krettek <al...@apache.org>
wrote:

> By the way, we also should rename the corresponding Streaming API
> method accordingly.
>
> On Tue, Jun 2, 2015 at 3:24 PM, Maximilian Michels <mx...@apache.org> wrote:
> > +1 for printOnTaskManager(prefix)
> >
> > On Tue, Jun 2, 2015 at 1:54 PM, Kostas Tzoumas <kt...@apache.org>
> wrote:
> >
> >> +1 for printOnTaskManager(prefix)
> >>
> >> On Tue, Jun 2, 2015 at 1:35 PM, Till Rohrmann <tr...@apache.org>
> >> wrote:
> >>
> >> > +1 for printOnTaskManager(prefix)
> >> >
> >> > On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske <fh...@gmail.com>
> >> wrote:
> >> >
> >> > > +1 for writeToWorkerStdOut(prefix)
> >> > > On Jun 2, 2015 11:42, "Aljoscha Krettek" <al...@apache.org>
> wrote:
> >> > >
> >> > > > +1 for printOnTaskManager(prefix)
> >> > > >
> >> > > > On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <
> rmetzger@apache.org
> >> >
> >> > > > wrote:
> >> > > > > I would like to reach consensus on this before the 0.9 release.
> >> > > > >
> >> > > > > So far we have the following ideas:
> >> > > > >
> >> > > > > writeToWorkerStdOut(prefix)
> >> > > > > printOnTaskManager(prefix) (+1)
> >> > > > > logOnTaskManager(prefix)
> >> > > > >
> >> > > > > I'm against logOnTM because we are not logging the output, we
> are
> >> > > writing
> >> > > > > or printing it.
> >> > > > >
> >> > > > >
> >> > > > > *I would vote for deprecating "print(prefix)" and adding
> >> > > > > "writeToWorkerStdOut(prefix)"*
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <
> >> chiwanpark@icloud.com>
> >> > > > wrote:
> >> > > > >
> >> > > > >> I agree that avoiding name which starts with “print” is better.
> >> > > > >>
> >> > > > >> Regards,
> >> > > > >> Chiwan Park
> >> > > > >>
> >> > > > >> > On May 28, 2015, at 11:35 PM, Maximilian Michels <
> >> mxm@apache.org>
> >> > > > wrote:
> >> > > > >> >
> >> > > > >> > +1 for printOnTaskManager()
> >> > > > >> >
> >> > > > >> > On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <
> >> > > > >> Sebastian.Kruse@hpi.de>
> >> > > > >> > wrote:
> >> > > > >> >
> >> > > > >> >> Thanks, for your quick responses!
> >> > > > >> >>
> >> > > > >> >> I also think that renaming the old print method should do
> the
> >> > > trick.
> >> > > > As
> >> > > > >> a
> >> > > > >> >> contribution to your brainstorming for a name, I propose
> >> > > > >> logOnTaskManager()
> >> > > > >> >> ;)
> >> > > > >> >>
> >> > > > >> >> Cheers,
> >> > > > >> >> Sebastian
> >> > > > >> >>
> >> > > > >> >> -----Original Message-----
> >> > > > >> >> From: Fabian Hueske [mailto:fhueske@gmail.com]
> >> > > > >> >> Sent: Donnerstag, 28. Mai 2015 14:34
> >> > > > >> >> To: dev@flink.apache.org
> >> > > > >> >> Subject: Re: Changed the behavior of "DataSet.print()"
> >> > > > >> >>
> >> > > > >> >> As I said, the common print prefix might indicate eager
> >> > execution.
> >> > > > >> >>
> >> > > > >> >> I know that writeToTaskManagerStdOut() is quite bulky, but
> we
> >> > > should
> >> > > > >> make
> >> > > > >> >> the difference in the behavior very clear, IMO.
> >> > > > >> >>
> >> > > > >> >> 2015-05-28 14:29 GMT+02:00 Stephan Ewen <se...@apache.org>:
> >> > > > >> >>
> >> > > > >> >>> Actually, there is a method "print(String prefix)" which
> still
> >> > > goes
> >> > > > to
> >> > > > >> >>> the sysout of where the job is executed.
> >> > > > >> >>>
> >> > > > >> >>> Let's give that one the name "printOnTaskManager()" and
> then
> >> we
> >> > > > should
> >> > > > >> >>> have it...
> >> > > > >> >>>
> >> > > > >> >>> On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <
> >> > fhueske@gmail.com
> >> > > >
> >> > > > >> >> wrote:
> >> > > > >> >>>
> >> > > > >> >>>> I would avoid to call it printXYZ, since print()'s
> behavior
> >> > > changed
> >> > > > >> >>>> to eager execution.
> >> > > > >> >>>>
> >> > > > >> >>>> 2015-05-28 14:10 GMT+02:00 Robert Metzger <
> >> rmetzger@apache.org
> >> > >:
> >> > > > >> >>>>
> >> > > > >> >>>>> Okay, you are right, local is actually confusing.
> >> > > > >> >>>>> I'm against introducing "worker" as a term in the API.
> Its
> >> > still
> >> > > > >> >>>>> called "TaskManager". Maybe "printOnTaskManager()" ?
> >> > > > >> >>>>>
> >> > > > >> >>>>> On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <fhueske@gmail.com>
> >> > > > >> >>>>> wrote:
> >> > > > >> >>>>>
> >> > > > >> >>>>>> +1 for both.
> >> > > > >> >>>>>>
> >> > > > >> >>>>>> printLocal() might not be the best name, because "local" is not
> >> > > > >> >>>>>> well defined and could also be understood as the local machine
> >> > > > >> >>>>>> of the user.
> >> > > > >> >>>>>> How about naming the method completely differently (writeToWorkerStdOut()?)
> >> > > > >> >>>>>> to make sure users do not confuse eager and lazy execution?
> >> > > > >> >>>>>>
> >> > > > >> >>>>>>
> >> > > > >> >>>>>> 2015-05-28 13:44 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
> >> > > > >> >>>>>>
> >> > > > >> >>>>>>> Hi Sebastian,
> >> > > > >> >>>>>>>
> >> > > > >> >>>>>>> thank you for the feedback. I agree that both variants have a
> >> > > > >> >>>>>>> right to exist.
> >> > > > >> >>>>>>>
> >> > > > >> >>>>>>> I would vote for adding another method to the DataSet called "printLocal()"
> >> > > > >> >>>>>>> that has the old behavior.
> >> > > > >> >>>>>>>
> >> > > > >> >>>>>>> On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de>
> >> > > > >> >>>>>>> wrote:
> >> > > > >> >>>>>>>
> >> > > > >> >>>>>>>> Hi everyone,
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> I am a bit worried about the recent change to the print() method. I can
> >> > > > >> >>>>>>>> understand the rationale that obtaining the stdout from all the
> >> > > > >> >>>>>>>> TaskManagers is cumbersome (although, for local debugging, the old
> >> > > > >> >>>>>>>> print() was fine).
> >> > > > >> >>>>>>>> However, a major problem I see with the new print() is that you can
> >> > > > >> >>>>>>>> now have only one print() per plan, as the plan is directly executed
> >> > > > >> >>>>>>>> as soon as print() is invoked. If you regard print() as a debugging
> >> > > > >> >>>>>>>> tool, this is a severe restriction.
> >> > > > >> >>>>>>>> I see use cases for both print() implementations, but I would at least
> >> > > > >> >>>>>>>> provide some kind of backwards compatibility, be it a parameter, a
> >> > > > >> >>>>>>>> legacyPrint() method, or anything else. As I assume print() to be very
> >> > > > >> >>>>>>>> frequently used, a lot of existing programs would benefit from this and
> >> > > > >> >>>>>>>> might otherwise not be directly portable to newer Flink versions. What
> >> > > > >> >>>>>>>> do you think?
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> Cheers,
> >> > > > >> >>>>>>>> Sebastian
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> -----Original Message-----
> >> > > > >> >>>>>>>> From: Robert Metzger [mailto:rmetzger@apache.org]
> >> > > > >> >>>>>>>> Sent: Tuesday, 26 May 2015 11:12
> >> > > > >> >>>>>>>> To: dev@flink.apache.org
> >> > > > >> >>>>>>>> Subject: Re: Changed the behavior of "DataSet.print()"
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> I've filed a JIRA to update the documentation:
> >> > > > >> >>>>>>>> https://issues.apache.org/jira/browse/FLINK-2092
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen <sewen@apache.org>
> >> > > > >> >>>>>>>> wrote:
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>>> Hi all!
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>> We merged a patch yesterday that changed the API behavior of the
> >> > > > >> >>>>>>>>> "DataSet.print()" function.
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>> "print()" now prints to stdout on the client process, rather than the
> >> > > > >> >>>>>>>>> TaskManager process, as before. This is much nicer for debugging and
> >> > > > >> >>>>>>>>> exploring data sets.
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>> One implication of this is that print() is now an eager method (like
> >> > > > >> >>>>>>>>> collect() or count()). That means that calling "print()" immediately
> >> > > > >> >>>>>>>>> triggers the execution, and no "env.execute()" is required any more.
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>> Greetings,
> >> > > > >> >>>>>>>>> Stephan
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>
> >> > > > >> >>>>>>
> >> > > > >> >>>>>
> >> > > > >> >>>>
> >> > > > >> >>>
> >> > > > >> >>
> >> > > > >>
> >> > > > >>
> >> > > > >>
> >> > > > >>
> >> > > > >>
> >> > > >
> >> > >
> >> >
> >>
>

Re: Changed the behavior of "DataSet.print()"

Posted by Aljoscha Krettek <al...@apache.org>.
By the way, we also should rename the corresponding Streaming API
method accordingly.

On Tue, Jun 2, 2015 at 3:24 PM, Maximilian Michels <mx...@apache.org> wrote:
> +1 for printOnTaskManager(prefix)

Re: Changed the behavior of "DataSet.print()"

Posted by Maximilian Michels <mx...@apache.org>.
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 1:54 PM, Kostas Tzoumas <kt...@apache.org> wrote:

> +1 for printOnTaskManager(prefix)

Re: Changed the behavior of "DataSet.print()"

Posted by Kostas Tzoumas <kt...@apache.org>.
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 1:35 PM, Till Rohrmann <tr...@apache.org> wrote:

> +1 for printOnTaskManager(prefix)

Re: Changed the behavior of "DataSet.print()"

Posted by Till Rohrmann <tr...@apache.org>.
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske <fh...@gmail.com> wrote:

> +1 for writeToWorkerStdOut(prefix)
> On Jun 2, 2015 11:42, "Aljoscha Krettek" <al...@apache.org> wrote:
>
> > +1 for printOnTaskManager(prefix)
> >
> > On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <rm...@apache.org>
> > wrote:
> > > I would like to reach consensus on this before the 0.9 release.
> > >
> > > So far we have the following ideas:
> > >
> > > writeToWorkerStdOut(prefix)
> > > printOnTaskManager(prefix) (+1)
> > > logOnTaskManager(prefix)
> > >
> > > I'm against logOnTM because we are not logging the output, we are writing
> > > or printing it.
> > >
> > >
> > > *I would vote for deprecating "print(prefix)" and adding
> > > "writeToWorkerStdOut(prefix)"*

Re: Changed the behavior of "DataSet.print()"

Posted by Fabian Hueske <fh...@gmail.com>.
+1 for writeToWorkerStdOut(prefix)
On Jun 2, 2015 11:42, "Aljoscha Krettek" <al...@apache.org> wrote:

> +1 for printOnTaskManager(prefix)
>
> On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <rm...@apache.org> wrote:
> > I would like to reach consensus on this before the 0.9 release.
> >
> > So far we have the following ideas:
> >
> > writeToWorkerStdOut(prefix)
> > printOnTaskManager(prefix) (+1)
> > logOnTaskManager(prefix)
> >
> > I'm against logOnTM because we are not logging the output, we are writing
> > or printing it.
> >
> >
> > *I would vote for deprecating "print(prefix)" and adding
> > "writeToWorkerStdOut(prefix)"*

Re: Changed the behavior of "DataSet.print()"

Posted by Aljoscha Krettek <al...@apache.org>.
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <rm...@apache.org> wrote:
> I would like to reach consensus on this before the 0.9 release.
>
> So far we have the following ideas:
>
> writeToWorkerStdOut(prefix)
> printOnTaskManager(prefix) (+1)
> logOnTaskManager(prefix)
>
> I'm against logOnTM because we are not logging the output, we are writing
> or printing it.
>
>
> *I would vote for deprecating "print(prefix)" and adding
> "writeToWorkerStdOut(prefix)"*