Posted to dev@flink.apache.org by Stephan Ewen <se...@apache.org> on 2015/05/22 11:08:58 UTC

Changed the behavior of "DataSet.print()"

Hi all!

We merged a patch yesterday that changed the API behavior of the
"DataSet.print()" function.

"print()" now prints to stdout on the client process, rather than on the
TaskManager process as before. This is much nicer for debugging and
exploring data sets.

One implication of this is that print() is now an eager method (like
collect() or count()). That means that calling "print()" immediately
triggers the execution, and no "env.execute()" is required any more.

Greetings,
Stephan
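A minimal sketch of the eager semantics described above, written as a self-contained toy model in plain Java rather than against Flink itself. ToyEnvironment, ToyDataSet and their methods are hypothetical stand-ins, not Flink's API. Transformations such as map() only extend a deferred plan, while print() behaves like collect(): it runs the plan as a job, ships the result to the client and prints it there, so no separate execute() call is involved. The execution counter also makes visible the concern raised later in the thread: each eager print() call runs the plan again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical toy environment (NOT Flink's ExecutionEnvironment).
class ToyEnvironment {
    int executions = 0;                               // number of "jobs" run so far
    final List<String> clientOut = new ArrayList<>(); // lines printed by the client process
}

// Hypothetical toy data set: a deferred plan that only runs when an eager method is called.
class ToyDataSet<T> {
    private final ToyEnvironment env;
    private final Supplier<List<T>> plan;

    ToyDataSet(ToyEnvironment env, Supplier<List<T>> plan) {
        this.env = env;
        this.plan = plan;
    }

    static <T> ToyDataSet<T> fromElements(ToyEnvironment env, List<T> elements) {
        return new ToyDataSet<T>(env, () -> new ArrayList<T>(elements));
    }

    // Lazy transformation: extends the plan, runs nothing.
    <R> ToyDataSet<R> map(Function<T, R> f) {
        return new ToyDataSet<R>(env, () -> {
            List<R> out = new ArrayList<>();
            for (T t : plan.get()) {
                out.add(f.apply(t));
            }
            return out;
        });
    }

    // Eager: runs the whole plan as one job and returns the result to the client.
    List<T> collect() {
        env.executions++;
        return plan.get();
    }

    // Eager print: collect() first, then print on the client; no execute() call needed.
    void print() {
        for (T t : collect()) {
            String line = String.valueOf(t);
            env.clientOut.add(line);
            System.out.println(line);
        }
    }
}

public class EagerPrintDemo {
    public static void main(String[] args) {
        ToyEnvironment env = new ToyEnvironment();
        ToyDataSet<Integer> squares =
                ToyDataSet.fromElements(env, List.of(1, 2, 3)).map(x -> x * x);

        System.out.println("executions before print: " + env.executions); // 0
        squares.print(); // job #1: prints 1, 4, 9 on the client
        squares.print(); // job #2: the plan runs again for the second print()
        System.out.println("executions after two prints: " + env.executions); // 2
    }
}
```

In a real Flink program this roughly corresponds to calling print() on a DataSet built from ExecutionEnvironment.getExecutionEnvironment() and omitting the trailing env.execute(); the toy model only mirrors the lazy-plan/eager-action split, not Flink's actual job submission.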

Re: Changed the behavior of "DataSet.print()"

Posted by Robert Metzger <rm...@apache.org>.
Resolved in https://issues.apache.org/jira/browse/FLINK-2070.

I'll update the documentation.

On Thu, Jun 4, 2015 at 12:22 AM, Stephan Ewen <se...@apache.org> wrote:

> I'll prepare a fix...
>
> On Wed, Jun 3, 2015 at 10:24 PM, Stephan Ewen <se...@apache.org> wrote:
>
> > +1 for printOnTaskManager(prefix)
> >
> > +1 for deprecating the print(prefix) method.
> >
> > On Tue, Jun 2, 2015 at 5:24 PM, Aljoscha Krettek <al...@apache.org>
> > wrote:
> >
> >> By the way, we also should rename the corresponding Streaming API
> >> method accordingly.
> >>
> >> On Tue, Jun 2, 2015 at 3:24 PM, Maximilian Michels <mx...@apache.org>
> >> wrote:
> >> > +1 for printOnTaskManager(prefix)
> >> >
> >> > On Tue, Jun 2, 2015 at 1:54 PM, Kostas Tzoumas <kt...@apache.org> wrote:
> >> >
> >> >> +1 for printOnTaskManager(prefix)
> >> >>
> >> >> On Tue, Jun 2, 2015 at 1:35 PM, Till Rohrmann <tr...@apache.org> wrote:
> >> >>
> >> >> > +1 for printOnTaskManager(prefix)
> >> >> >
> >> >> > On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske <fh...@gmail.com> wrote:
> >> >> >
> >> >> > > +1 for writeToWorkerStdOut(prefix)
> >> >> > >
> >> >> > > On Jun 2, 2015 11:42, "Aljoscha Krettek" <al...@apache.org> wrote:
> >> >> > >
> >> >> > > > +1 for printOnTaskManager(prefix)
> >> >> > > >
> >> >> > > > On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <rmetzger@apache.org> wrote:
> >> >> > > > > I would like to reach consensus on this before the 0.9 release.
> >> >> > > > >
> >> >> > > > > So far we have the following ideas:
> >> >> > > > >
> >> >> > > > > writeToWorkerStdOut(prefix)
> >> >> > > > > printOnTaskManager(prefix) (+1)
> >> >> > > > > logOnTaskManager(prefix)
> >> >> > > > >
> >> >> > > > > I'm against logOnTM because we are not logging the output; we are
> >> >> > > > > writing or printing it.
> >> >> > > > >
> >> >> > > > > *I would vote for deprecating "print(prefix)" and adding
> >> >> > > > > "writeToWorkerStdOut(prefix)"*
> >> >> > > > >
> >> >> > > > > On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <chiwanpark@icloud.com> wrote:
> >> >> > > > >
> >> >> > > > >> I agree that avoiding a name that starts with “print” is better.
> >> >> > > > >>
> >> >> > > > >> Regards,
> >> >> > > > >> Chiwan Park
> >> >> > > > >>
> >> >> > > > >> > On May 28, 2015, at 11:35 PM, Maximilian Michels <mxm@apache.org> wrote:
> >> >> > > > >> >
> >> >> > > > >> > +1 for printOnTaskManager()
> >> >> > > > >> >
> >> >> > > > >> > On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de> wrote:
> >> >> > > > >> >
> >> >> > > > >> >> Thanks for your quick responses!
> >> >> > > > >> >>
> >> >> > > > >> >> I also think that renaming the old print method should do the trick. As a
> >> >> > > > >> >> contribution to your brainstorming for a name, I propose logOnTaskManager()
> >> >> > > > >> >> ;)
> >> >> > > > >> >>
> >> >> > > > >> >> Cheers,
> >> >> > > > >> >> Sebastian
> >> >> > > > >> >>
> >> >> > > > >> >> -----Original Message-----
> >> >> > > > >> >> From: Fabian Hueske [mailto:fhueske@gmail.com]
> >> >> > > > >> >> Sent: Thursday, May 28, 2015 14:34
> >> >> > > > >> >> To: dev@flink.apache.org
> >> >> > > > >> >> Subject: Re: Changed the behavior of "DataSet.print()"
> >> >> > > > >> >>
> >> >> > > > >> >> As I said, the common print prefix might indicate eager execution.
> >> >> > > > >> >>
> >> >> > > > >> >> I know that writeToTaskManagerStdOut() is quite bulky, but we should make
> >> >> > > > >> >> the difference in the behavior very clear, IMO.
> >> >> > > > >> >>
> >> >> > > > >> >> 2015-05-28 14:29 GMT+02:00 Stephan Ewen <sewen@apache.org>:
> >> >> > > > >> >>
> >> >> > > > >> >>> Actually, there is a method "print(String prefix)" which still goes to
> >> >> > > > >> >>> the sysout of where the job is executed.
> >> >> > > > >> >>>
> >> >> > > > >> >>> Let's give that one the name "printOnTaskManager()" and then we should
> >> >> > > > >> >>> have it...
> >> >> > > > >> >>>
> >> >> > > > >> >>> On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> >> >> > > > >> >>>
> >> >> > > > >> >>>> I would avoid calling it printXYZ, since print()'s behavior changed
> >> >> > > > >> >>>> to eager execution.
> >> >> > > > >> >>>>
> >> >> > > > >> >>>> 2015-05-28 14:10 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
> >> >> > > > >> >>>>
> >> >> > > > >> >>>>> Okay, you are right, local is actually confusing.
> >> >> > > > >> >>>>> I'm against introducing "worker" as a term in the API. It's still
> >> >> > > > >> >>>>> called "TaskManager". Maybe "printOnTaskManager()"?
> >> >> > > > >> >>>>>
> >> >> > > > >> >>>>> On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> >> >> > > > >> >>>>>
> >> >> > > > >> >>>>>> +1 for both.
> >> >> > > > >> >>>>>>
> >> >> > > > >> >>>>>> printLocal() might not be the best name, because "local" is not
> >> >> > > > >> >>>>>> well defined and could also be understood as the local machine
> >> >> > > > >> >>>>>> of the user.
> >> >> > > > >> >>>>>> How about naming the method completely differently (writeToWorkerStdOut()?)
> >> >> > > > >> >>>>>> to make sure users are not confused by eager and lazy execution?
> >> >> > > > >> >>>>>>
> >> >> > > > >> >>>>>> 2015-05-28 13:44 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
> >> >> > > > >> >>>>>>
> >> >> > > > >> >>>>>>> Hi Sebastian,
> >> >> > > > >> >>>>>>>
> >> >> > > > >> >>>>>>> thank you for the feedback. I agree that both variants have a right
> >> >> > > > >> >>>>>>> to exist.
> >> >> > > > >> >>>>>>>
> >> >> > > > >> >>>>>>> I would vote for adding another method to the DataSet called
> >> >> > > > >> >>>>>>> "printLocal()" that has the old behavior.
> >> >> > > > >> >>>>>>>
> >> >> > > > >> >>>>>>> On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de> wrote:
> >> >> > > > >> >>>>>>>
> >> >> > > > >> >>>>>>>> Hi everyone,
> >> >> > > > >> >>>>>>>>
> >> >> > > > >> >>>>>>>> I am a bit worried about that recent change of the print() method. I can
> >> >> > > > >> >>>>>>>> understand the rationale that obtaining the stdout from all the
> >> >> > > > >> >>>>>>>> TaskManagers is cumbersome (although, for local debugging, the old
> >> >> > > > >> >>>>>>>> print() was fine).
> >> >> > > > >> >>>>>>>> However, a major problem I see with the new print() is that now you can
> >> >> > > > >> >>>>>>>> only have one print() per plan, as the plan is directly executed as soon
> >> >> > > > >> >>>>>>>> as print() is invoked. If you regard print() as a debugging means, this
> >> >> > > > >> >>>>>>>> is a severe restriction.
> >> >> > > > >> >>>>>>>> I see use cases for both print() implementations, but I would at least
> >> >> > > > >> >>>>>>>> provide some kind of backwards compatibility, be it a parameter or a
> >> >> > > > >> >>>>>>>> legacyPrint() method or anything else. As I assume print() to be very
> >> >> > > > >> >>>>>>>> frequently used, a lot of existing programs would benefit from this and
> >> >> > > > >> >>>>>>>> might otherwise not be directly portable to newer Flink versions. What do
> >> >> > > > >> >>>>>>>> you think?
> >> >> > > > >> >>>>>>>>
> >> >> > > > >> >>>>>>>> Cheers,
> >> >> > > > >> >>>>>>>> Sebastian
> >> >> > > > >> >>>>>>>>
> >> >> > > > >> >>>>>>>> -----Original Message-----
> >> >> > > > >> >>>>>>>> From: Robert Metzger [mailto:rmetzger@apache.org]
> >> >> > > > >> >>>>>>>> Sent: Tuesday, May 26, 2015 11:12
> >> >> > > > >> >>>>>>>> To: dev@flink.apache.org
> >> >> > > > >> >>>>>>>> Subject: Re: Changed the behavior of "DataSet.print()"
> >> >> > > > >> >>>>>>>>
> >> >> > > > >> >>>>>>>> I've filed a JIRA to update the documentation:
> >> >> > > > >> >>>>>>>> https://issues.apache.org/jira/browse/FLINK-2092
> >> >> > > > >> >>>>>>>>

Re: Changed the behavior of "DataSet.print()"

Posted by Stephan Ewen <se...@apache.org>.
I'll prepare a fix...

Re: Changed the behavior of "DataSet.print()"

Posted by Stephan Ewen <se...@apache.org>.
+1 for printOnTaskManager(prefix)

+1 for deprecating the print(prefix) method.
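The outcome the thread converges on (a lazy sink that writes on the TaskManagers under an explicit new name, with print(prefix) deprecated) can be sketched with a second toy model. ToySinkEnvironment, ToyLazyDataSet and the "prefix> value" line format are hypothetical illustrations, not Flink's implementation. In this sketch printOnTaskManager(prefix) only registers a sink, the deprecated print(prefix) delegates to it, and nothing is written until execute() runs all registered sinks together as one job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy environment (NOT Flink's): lazy sinks are queued until execute().
class ToySinkEnvironment {
    final List<Runnable> pendingSinks = new ArrayList<>();
    final List<String> taskManagerOut = new ArrayList<>(); // stands in for a worker's stdout
    int executions = 0;

    // Runs every registered sink as part of ONE job, then clears the queue.
    void execute() {
        executions++;
        for (Runnable sink : pendingSinks) {
            sink.run();
        }
        pendingSinks.clear();
    }
}

// Hypothetical toy data set with a lazy, TaskManager-side print under an explicit name.
class ToyLazyDataSet<T> {
    private final ToySinkEnvironment env;
    private final List<T> data;

    ToyLazyDataSet(ToySinkEnvironment env, List<T> data) {
        this.env = env;
        this.data = data;
    }

    // Lazy sink: only registers work; output appears "on the worker" once execute() runs.
    // The "prefix> value" line format is an assumption made for this sketch.
    void printOnTaskManager(String prefix) {
        env.pendingSinks.add(() -> {
            for (T t : data) {
                env.taskManagerOut.add(prefix + "> " + t);
            }
        });
    }

    // Old name kept for backwards compatibility; it simply delegates to the new name.
    @Deprecated
    void print(String prefix) {
        printOnTaskManager(prefix);
    }
}

public class LazySinkDemo {
    public static void main(String[] args) {
        ToySinkEnvironment env = new ToySinkEnvironment();
        ToyLazyDataSet<String> words = new ToyLazyDataSet<>(env, List.of("a", "b"));

        words.printOnTaskManager("first");
        words.print("second"); // deprecated alias, same lazy behavior

        System.out.println(env.taskManagerOut.size()); // 0: nothing has run yet
        env.execute();                                  // both sinks run in a single job
        System.out.println(env.taskManagerOut); // [first> a, first> b, second> a, second> b]
        System.out.println(env.executions);     // 1
    }
}
```

Because lazy sinks are batched into a single execute(), several of them can share one job, which is the debugging workflow Sebastian wanted to keep available alongside the new eager print().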

On Tue, Jun 2, 2015 at 5:24 PM, Aljoscha Krettek <al...@apache.org>
wrote:

> By the way, we also should rename the corresponding Streaming API
> method accordingly.
>
> On Tue, Jun 2, 2015 at 3:24 PM, Maximilian Michels <mx...@apache.org> wrote:
> > +1 for printOnTaskManager(prefix)
> >
> > On Tue, Jun 2, 2015 at 1:54 PM, Kostas Tzoumas <kt...@apache.org>
> wrote:
> >
> >> +1 for printOnTaskManager(prefix)
> >>
> >> On Tue, Jun 2, 2015 at 1:35 PM, Till Rohrmann <tr...@apache.org>
> >> wrote:
> >>
> >> > +1 for printOnTaskManager(prefix)
> >> >
> >> > On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske <fh...@gmail.com>
> >> wrote:
> >> >
> >> > > +1 for writeToWorkerStdOut(prefix)
> >> > > On Jun 2, 2015 11:42, "Aljoscha Krettek" <al...@apache.org>
> wrote:
> >> > >
> >> > > > +1 for printOnTaskManager(prefix)
> >> > > >
> >> > > > On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <
> rmetzger@apache.org
> >> >
> >> > > > wrote:
> >> > > > > I would like to reach consensus on this before the 0.9 release.
> >> > > > >
> >> > > > > So far we have the following ideas:
> >> > > > >
> >> > > > > writeToWorkerStdOut(prefix)
> >> > > > > printOnTaskManager(prefix) (+1)
> >> > > > > logOnTaskManager(prefix)
> >> > > > >
> >> > > > > I'm against logOnTM because we are not logging the output, we
> are
> >> > > writing
> >> > > > > or printing it.
> >> > > > >
> >> > > > >
> >> > > > > *I would vote for deprecating "print(prefix)" and adding
> >> > > > > "writeToWorkerStdOut(prefix)"*
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <
> >> chiwanpark@icloud.com>
> >> > > > wrote:
> >> > > > >
> >> > > > >> I agree that avoiding name which starts with “print” is better.
> >> > > > >>
> >> > > > >> Regards,
> >> > > > >> Chiwan Park
> >> > > > >>
> >> > > > >> > On May 28, 2015, at 11:35 PM, Maximilian Michels <
> >> mxm@apache.org>
> >> > > > wrote:
> >> > > > >> >
> >> > > > >> > +1 for printOnTaskManager()
> >> > > > >> >
> >> > > > >> > On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <
> >> > > > >> Sebastian.Kruse@hpi.de>
> >> > > > >> > wrote:
> >> > > > >> >
> >> > > > >> >> Thanks, for your quick responses!
> >> > > > >> >>
> >> > > > >> >> I also think that renaming the old print method should do
> the
> >> > > trick.
> >> > > > As
> >> > > > >> a
> >> > > > >> >> contribution to your brainstorming for a name, I propose
> >> > > > >> logOnTaskManager()
> >> > > > >> >> ;)
> >> > > > >> >>
> >> > > > >> >> Cheers,
> >> > > > >> >> Sebastian
> >> > > > >> >>
> >> > > > >> >> -----Original Message-----
> >> > > > >> >> From: Fabian Hueske [mailto:fhueske@gmail.com]
> >> > > > >> >> Sent: Donnerstag, 28. Mai 2015 14:34
> >> > > > >> >> To: dev@flink.apache.org
> >> > > > >> >> Subject: Re: Changed the behavior of "DataSet.print()"
> >> > > > >> >>
> >> > > > >> >> As I said, the common print prefix might indicate eager
> >> > execution.
> >> > > > >> >>
> >> > > > >> >> I know that writeToTaskManagerStdOut() is quite bulky, but
> we
> >> > > should
> >> > > > >> make
> >> > > > >> >> the difference in the behavior very clear, IMO.
> >> > > > >> >>
> >> > > > >> >> 2015-05-28 14:29 GMT+02:00 Stephan Ewen <se...@apache.org>:
> >> > > > >> >>
> >> > > > >> >>> Actually, there is a method "print(String prefix)" which
> still
> >> > > goes
> >> > > > to
> >> > > > >> >>> the sysout of where the job is executed.
> >> > > > >> >>>
> >> > > > >> >>> Let's give that one the name "printOnTaskManager()" and
> then
> >> we
> >> > > > should
> >> > > > >> >>> have it...
> >> > > > >> >>>
> >> > > > >> >>> On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <
> >> > fhueske@gmail.com
> >> > > >
> >> > > > >> >> wrote:
> >> > > > >> >>>
> >> > > > >> >>>> I would avoid to call it printXYZ, since print()'s
> behavior
> >> > > changed
> >> > > > >> >>>> to eager execution.
> >> > > > >> >>>>
> >> > > > >> >>>> 2015-05-28 14:10 GMT+02:00 Robert Metzger <
> >> rmetzger@apache.org
> >> > >:
> >> > > > >> >>>>
> >> > > > >> >>>>> Okay, you are right, local is actually confusing.
> >> > > > >> >>>>> I'm against introducing "worker" as a term in the API.
> Its
> >> > still
> >> > > > >> >>>>> called "TaskManager". Maybe "printOnTaskManager()" ?
> >> > > > >> >>>>>
> >> > > > >> >>>>> On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <
> >> > > fhueske@gmail.com
> >> > > > >
> >> > > > >> >>>> wrote:
> >> > > > >> >>>>>
> >> > > > >> >>>>>> +1 for both.
> >> > > > >> >>>>>>
> >> > > > >> >>>>>> printLocal() might not be the best name, because
> "local" is
> >> > not
> >> > > > >> >>>>>> well defined and could also be understood as the local
> >> > machine
> >> > > > >> >>>>>> of the
> >> > > > >> >>> user.
> >> > > > >> >>>>>> How about naming the method completely different
> >> > > > >> >>>> (writeToWorkerStdOut()?)
> >> > > > >> >>>>>> to make sure users are not confused with eager and lazy
> >> > > > execution?
> >> > > > >> >>>>>>
> >> > > > >> >>>>>>
> >> > > > >> >>>>>> 2015-05-28 13:44 GMT+02:00 Robert Metzger <
> >> > rmetzger@apache.org
> >> > > >:
> >> > > > >> >>>>>>
> >> > > > >> >>>>>>> Hi Sebastian,
> >> > > > >> >>>>>>>
> >> > > > >> >>>>>>> thank you for the feedback. I agree that both variants
> >> have
> >> > a
> >> > > > >> >>>>>>> right
> >> > > > >> >>>> to
> >> > > > >> >>>>>>> exist.
> >> > > > >> >>>>>>>
> >> > > > >> >>>>>>> I would vote for adding another method to the DataSet
> >> called
> >> > > > >> >>>>>> "printLocal()"
> >> > > > >> >>>>>>> that has the old behavior.
> >> > > > >> >>>>>>>
> >> > > > >> >>>>>>> On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian <
> >> > > > >> >>>>>> Sebastian.Kruse@hpi.de>
> >> > > > >> >>>>>>> wrote:
> >> > > > >> >>>>>>>
> >> > > > >> >>>>>>>> Hi everyone,
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> I am a bit worried about that recent change of the
> >> print()
> >> > > > >> >>> method.
> >> > > > >> >>>> I
> >> > > > >> >>>>>> can
> >> > > > >> >>>>>>>> understand the rationale that obtaining the stdout
> from
> >> all
> >> > > > >> >>>>>>>> the taskmanagers is cumbersome (although, for local
> >> > > > >> >>>>>>>> debugging the old
> >> > > > >> >>>>>> print()
> >> > > > >> >>>>>>>> was fine).
> >> > > > >> >>>>>>>> However, a major problem, I see with the new print(),
> is,
> >> > > > >> >>>>>>>> that
> >> > > > >> >>> now
> >> > > > >> >>>>> you
> >> > > > >> >>>>>>> can
> >> > > > >> >>>>>>>> only have one print() per plan, as the plan is
> directly
> >> > > > >> >>>>>>>> executed
> >> > > > >> >>> as
> >> > > > >> >>>>>> soon
> >> > > > >> >>>>>>> as
> >> > > > >> >>>>>>>> print() is invoked. If you regard print() as a
> debugging
> >> > > > >> >>>>>>>> means,
> >> > > > >> >>>> this
> >> > > > >> >>>>>> is a
> >> > > > >> >>>>>>>> severe restriction.
> >> > > > >> >>>>>>>> I see use cases for both print() implementations, but
> I
> >> > > > >> >>>>>>>> would at
> >> > > > >> >>>>> least
> >> > > > >> >>>>>>>> provide some kind of backwards compatibility, be at a
> >> > > > >> >>>>>>>> parameter
> >> > > > >> >>> or
> >> > > > >> >>>> a
> >> > > > >> >>>>>>>> legacyPrint() method or anything else. As I assume
> >> print()
> >> > > > >> >>>>>>>> to be
> >> > > > >> >>>> very
> >> > > > >> >>>>>>>> frequently used, a lot of existing programs would
> benefit
> >> > > > >> >>>>>>>> from
> >> > > > >> >>> this
> >> > > > >> >>>>> and
> >> > > > >> >>>>>>>> might otherwise not be directly portable to newer
> Flink
> >> > > > >> >> versions.
> >> > > > >> >>>>> What
> >> > > > >> >>>>>> do
> >> > > > >> >>>>>>>> you think?
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> Cheers,
> >> > > > >> >>>>>>>> Sebastian
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> -----Original Message-----
> >> > > > >> >>>>>>>> From: Robert Metzger [mailto:rmetzger@apache.org]
> >> > > > >> >>>>>>>> Sent: Tuesday, May 26, 2015 11:12
> >> > > > >> >>>>>>>> To: dev@flink.apache.org
> >> > > > >> >>>>>>>> Subject: Re: Changed the behavior of "DataSet.print()"
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> I've filed a JIRA to update the documentation:
> >> > > > >> >>>>>>>> https://issues.apache.org/jira/browse/FLINK-2092
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen <sewen@apache.org> wrote:
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>>> Hi all!
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>> We merged a patch yesterday that changed the API behavior of the
> >> > > > >> >>>>>>>>> "DataSet.print()" function.
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>> "print()" now prints to stdout on the client process, rather than the
> >> > > > >> >>>>>>>>> TaskManager process, as before. This is much nicer for debugging and
> >> > > > >> >>>>>>>>> exploring data sets.
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>> One implication of this is that print() is now an eager method (like
> >> > > > >> >>>>>>>>> collect() or count()). That means that calling "print()" immediately
> >> > > > >> >>>>>>>>> triggers the execution, and no "env.execute()" is required any more.
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>> Greetings,
> >> > > > >> >>>>>>>>> Stephan
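
To make the eager behavior described above concrete, here is a toy model in plain Java. It is not Flink code and does not use the Flink API: `EagerPrintDemo`, `ToyDataSet`, its `map`/`collect`/`print` methods and the `executions` counter are all hypothetical. The sketch assumes only what the announcement states, namely that an eager `print()` (like `collect()`) runs the pipeline as a job right away and brings the results back to the client, which then prints them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model of an eager print(). Hypothetical; NOT the Flink API.
public class EagerPrintDemo {

    // Number of jobs "submitted to the cluster" so far.
    static int executions = 0;

    // A data set is only a recipe (plan); nothing is computed when it is defined.
    static final class ToyDataSet<T> {
        private final Supplier<List<T>> plan;

        ToyDataSet(Supplier<List<T>> plan) {
            this.plan = plan;
        }

        // Lazy transformation: extends the plan without running anything.
        <R> ToyDataSet<R> map(Function<T, R> f) {
            return new ToyDataSet<R>(() -> {
                List<R> out = new ArrayList<>();
                for (T t : plan.get()) {
                    out.add(f.apply(t));
                }
                return out;
            });
        }

        // Eager, like collect(): runs the whole plan as one job and
        // returns the result to the client.
        List<T> collect() {
            executions++;
            return plan.get();
        }

        // Eager print(): collect() first, then print on the client's stdout.
        void print() {
            for (T t : collect()) {
                System.out.println(t);
            }
        }
    }

    public static void main(String[] args) {
        ToyDataSet<Integer> numbers = new ToyDataSet<Integer>(() -> Arrays.asList(1, 2, 3));
        ToyDataSet<Integer> squares = numbers.map(x -> x * x);

        squares.print();                 // runs a job immediately and prints 1, 4, 9; no execute() call
        squares.map(x -> x + 1).print(); // a second job re-runs the plan and prints 2, 5, 10
        System.out.println("jobs run: " + executions); // jobs run: 2
    }
}
```

In this model every eager call submits its own job, so the second `print()` re-runs the whole upstream pipeline. That is the trade-off behind the concern, raised in this thread, about using several `print()` calls to debug a single plan.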

Re: Changed the behavior of "DataSet.print()"

Posted by Aljoscha Krettek <al...@apache.org>.
By the way, we also should rename the corresponding Streaming API
method accordingly.

On Tue, Jun 2, 2015 at 3:24 PM, Maximilian Michels <mx...@apache.org> wrote:
> +1 for printOnTaskManager(prefix)
>
> On Tue, Jun 2, 2015 at 1:54 PM, Kostas Tzoumas <kt...@apache.org> wrote:
>
>> +1 for printOnTaskManager(prefix)
>>
>> On Tue, Jun 2, 2015 at 1:35 PM, Till Rohrmann <tr...@apache.org> wrote:
>>
>> > +1 for printOnTaskManager(prefix)
>> >
>> > On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske <fh...@gmail.com> wrote:
>> >
>> > > +1 for writeToWorkerStdOut(prefix)
>> > > On Jun 2, 2015 11:42, "Aljoscha Krettek" <al...@apache.org> wrote:
>> > >
>> > > > +1 for printOnTaskManager(prefix)
>> > > >
>> > > > On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <rmetzger@apache.org> wrote:
>> > > > > I would like to reach consensus on this before the 0.9 release.
>> > > > >
>> > > > > So far we have the following ideas:
>> > > > >
>> > > > > writeToWorkerStdOut(prefix)
>> > > > > printOnTaskManager(prefix) (+1)
>> > > > > logOnTaskManager(prefix)
>> > > > >
>> > > > > I'm against logOnTM because we are not logging the output; we are
>> > > > > writing or printing it.
>> > > > >
>> > > > > *I would vote for deprecating "print(prefix)" and adding
>> > > > > "writeToWorkerStdOut(prefix)"*
>> > > > >
>> > > > > On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <chiwanpark@icloud.com> wrote:
>> > > > >
>> > > > >> I agree that avoiding a name that starts with “print” is better.
>> > > > >>
>> > > > >> Regards,
>> > > > >> Chiwan Park
>> > > > >>
>> > > > >> > On May 28, 2015, at 11:35 PM, Maximilian Michels <mxm@apache.org> wrote:
>> > > > >> >
>> > > > >> > +1 for printOnTaskManager()
>> > > > >> >
>> > > > >> > On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de> wrote:
>> > > > >> >
>> > > > >> >> Thanks for your quick responses!
>> > > > >> >>
>> > > > >> >> I also think that renaming the old print method should do the trick. As a
>> > > > >> >> contribution to your brainstorming for a name, I propose
>> > > > >> >> logOnTaskManager() ;)
>> > > > >> >>
>> > > > >> >> Cheers,
>> > > > >> >> Sebastian
>> > > > >> >>
>> > > > >> >> -----Original Message-----
>> > > > >> >> From: Fabian Hueske [mailto:fhueske@gmail.com]
>> > > > >> >> Sent: Thursday, May 28, 2015 14:34
>> > > > >> >> To: dev@flink.apache.org
>> > > > >> >> Subject: Re: Changed the behavior of "DataSet.print()"
>> > > > >> >>
>> > > > >> >> As I said, the common print prefix might indicate eager execution.
>> > > > >> >>
>> > > > >> >> I know that writeToTaskManagerStdOut() is quite bulky, but we should make
>> > > > >> >> the difference in the behavior very clear, IMO.
>> > > > >> >>
>> > > > >> >> 2015-05-28 14:29 GMT+02:00 Stephan Ewen <se...@apache.org>:
>> > > > >> >>
>> > > > >> >>> Actually, there is a method "print(String prefix)" which still goes to
>> > > > >> >>> the sysout of where the job is executed.
>> > > > >> >>>
>> > > > >> >>> Let's give that one the name "printOnTaskManager()" and then we should
>> > > > >> >>> have it...
>> > > > >> >>>
>> > > > >> >>> On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> > > > >> >>>
>> > > > >> >>>> I would avoid calling it printXYZ, since print()'s behavior changed
>> > > > >> >>>> to eager execution.
>> > > > >> >>>>
>> > > > >> >>>> 2015-05-28 14:10 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
>> > > > >> >>>>
>> > > > >> >>>>> Okay, you are right, local is actually confusing.
>> > > > >> >>>>> I'm against introducing "worker" as a term in the API. It's still
>> > > > >> >>>>> called "TaskManager". Maybe "printOnTaskManager()"?
>> > > > >> >>>>>
>> > > > >> >>>>> On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> > > > >> >>>>>
>> > > > >> >>>>>> +1 for both.
>> > > > >> >>>>>>
>> > > > >> >>>>>> printLocal() might not be the best name, because "local" is not
>> > > > >> >>>>>> well defined and could also be understood as the local machine of
>> > > > >> >>>>>> the user. How about naming the method completely differently
>> > > > >> >>>>>> (writeToWorkerStdOut()?) to make sure users are not confused with
>> > > > >> >>>>>> eager and lazy execution?
>> > > > >> >>>>>>
>> > > > >> >>>>>> 2015-05-28 13:44 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
>> > > > >> >>>>>>
>> > > > >> >>>>>>> Hi Sebastian,
>> > > > >> >>>>>>>
>> > > > >> >>>>>>> thank you for the feedback. I agree that both variants have a
>> > > > >> >>>>>>> right to exist.
>> > > > >> >>>>>>>
>> > > > >> >>>>>>> I would vote for adding another method to the DataSet called
>> > > > >> >>>>>>> "printLocal()" that has the old behavior.
>> > > > >> >>>>>>>
>> > > > >> >>>>>>> On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de> wrote:
>> > > > >> >>>>>>>
>> > > > >> >>>>>>>> Hi everyone,
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> I am a bit worried about that recent change of the print() method. I
>> > > > >> >>>>>>>> can understand the rationale that obtaining the stdout from all the
>> > > > >> >>>>>>>> taskmanagers is cumbersome (although, for local debugging the old
>> > > > >> >>>>>>>> print() was fine).
>> > > > >> >>>>>>>> However, a major problem I see with the new print() is that now you
>> > > > >> >>>>>>>> can only have one print() per plan, as the plan is directly executed
>> > > > >> >>>>>>>> as soon as print() is invoked. If you regard print() as a debugging
>> > > > >> >>>>>>>> means, this is a severe restriction.
>> > > > >> >>>>>>>> I see use cases for both print() implementations, but I would at
>> > > > >> >>>>>>>> least provide some kind of backwards compatibility, be it a
>> > > > >> >>>>>>>> parameter or a legacyPrint() method or anything else. As I assume
>> > > > >> >>>>>>>> print() to be very frequently used, a lot of existing programs would
>> > > > >> >>>>>>>> benefit from this and might otherwise not be directly portable to
>> > > > >> >>>>>>>> newer Flink versions. What do you think?
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> Cheers,
>> > > > >> >>>>>>>> Sebastian
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> -----Original Message-----
>> > > > >> >>>>>>>> From: Robert Metzger [mailto:rmetzger@apache.org]
>> > > > >> >>>>>>>> Sent: Tuesday, May 26, 2015 11:12
>> > > > >> >>>>>>>> To: dev@flink.apache.org
>> > > > >> >>>>>>>> Subject: Re: Changed the behavior of "DataSet.print()"
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> I've filed a JIRA to update the documentation:
>> > > > >> >>>>>>>> https://issues.apache.org/jira/browse/FLINK-2092
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen <sewen@apache.org> wrote:
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>>> Hi all!
>> > > > >> >>>>>>>>>
>> > > > >> >>>>>>>>> We merged a patch yesterday that changed the API behavior of the
>> > > > >> >>>>>>>>> "DataSet.print()" function.
>> > > > >> >>>>>>>>>
>> > > > >> >>>>>>>>> "print()" now prints to stdout on the client process, rather than the
>> > > > >> >>>>>>>>> TaskManager process, as before. This is much nicer for debugging and
>> > > > >> >>>>>>>>> exploring data sets.
>> > > > >> >>>>>>>>>
>> > > > >> >>>>>>>>> One implication of this is that print() is now an eager method (like
>> > > > >> >>>>>>>>> collect() or count()). That means that calling "print()" immediately
>> > > > >> >>>>>>>>> triggers the execution, and no "env.execute()" is required any more.
>> > > > >> >>>>>>>>>
>> > > > >> >>>>>>>>> Greetings,
>> > > > >> >>>>>>>>> Stephan
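
The old `print(prefix)` behavior that the thread proposes to keep under a new name (most votes went to `printOnTaskManager(prefix)`) is lazy instead. The following plain-Java toy model, again not Flink code, sketches that difference under the assumption that such a sink only adds itself to the plan and writes on the worker side once `execute()` runs the job. `LazySinkDemo`, `ToyEnvironment`, `ToyDataSet`, `fromList` and the `workerStdout` list (standing in for TaskManager stdout) are hypothetical, and the `prefix> record` line format is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a lazy, worker-side print sink. Hypothetical; NOT the Flink API.
public class LazySinkDemo {

    // Stands in for the stdout of the TaskManager (worker) processes.
    static final List<String> workerStdout = new ArrayList<>();

    // Number of jobs submitted so far.
    static int executions = 0;

    static final class ToyEnvironment {
        // Sinks that have been declared but not yet run.
        final List<Runnable> pendingSinks = new ArrayList<>();

        <T> ToyDataSet<T> fromList(List<T> elements) {
            return new ToyDataSet<T>(this, elements);
        }

        // Runs all pending sinks together as one job.
        void execute() {
            executions++;
            for (Runnable sink : pendingSinks) {
                sink.run();
            }
            pendingSinks.clear();
        }
    }

    static final class ToyDataSet<T> {
        private final ToyEnvironment env;
        private final List<T> elements;

        ToyDataSet(ToyEnvironment env, List<T> elements) {
            this.env = env;
            this.elements = elements;
        }

        // Lazy: only registers a sink; nothing is written until env.execute().
        void printOnTaskManager(String prefix) {
            env.pendingSinks.add(() -> {
                for (T record : elements) {
                    workerStdout.add(prefix + "> " + record);
                }
            });
        }
    }

    public static void main(String[] args) {
        ToyEnvironment env = new ToyEnvironment();
        env.fromList(Arrays.asList(1, 2)).printOnTaskManager("numbers");
        env.fromList(Arrays.asList("a", "b")).printOnTaskManager("letters");
        System.out.println("lines before execute(): " + workerStdout.size()); // 0
        env.execute(); // both sinks are written by this single job
        System.out.println(workerStdout + ", jobs run: " + executions);
    }
}
```

Both sinks are written by a single `execute()` call, so in this model several lazy sinks can share one job; the price is that their output ends up on the worker side rather than on the client.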

Re: Changed the behavior of "DataSet.print()"

Posted by Maximilian Michels <mx...@apache.org>.
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 1:54 PM, Kostas Tzoumas <kt...@apache.org> wrote:

> +1 for printOnTaskManager(prefix)

Re: Changed the behavior of "DataSet.print()"

Posted by Kostas Tzoumas <kt...@apache.org>.
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 1:35 PM, Till Rohrmann <tr...@apache.org> wrote:

> +1 for printOnTaskManager(prefix)

Re: Changed the behavior of "DataSet.print()"

Posted by Till Rohrmann <tr...@apache.org>.
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske <fh...@gmail.com> wrote:

> +1 for writeToWorkerStdOut(prefix)

Re: Changed the behavior of "DataSet.print()"

Posted by Fabian Hueske <fh...@gmail.com>.
+1 for writeToWorkerStdOut(prefix)
On Jun 2, 2015 11:42, "Aljoscha Krettek" <al...@apache.org> wrote:

> +1 for printOnTaskManager(prefix)

Re: Changed the behavior of "DataSet.print()"

Posted by Aljoscha Krettek <al...@apache.org>.
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <rm...@apache.org> wrote:
> I would like to reach consensus on this before the 0.9 release.
>
> So far we have the following ideas:
>
> writeToWorkerStdOut(prefix)
> printOnTaskManager(prefix) (+1)
> logOnTaskManager(prefix)
>
> I'm against logOnTM because we are not logging the output; we are writing
> or printing it.
>
>
> *I would vote for deprecating "print(prefix)" and adding
> "writeToWorkerStdOut(prefix)"*

Re: Changed the behavior of "DataSet.print()"

Posted by Robert Metzger <rm...@apache.org>.
I would like to reach consensus on this before the 0.9 release.

So far we have the following ideas:

writeToWorkerStdOut(prefix)
printOnTaskManager(prefix) (+1)
logOnTaskManager(prefix)

I'm against logOnTM because we are not logging the output; we are writing
or printing it.


*I would vote for deprecating "print(prefix)" and adding
"writeToWorkerStdOut(prefix)"*

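To make the difference between the two behaviors concrete, here is a toy model in plain Java with no Flink dependency. ToyEnvironment, ToyDataSet and their methods are hypothetical stand-ins, not Flink's actual classes; the lazy method is named printOnTaskManager(prefix) here after one of the candidate names listed above. In this sketch, print() runs a job immediately and prints on the client, while printOnTaskManager(prefix) only registers a sink that runs when execute() is called. It models the semantics described in this thread, not Flink's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToyEnvironment and ToyDataSet are hypothetical classes,
// not Flink's API. They mimic the two execution styles discussed in this thread.
final class ToyEnvironment {
    // Lazily registered sinks; they run only when execute() is called.
    private final List<Runnable> pendingSinks = new ArrayList<>();
    private int jobsRun = 0;

    // Captured output of the "client" and "TaskManager" processes.
    final List<String> clientOut = new ArrayList<>();
    final List<String> taskManagerOut = new ArrayList<>();

    <T> ToyDataSet<T> fromElements(List<T> elements) {
        return new ToyDataSet<>(this, new ArrayList<>(elements));
    }

    void addSink(Runnable sink) {
        pendingSinks.add(sink);
    }

    // Lazy path: all registered sinks run together as ONE job.
    void execute() {
        if (pendingSinks.isEmpty()) {
            throw new IllegalStateException("No sinks defined; nothing to execute.");
        }
        jobsRun++;
        pendingSinks.forEach(Runnable::run);
        pendingSinks.clear();
    }

    // Eager path: runs a job immediately and ships the result to the caller.
    <T> List<T> runEagerly(List<T> data) {
        jobsRun++;
        return new ArrayList<>(data);
    }

    int jobsRun() {
        return jobsRun;
    }
}

final class ToyDataSet<T> {
    private final ToyEnvironment env;
    private final List<T> data;

    ToyDataSet(ToyEnvironment env, List<T> data) {
        this.env = env;
        this.data = data;
    }

    // Like the new print(): triggers a job right away and prints the collected
    // elements on the client. No env.execute() call is needed.
    void print() {
        for (T element : env.runEagerly(data)) {
            env.clientOut.add(String.valueOf(element));
        }
    }

    // Like the old print(prefix): only registers a sink. Output shows up on the
    // "TaskManager" side once env.execute() runs the whole plan.
    // The "prefix> element" format is a toy choice, not Flink's.
    void printOnTaskManager(String prefix) {
        env.addSink(() -> data.forEach(element -> env.taskManagerOut.add(prefix + "> " + element)));
    }
}
```

In this model, calling print() twice runs two separate jobs, whereas two printOnTaskManager() calls followed by a single execute() run as one job. That extra job per print() is the cost behind the concern about using several print() calls for debugging within one plan.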


On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <ch...@icloud.com> wrote:

> I agree that avoiding a name that starts with “print” is better.
>
> Regards,
> Chiwan Park

Re: Changed the behavior of "DataSet.print()"

Posted by Chiwan Park <ch...@icloud.com>.
I agree that avoiding a name that starts with “print” is better.

Regards,
Chiwan Park

> On May 28, 2015, at 11:35 PM, Maximilian Michels <mx...@apache.org> wrote:
> 
> +1 for printOnTaskManager()

Re: Changed the behavior of "DataSet.print()"

Posted by Maximilian Michels <mx...@apache.org>.
+1 for printOnTaskManager()

On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <Se...@hpi.de>
wrote:

> Thanks for your quick responses!
>
> I also think that renaming the old print method should do the trick. As a
> contribution to your brainstorming for a name, I propose logOnTaskManager()
> ;)
>
> Cheers,
> Sebastian

RE: Changed the behavior of "DataSet.print()"

Posted by "Kruse, Sebastian" <Se...@hpi.de>.
Thanks for your quick responses!

I also think that renaming the old print method should do the trick. As a contribution to your brainstorming for a name, I propose logOnTaskManager() ;)

Cheers,
Sebastian

-----Original Message-----
From: Fabian Hueske [mailto:fhueske@gmail.com] 
Sent: Donnerstag, 28. Mai 2015 14:34
To: dev@flink.apache.org
Subject: Re: Changed the behavior of "DataSet.print()"

As I said, the common print prefix might indicate eager execution.

I know that writeToTaskManagerStdOut() is quite bulky, but we should make the difference in the behavior very clear, IMO.

Re: Changed the behavior of "DataSet.print()"

Posted by Fabian Hueske <fh...@gmail.com>.
As I said, the common print prefix might indicate eager execution.

I know that writeToTaskManagerStdOut() is quite bulky, but we should make
the difference in the behavior very clear, IMO.

2015-05-28 14:29 GMT+02:00 Stephan Ewen <se...@apache.org>:

> Actually, there is a method "print(String prefix)" which still goes to the
> sysout of where the job is executed.
>
> Let's give that one the name "printOnTaskManager()" and then we should have
> it...

Re: Changed the behavior of "DataSet.print()"

Posted by Stephan Ewen <se...@apache.org>.
Actually, there is a method "print(String prefix)" which still goes to the
sysout of where the job is executed.

Let's give that one the name "printOnTaskManager()" and then we should have
it...

On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <fh...@gmail.com> wrote:

> I would avoid calling it printXYZ, since print()'s behavior changed to
> eager execution.

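Stephan's proposal amounts to renaming the existing lazy print(String prefix). A common way to rename an API method without breaking existing programs (the backwards compatibility Sebastian asks for further down the thread) is to keep the old signature as a deprecated alias that delegates to the new method. The sketch below shows only that pattern on a hypothetical toy class. `RenamedPrintToy`, its `execute()` stand-in, and the `prefix> record` output format are invented for illustration; they are not Flink's DataSet API or its actual output format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy, NOT Flink's DataSet: a rename that keeps the old name as a deprecated alias.
final class RenamedPrintToy<T> {
    private final List<T> records;
    private final List<String> pendingPrefixes = new ArrayList<>(); // lazy sinks not yet run
    final List<String> taskManagerStdout = new ArrayList<>();       // stands in for worker stdout

    RenamedPrintToy(List<T> records) {
        this.records = records;
    }

    // New, explicit name: a lazy sink whose output appears on the worker side once the job runs.
    void printOnTaskManager(String prefix) {
        pendingPrefixes.add(prefix);
    }

    /** @deprecated Old name, kept so existing programs still compile; use printOnTaskManager(prefix). */
    @Deprecated
    void print(String prefix) {
        printOnTaskManager(prefix);
    }

    // Stands in for env.execute(): only now do the registered lazy sinks write anything.
    void execute() {
        for (String prefix : pendingPrefixes) {
            for (T r : records) {
                taskManagerStdout.add(prefix + "> " + r);
            }
        }
        pendingPrefixes.clear();
    }
}

final class RenamedPrintToyDemo {
    public static void main(String[] args) {
        RenamedPrintToy<String> ds = new RenamedPrintToy<>(Arrays.asList("a", "b"));
        ds.print("legacy");             // old call site still compiles; javac flags the deprecated call
        ds.printOnTaskManager("debug"); // preferred spelling
        ds.execute();
        ds.taskManagerStdout.forEach(System.out::println);
    }
}
```

Because the alias only delegates, both spellings behave identically; the deprecation marker is what nudges users toward the name that makes the worker-side, lazy behavior explicit.
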
Re: Changed the behavior of "DataSet.print()"

Posted by Fabian Hueske <fh...@gmail.com>.
I would avoid calling it printXYZ, since print()'s behavior changed to
eager execution.

2015-05-28 14:10 GMT+02:00 Robert Metzger <rm...@apache.org>:

> Okay, you are right, local is actually confusing.
> I'm against introducing "worker" as a term in the API. It's still called
> "TaskManager". Maybe "printOnTaskManager()" ?

Re: Changed the behavior of "DataSet.print()"

Posted by Robert Metzger <rm...@apache.org>.
Okay, you are right, local is actually confusing.
I'm against introducing "worker" as a term in the API. It's still called
"TaskManager". Maybe "printOnTaskManager()" ?

On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <fh...@gmail.com> wrote:

> +1 for both.
>
> printLocal() might not be the best name, because "local" is not well
> defined and could also be understood as the local machine of the user.
> How about naming the method completely differently (writeToWorkerStdOut()?)
> to make sure users are not confused with eager and lazy execution?

Re: Changed the behavior of "DataSet.print()"

Posted by Fabian Hueske <fh...@gmail.com>.
+1 for both.

printLocal() might not be the best name, because "local" is not well
defined and could also be understood as the local machine of the user.
How about naming the method completely differently (writeToWorkerStdOut()?)
to make sure users are not confused with eager and lazy execution?


2015-05-28 13:44 GMT+02:00 Robert Metzger <rm...@apache.org>:

> Hi Sebastian,
>
> thank you for the feedback. I agree that both variants have a right to
> exist.
>
> I would vote for adding another method to the DataSet called "printLocal()"
> that has the old behavior.

Re: Changed the behavior of "DataSet.print()"

Posted by Robert Metzger <rm...@apache.org>.
Hi Sebastian,

thank you for the feedback. I agree that both variants have a right to
exist.

I would vote for adding another method to the DataSet called "printLocal()"
that has the old behavior.

On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian <Se...@hpi.de> wrote:

> Hi everyone,
>
> I am a bit worried about that recent change of the print() method. I can
> understand the rationale that obtaining the stdout from all the
> taskmanagers is cumbersome (although for local debugging the old print()
> was fine).
> However, a major problem I see with the new print() is that now you can
> only have one print() per plan, as the plan is directly executed as soon as
> print() is invoked. If you regard print() as a debugging means, this is a
> severe restriction.
> I see use cases for both print() implementations, but I would at least
> provide some kind of backwards compatibility, be it a parameter or a
> legacyPrint() method or anything else. As I assume print() to be very
> frequently used, a lot of existing programs would benefit from this and
> might otherwise not be directly portable to newer Flink versions. What do
> you think?
>
> Cheers,
> Sebastian

RE: Changed the behavior of "DataSet.print()"

Posted by "Kruse, Sebastian" <Se...@hpi.de>.
Hi everyone,

I am a bit worried about that recent change of the print() method. I can understand the rationale that obtaining the stdout from all the taskmanagers is cumbersome (although for local debugging the old print() was fine).
However, a major problem I see with the new print() is that now you can only have one print() per plan, as the plan is directly executed as soon as print() is invoked. If you regard print() as a debugging means, this is a severe restriction.
I see use cases for both print() implementations, but I would at least provide some kind of backwards compatibility, be it a parameter or a legacyPrint() method or anything else. As I assume print() to be very frequently used, a lot of existing programs would benefit from this and might otherwise not be directly portable to newer Flink versions. What do you think?

Cheers,
Sebastian 

-----Original Message-----
From: Robert Metzger [mailto:rmetzger@apache.org] 
Sent: Tuesday, May 26, 2015 11:12
To: dev@flink.apache.org
Subject: Re: Changed the behavior of "DataSet.print()"

I've filed a JIRA to update the documentation:
https://issues.apache.org/jira/browse/FLINK-2092

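The restriction Sebastian describes follows from eagerness: each eager print() is a complete job of its own. The toy model below counts jobs and source reads to make that visible. It is not Flink code; `ToyPlan`, `eagerPrint`, `addLazySink`, and `execute` are hypothetical names, and the model assumes nothing is cached between eager calls. In it, two eager prints run two jobs and read the source twice, while two lazy sinks are served by a single `execute()` and a single source read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy model, NOT Flink: counts jobs and source reads for eager vs. lazy printing.
final class ToyPlan {
    int jobs = 0;        // jobs triggered so far
    int sourceReads = 0; // how often the (pretend) expensive source was evaluated
    private final List<Integer> source;
    private final List<Consumer<Integer>> lazySinks = new ArrayList<>();

    ToyPlan(List<Integer> source) {
        this.source = source;
    }

    private List<Integer> readSource() {
        sourceReads++;
        return new ArrayList<>(source);
    }

    // Eager, like the new print(): each call is a separate job that re-reads the source.
    List<Integer> eagerPrint() {
        jobs++;
        List<Integer> records = readSource();
        records.forEach(System.out::println);
        return records;
    }

    // Lazy, like the old print(): only registers a sink in the plan.
    void addLazySink(Consumer<Integer> sink) {
        lazySinks.add(sink);
    }

    // One job: the source is read once and every registered lazy sink receives all records.
    void execute() {
        jobs++;
        List<Integer> records = readSource();
        for (Consumer<Integer> sink : lazySinks) {
            records.forEach(sink);
        }
        lazySinks.clear();
    }
}

final class MultiplePrintToyDemo {
    public static void main(String[] args) {
        ToyPlan eager = new ToyPlan(Arrays.asList(1, 2, 3));
        eager.eagerPrint();
        eager.eagerPrint();
        System.out.println("eager: jobs=" + eager.jobs + ", sourceReads=" + eager.sourceReads);

        ToyPlan lazy = new ToyPlan(Arrays.asList(1, 2, 3));
        List<Integer> a = new ArrayList<>();
        List<Integer> b = new ArrayList<>();
        lazy.addLazySink(a::add);
        lazy.addLazySink(b::add);
        lazy.execute();
        System.out.println("lazy: jobs=" + lazy.jobs + ", sourceReads=" + lazy.sourceReads);
    }
}
```

So several eager prints are possible, but each pays for a full run of the plan up to that point, which is why a lazy, worker-side variant remains useful for debugging larger programs.
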
Re: Changed the behavior of "DataSet.print()"

Posted by Robert Metzger <rm...@apache.org>.
I've filed a JIRA to update the documentation:
https://issues.apache.org/jira/browse/FLINK-2092

On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi all!
>
> We merged a patch yesterday that changed the API behavior of the
> "DataSet.print()" function.
>
> "print()" now prints to stdout on the client process, rather than the
> TaskManager process, as before. This is much nicer for debugging and
> exploring data sets.
>
> One implication of this is that print() is now an eager method (like
> collect() or count()). That means that calling "print()" immediately
> triggers the execution, and no "env.execute()" is required any more.
>
> Greetings,
> Stephan
>
>
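The announcement hinges on the difference between lazy sinks, which only run when `env.execute()` is called, and eager operations such as `collect()`, `count()`, and the new `print()`, which run a job immediately and hand results back to the client. The self-contained Java toy model below illustrates that difference. It does not use Flink: `ToyEnv`, `ToyDataSet`, `addLazySink`, and `printEager` are hypothetical names, and the model assumes every job simply re-evaluates the plan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical toy model, NOT the Flink API: it only mimics lazy vs. eager sinks.
final class ToyEnv {
    final List<Runnable> pendingSinks = new ArrayList<>(); // lazy sinks waiting for execute()
    int jobsRun = 0;                                       // number of "jobs" triggered so far

    <T> ToyDataSet<T> fromList(List<T> data) {
        return new ToyDataSet<T>(this, () -> new ArrayList<T>(data));
    }

    // Runs all registered lazy sinks and counts that as one job, in the spirit of env.execute().
    void execute() {
        jobsRun++;
        pendingSinks.forEach(Runnable::run);
        pendingSinks.clear();
    }
}

final class ToyDataSet<T> {
    private final ToyEnv env;
    private final Supplier<List<T>> plan; // re-evaluated every time a job runs

    ToyDataSet(ToyEnv env, Supplier<List<T>> plan) {
        this.env = env;
        this.plan = plan;
    }

    // Lazy transformation: builds a new plan step, computes nothing yet.
    <R> ToyDataSet<R> map(Function<T, R> f) {
        return new ToyDataSet<R>(env, () -> {
            List<R> out = new ArrayList<>();
            for (T t : plan.get()) {
                out.add(f.apply(t));
            }
            return out;
        });
    }

    // Lazy sink (old print() style): only recorded; nothing runs until env.execute().
    void addLazySink(Consumer<T> sink) {
        env.pendingSinks.add(() -> plan.get().forEach(sink));
    }

    // Eager print (new print() style): runs a job immediately and writes on the client side.
    void printEager(List<String> clientStdout) {
        env.jobsRun++;
        for (T record : plan.get()) {
            clientStdout.add(String.valueOf(record));
        }
    }
}

final class EagerPrintToyDemo {
    public static void main(String[] args) {
        ToyEnv env = new ToyEnv();
        ToyDataSet<Integer> squares = env.fromList(Arrays.asList(1, 2, 3)).map(x -> x * x);

        List<String> workerStdout = new ArrayList<>();
        squares.addLazySink(x -> workerStdout.add("worker> " + x)); // nothing happens yet
        env.execute();                                              // now the lazy sink runs

        List<String> clientStdout = new ArrayList<>();
        squares.printEager(clientStdout);                           // runs at once, no execute()

        System.out.println(workerStdout + " " + clientStdout + " jobs=" + env.jobsRun);
    }
}
```

In the toy, `printEager` needs no `execute()` call, mirroring the announcement, while a sink registered with `addLazySink` produces nothing until `execute()` runs.
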