Posted to dev@flink.apache.org by "Matthias J. Sax" <mj...@informatik.hu-berlin.de> on 2015/05/17 22:38:31 UTC

Re: Package multiple jobs in a single jar

Hi,

I like the idea that Flink's WebClient can show different plans for
different jobs within a single jar file.

I prepared a prototype for this feature. You can find it here:
https://github.com/mjsax/flink/tree/multipleJobsWebUI

To test the feature, prepare a jar file that contains the code of
multiple programs, and list each entry class as a comma-separated value
in the "program-class" line of the manifest file.
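
For illustration, a manifest entry of this kind could be read roughly as
follows. This is a minimal sketch and not the prototype's code: the class
name EntryClassList and both method names are made up here, and the only
thing taken from the text above is a comma-separated "program-class"
attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Flink.
final class EntryClassList {

    /** Splits a comma-separated attribute value into trimmed, non-empty class names. */
    static List<String> split(String attributeValue) {
        List<String> names = new ArrayList<>();
        if (attributeValue == null) {
            return names;
        }
        for (String part : attributeValue.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Reads the "program-class" main attribute from the text of a manifest file. */
    static List<String> fromManifestText(String manifestText) {
        try {
            Manifest manifest = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return split(manifest.getMainAttributes().getValue("program-class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Manifest-Version: 1.0\n"
                + "program-class: com.myflinkjobs.JobA, com.myflinkjobs.JobB\n";
        System.out.println(fromManifestText(text));
    }
}
```

Each returned name could then be offered as a separate entry in the web
client or passed to "-c" on the command line.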

Feedback is welcome. :)


-Matthias


On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
> Thank you all for the support!
> It would be a really nice feature if the web client were able to show
> me the list of Flink jobs within my jar.
> It should be sufficient to mark them with a special annotation and
> inspect the classes within the jar.
> 
> On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <ms@mieo.de
> <ma...@mieo.de>> wrote:
> 
>     Hi Flavio,
> 
>     you can also put each job in its own class and use the -c parameter
>     to execute jobs separately:
> 
>     /bin/flink run -c com.myflinkjobs.JobA /path/to/jar/multiplejobs.jar
>     /bin/flink run -c com.myflinkjobs.JobB /path/to/jar/multiplejobs.jar
>     …
> 
>     Cheers
>     Malte
> 
>     From: Robert Metzger <rmetzger@apache.org <ma...@apache.org>>
>     Reply-To: <user@flink.apache.org <ma...@flink.apache.org>>
>     Date: Friday, May 8, 2015 14:57
>     To: "user@flink.apache.org <ma...@flink.apache.org>"
>     <user@flink.apache.org <ma...@flink.apache.org>>
>     Subject: Re: Package multiple jobs in a single jar
> 
>     Hi Flavio,
> 
>     the pom from our quickstart is a good
>     reference: https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
> 
> 
> 
> 
>     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
>     <pompermaier@okkam.it <ma...@okkam.it>> wrote:
> 
>         OK, got it.
>         And is there a reference pom.xml for shading my application into
>         one fat jar? Which Flink dependencies can I exclude?
> 
>         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske <fhueske@gmail.com
>         <ma...@gmail.com>> wrote:
> 
>             I didn't say that the main should return the
>             ExecutionEnvironment.
>             You can define and execute as many programs in a main
>             function as you like.
>             The program can be defined somewhere else, e.g., in a
>             function that receives an ExecutionEnvironment and attaches
>             a program such as
> 
>             public static void buildMyProgram(ExecutionEnvironment env) {
>               DataSet<String> lines = env.readTextFile(...);
>               // do something
>               lines.writeAsText(...);
>             }
> 
>             That method could be invoked from main():
> 
>             public static void main(String[] args) throws Exception {
>               ExecutionEnvironment env = ...
> 
>               if(...) {
>                 buildMyProgram(env);
>               }
>               else {
>                 buildSomeOtherProg(env);
>               }
> 
>               env.execute();
> 
>               // run some more programs
>             }
> 
>             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
>             <pompermaier@okkam.it <ma...@okkam.it>>:
> 
>                 Hi Fabian,
>                 thanks for the response.
>                 So my mains should be converted into a method returning
>                 the ExecutionEnvironment.
>                 However, I think that it would be very nice to have a
>                 syntax like the one of the Hadoop ProgramDriver to
>                 define jobs to invoke from a single root class.
>                 Do you think it could be useful?
> 
>                 On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
>                 <fhueske@gmail.com <ma...@gmail.com>> wrote:
> 
>                     You can easily have multiple Flink programs in a single
>                     JAR file.
>                     A program is defined using an ExecutionEnvironment
>                     and executed when you call
>                     ExecutionEnvironment.execute().
>                     Where and how you do that does not matter.
> 
>                     You can for example implement a main function such as:
> 
>                     public static void main(String... args) {
> 
>                       if (today == Monday) {
>                         ExecutionEnvironment env = ...
>                         // define Monday prog
>                         env.execute();
>                       }
>                       else {
>                         ExecutionEnvironment env = ...
>                         // define other prog
>                         env.execute();
>                       }
>                     }
> 
>                     2015-05-08 11:41 GMT+02:00 Flavio Pompermaier
>                     <pompermaier@okkam.it <ma...@okkam.it>>:
> 
>                         Hi to all,
>                         is there any way to keep multiple jobs in a jar
>                         and then choose at runtime the one to execute
>                         (like what ProgramDriver does in Hadoop)?
> 
>                         Best,
>                         Flavio
> 
> 
> 
> 
> 
> 
> 
> 
> 


Re: Package multiple jobs in a single jar

Posted by Maximilian Michels <mx...@apache.org>.
Hi Matthias,

I understand your point about "advertising" the interfaces but there is so
much stuff to be advertised :). Honestly, I think ProgramDescription
doesn't add much value although it is kind of neat. Parameters can be
described in the code or by displaying a help message. However, I'm in
favor of making it easier to list all executable classes in a JAR.
Therefore, I like your proposed changes. I just don't see much of a use of
the Program or ProgramDescription interface in the examples. That's just my
personal opinion.

Best regards,
Max


On Tue, May 26, 2015 at 5:10 PM, Flavio Pompermaier <po...@okkam.it>
wrote:

> I agree with Matthias. I didn't know about the ProgramDescription and
> Program interfaces because they are not advertised anywhere.
>
> On Tue, May 26, 2015 at 5:01 PM, Matthias J. Sax <
> mjsax@informatik.hu-berlin.de> wrote:
>
> > I see your point.
> >
> > However, right now only a few people are aware of the "ProgramDescription"
> > interface. If we want to "advertise" it, it should be used in (at
> > least) a few examples. Otherwise, people will never use it, and the
> > changes I plan to apply are kind of useless. I would even claim that
> > the interface should be removed completely in this case...
> >
> >
> > On 05/26/2015 03:31 PM, Maximilian Michels wrote:
> > > Sorry, my bad. Yes, it is helpful to have a separate program and
> > > parameter description in ProgramDescription. I'm not sure if it adds
> > > much value to implement ProgramDescription in the examples. It
> > > introduces verbosity and might give the impression that you have to
> > > implement ProgramDescription in your Flink job.
> > >
> > > On Tue, May 26, 2015 at 12:00 PM, Matthias J. Sax <
> > > mjsax@informatik.hu-berlin.de> wrote:
> > >
> > >> Hi Max,
> > >>
> > >> thanks for your feedback. I guess you are confusing the interfaces
> > >> "Program" and "ProgramDescription". With "Program", the main method
> > >> is replaced by "getPlan(...)". However, "ProgramDescription" only
> > >> adds the method "getDescription()", which returns a string that
> > >> explains the usage of the program (i.e., short description, expected
> > >> parameters).
> > >>
> > >> Thus, adding "ProgramDescription" to the examples does not change the
> > >> examples -- the main method will still be used. It only adds the
> > >> ability for a program to "explain" itself (i.e., give meta info).
> > >> Furthermore, "ProgramDescription" is also not related to the new
> > >> "ParameterTool".
> > >>
> > >> -Matthias
> > >>
> > >> On 05/26/2015 11:46 AM, Maximilian Michels wrote:
> > >>> I don't think `getDisplayName()` is necessary either. The class name
> > >>> and the description string should be fine. Adding ProgramDescription
> > >>> to the examples is not necessary; as already pointed out, using the
> > >>> main method is more convenient for most users. As far as I know, the
> > >>> idea of the ParameterTool was to use it only in the user code and not
> > >>> to handle parameters automatically.
> > >>>
> > >>> Changing the interface would be quite API breaking, but since most
> > >>> programs use the main method, IMHO we could do it.
> > >>>
> > >>> On Fri, May 22, 2015 at 10:09 PM, Matthias J. Sax <
> > >>> mjsax@informatik.hu-berlin.de> wrote:
> > >>>
> > >>>> Makes sense to me. :)
> > >>>>
> > >>>> One more thing: What about extending the "ProgramDescription"
> > >>>> interface to have multiple methods as Flavio suggested (with the
> > >>>> config(...) method that should be handled by the ParameterTool)?
> > >>>>
> > >>>>> public interface FlinkJob {
> > >>>>>
> > >>>>> /** The name to display in the job submission UI or shell */
> > >>>>> //e.g. "My Flink HelloWorld"
> > >>>>> String getDisplayName();
> > >>>>> //e.g. "This program does this and that etc.."
> > >>>>> String getDescription();
> > >>>>> //e.g. <0,Integer,"An integer representing my first param">,
> > >>>>> //     <1,String,"A string representing my second param">
> > >>>>> List<Tuple3<Integer, TypeInfo, String>> paramDescription;
> > >>>>> /** Set up the flink job in the passed ExecutionEnvironment */
> > >>>>> ExecutionEnvironment config(ExecutionEnvironment env);
> > >>>>> }
> > >>>>
> > >>>> Right now, the interface is used only a couple of times in Flink's
> > >>>> code base, so it would not be a problem to update those classes.
> > >>>> However, it could break external code that already uses the
> > >>>> interface (even if I doubt that the interface is well known and
> > >>>> used often [or at all]).
> > >>>>
> > >>>> I personally don't think that "getDisplayName()" is too helpful.
> > >>>> Splitting the program description and the parameter description
> > >>>> seems to be useful. For example, if wrong parameters are provided,
> > >>>> the parameter description can be included in the error message. If
> > >>>> program+parameter description is given in a single string, this is
> > >>>> not possible. But this is only a minor issue of course.
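
The point about splitting the two descriptions can be sketched as follows.
This is only an illustration under assumptions: the interface
SplitDescription, its method getParameterDescription() and the class
UsageMessages are hypothetical names and are not part of Flink's existing
"ProgramDescription" interface.

```java
// Hypothetical sketch, not Flink API.
final class UsageMessages {

    /** Assumed variant of a description interface with two separate parts. */
    interface SplitDescription {
        String getDescription();
        String getParameterDescription();
    }

    /** Builds an error message that repeats only the parameter part. */
    static String invalidArguments(SplitDescription job, String problem) {
        return "Invalid arguments: " + problem + "\n"
                + "Expected parameters: " + job.getParameterDescription();
    }

    public static void main(String[] args) {
        SplitDescription wordCount = new SplitDescription() {
            @Override
            public String getDescription() {
                return "Counts the words in a text file.";
            }

            @Override
            public String getParameterDescription() {
                return "<input path> <output path>";
            }
        };
        System.out.println(invalidArguments(wordCount, "missing output path"));
    }
}
```

With a single combined string, the error message could only repeat the whole
text or nothing, which is the limitation described above.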
> > >>>>
> > >>>> Maybe we should also add the interface to the current Flink
> > >>>> examples, to make people more aware of it. Is there any
> > >>>> documentation on the web site?
> > >>>>
> > >>>>
> > >>>> -Matthias
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 05/22/2015 09:43 PM, Robert Metzger wrote:
> > >>>>> Thank you for working on this.
> > >>>>> My responses are inline below:
> > >>>>>
> > >>>>> (Flavio)
> > >>>>>
> > >>>>>> My suggestion is to create a specific Flink interface to get also
> > >>>>>> description of a job and standardize parameter passing.
> > >>>>>
> > >>>>>
> > >>>>> I've recently merged the ParameterTool, which solves the
> > >>>>> "standardize parameter passing" problem (at least it presents a
> > >>>>> best practice):
> > >>>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#parsing-command-line-arguments-and-passing-them-around-in-your-flink-application
> > >>>>>
> > >>>>> Regarding the description: Maybe we can use the "ProgramDescription"
> > >>>>> interface for getting a string describing the program in the web
> > >>>>> frontend.
> > >>>>>
> > >>>>> (Matthias)
> > >>>>>
> > >>>>>> I don't want to start working on it, before it's clear that it has
> > >>>>>> a chance to be included in Flink.
> > >>>>>
> > >>>>>
> > >>>>> I think the changes discussed here won't change the current
> > >>>>> behavior, but they add new functionality which can make the life of
> > >>>>> our users easier, so I'll vote to include your changes (given they
> > >>>>> meet our quality standards)
> > >>>>>
> > >>>>>
> > >>>>>> If multiple classes implement the "Program" interface, an exception
> > >>>>>> should be thrown (I think that would make sense). However, I am not
> > >>>>>> sure what "good" behavior is if a single "Program" class is found
> > >>>>>> and an additional main-method class.
> > >>>>>>   - should the "Program" class be executed (i.e., "overwrite" the
> > >>>>>>     main-method class)
> > >>>>>>   - or is it better to throw an exception?
> > >>>>>
> > >>>>>
> > >>>>> I would give a class implementing "Program" priority over a random
> > >>>>> main() method in a random class.
> > >>>>> Maybe printing a WARN log message informing the user that the
> > >>>>> "Program" class has been chosen.
> > >>>>>
> > >>>>>
> > >>>>>> If no "Program" class is found, but a single main-method class,
> > >>>>>> Flink could execute using the main method. But I am not sure
> > >>>>>> either if this is "good" behavior. If multiple main-method classes
> > >>>>>> are present, throwing an exception is the only way to go, I guess.
> > >>>>>
> > >>>>>
> > >>>>> I think the best effort approach "one class with main() found" is
> > >>>>> good. In case of multiple main methods, a helpful exception is the
> > >>>>> best approach in my opinion.
> > >>>>>
> > >>>>>
> > >>>>>> If the manifest contains a "program-class" or "Main-Class" entry,
> > >>>>>> should we check the jar file right away if the specified class is
> > >>>>>> there? Right now, no check is performed and an error occurs if the
> > >>>>>> user tries to execute the job.
> > >>>>>
> > >>>>>
> > >>>>> I'd say the current approach is sufficient. There is no need to have
> > >>>>> a special code path which is doing the check.
> > >>>>> I think the error message will be pretty similar in both cases and
> > >>>>> I fear that this additional code could also introduce new bugs ;)
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Fri, May 22, 2015 at 9:06 PM, Matthias J. Sax <
> > >>>>> mjsax@informatik.hu-berlin.de> wrote:
> > >>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> two more thoughts to this discussion:
> > >>>>>>
> > >>>>>>  1) looking at the commit history of "CliFrontend", I found the
> > >>>>>> following closed issue and the closing pull request
> > >>>>>>     * https://issues.apache.org/jira/browse/FLINK-1095
> > >>>>>>     * https://github.com/apache/flink/pull/238
> > >>>>>> It stands in opposition to Flavio's request to have a job
> > >>>>>> description. Any comment on this? Should a removed feature be
> > >>>>>> re-introduced? If not, I would suggest removing the
> > >>>>>> "ProgramDescription" interface completely.
> > >>>>>>
> > >>>>>>  2) If the manifest contains a "program-class" or "Main-Class"
> > >>>>>> entry, should we check the jar file right away if the specified
> > >>>>>> class is there? Right now, no check is performed and an error
> > >>>>>> occurs if the user tries to execute the job.
> > >>>>>>
> > >>>>>>
> > >>>>>> -Matthias
> > >>>>>>
> > >>>>>>
> > >>>>>> On 05/22/2015 12:06 PM, Matthias J. Sax wrote:
> > >>>>>>> Thanks for your feedback.
> > >>>>>>>
> > >>>>>>> I agree on the main method "problem". For scanning and listing
> > >>>>>>> everything that is found, it's fine.
> > >>>>>>>
> > >>>>>>> The tricky question is the automatic invocation mechanism if the
> > >>>>>>> "-c" flag is not used and no manifest program-class or Main-Class
> > >>>>>>> entry is found.
> > >>>>>>>
> > >>>>>>> If multiple classes implement the "Program" interface, an exception
> > >>>>>>> should be thrown (I think that would make sense). However, I am not
> > >>>>>>> sure what "good" behavior is if a single "Program" class is found
> > >>>>>>> and an additional main-method class.
> > >>>>>>>   - should the "Program" class be executed (i.e., "overwrite" the
> > >>>>>>>     main-method class)
> > >>>>>>>   - or is it better to throw an exception?
> > >>>>>>>
> > >>>>>>> If no "Program" class is found, but a single main-method class,
> > >>>>>>> Flink could execute using the main method. But I am not sure
> > >>>>>>> either if this is "good" behavior. If multiple main-method classes
> > >>>>>>> are present, throwing an exception is the only way to go, I guess.
> > >>>>>>>
> > >>>>>>> To sum up: Should Flink consider main-method classes for automatic
> > >>>>>>> invocation, or should main-method classes be required to be listed
> > >>>>>>> in the "program-class" or "Main-Class" manifest parameter (to
> > >>>>>>> enable them for automatic invocation)?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> -Matthias
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 05/22/2015 09:56 AM, Maximilian Michels wrote:
> > >>>>>>>> Hi Matthias,
> > >>>>>>>>
> > >>>>>>>> Thank you for taking the time to analyze Flink's invocation
> > >>>>>>>> behavior. I like your proposal. I'm not sure whether it is a good
> > >>>>>>>> idea to scan the entire JAR for main methods. Sometimes, main
> > >>>>>>>> methods are added solely for testing purposes and don't really
> > >>>>>>>> serve any practical use. However, if you're already going through
> > >>>>>>>> the JAR to find the ProgramDescription interface, then you might
> > >>>>>>>> look for main methods as well. As long as it is just a listing
> > >>>>>>>> without execution, that should be fine.
> > >>>>>>>>
> > >>>>>>>> Best regards,
> > >>>>>>>> Max
> > >>>>>>>>
> > >>>>>>>> On Thu, May 21, 2015 at 3:43 PM, Matthias J. Sax <
> > >>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi,
> > >>>>>>>>>
> > >>>>>>>>> I had a look into the current workflow of Flink with regard to
> > >>>>>>>>> the processing steps of a jar file.
> > >>>>>>>>>
> > >>>>>>>>> If I got it right it works as follows (not sure if this is
> > >>>>>>>>> documented somewhere):
> > >>>>>>>>>
> > >>>>>>>>> 1) check if the "-c" flag is used to set the program entry point
> > >>>>>>>>>    -> if yes, goto 4
> > >>>>>>>>> 2) try to extract the "program-class" property from the manifest
> > >>>>>>>>>    -> if found, goto 4
> > >>>>>>>>> 3) try to extract the "Main-Class" property from the manifest
> > >>>>>>>>>    -> if not found, throw an exception (this also happens if no
> > >>>>>>>>>       manifest file is found at all)
> > >>>>>>>>>
> > >>>>>>>>> 4) check if the entry point class implements the "Program"
> > >>>>>>>>>    interface
> > >>>>>>>>>    -> if yes, goto 6
> > >>>>>>>>> 5) check if the entry point class provides a "public static void
> > >>>>>>>>>    main(String[] args)" method
> > >>>>>>>>>    -> if not, throw an exception
> > >>>>>>>>>
> > >>>>>>>>> 6) execute the program (i.e., show plan/info or really run it)
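
The six steps above can be sketched in plain Java. This is a simplified
model of the described order of checks, not Flink's actual CliFrontend or
PackagedProgram code; the class EntryPointResolution, its methods and the
Invocation enum are hypothetical names, and the inputs are passed in as
plain values instead of being read from a real jar.

```java
// Hypothetical model of the described lookup order, not Flink code.
final class EntryPointResolution {

    /** How the resolved entry class would be invoked (steps 4 and 5). */
    enum Invocation { PROGRAM_INTERFACE, MAIN_METHOD }

    /** Steps 1-3: the first source that names a class wins. */
    static String resolveEntryClass(String cFlag, String programClass, String mainClass) {
        if (isSet(cFlag)) {
            return cFlag;          // step 1: "-c" flag
        }
        if (isSet(programClass)) {
            return programClass;   // step 2: "program-class" manifest property
        }
        if (isSet(mainClass)) {
            return mainClass;      // step 3: "Main-Class" manifest property
        }
        throw new IllegalStateException("No entry point: no -c flag and no manifest entry");
    }

    /** Steps 4-5: the "Program" interface is checked before the main method. */
    static Invocation chooseInvocation(boolean implementsProgram, boolean hasMainMethod) {
        if (implementsProgram) {
            return Invocation.PROGRAM_INTERFACE;
        }
        if (hasMainMethod) {
            return Invocation.MAIN_METHOD;
        }
        throw new IllegalStateException("Entry class has neither Program nor main(String[])");
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        String entry = resolveEntryClass(null, "com.myflinkjobs.JobA", "com.myflinkjobs.JobB");
        System.out.println(entry + " via " + chooseInvocation(false, true));
    }
}
```

Step 6 (showing the plan or running the job) is left out, since it depends
on Flink itself.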
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I also "discovered" the interface "ProgramDescription" with a
> > >>>>>>>>> single method "String getDescription()". Even if some examples
> > >>>>>>>>> implement this interface (and use it in the example itself),
> > >>>>>>>>> Flink basically ignores it... From the CLI there is no way to
> > >>>>>>>>> get this info, and the WebUI does actually get it if present,
> > >>>>>>>>> but doesn't show it anywhere...
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I think it would be nice if we extended the following
> > >>>>>>>>> functions:
> > >>>>>>>>>
> > >>>>>>>>>  - add the possibility to specify multiple entry classes in
> > >>>>>>>>>    "program-class" or "Main-Class" -> in this case, the user
> > >>>>>>>>>    needs to use the "-c" flag to pick the program to run every
> > >>>>>>>>>    time
> > >>>>>>>>>
> > >>>>>>>>>  - add a CLI option that allows the user to see what entry point
> > >>>>>>>>>    classes are available
> > >>>>>>>>>    for this, consider
> > >>>>>>>>>      a) "program-class" entry
> > >>>>>>>>>      b) "Main-Class" entry
> > >>>>>>>>>      c) if neither is found, scan jar-file for classes
> > >>>>>>>>>         implementing "Program" interface
> > >>>>>>>>>      d) if still not found, scan jar-file for classes with
> > >>>>>>>>>         "main" method
> > >>>>>>>>>
> > >>>>>>>>>  - if user looks for entry point classes via CLI, check for
> > >>>>>>>>>    "ProgramDescription" interface and show info
> > >>>>>>>>>
> > >>>>>>>>>  - extend WebUI to show all available entry-classes (pull
> > >>>>>>>>>    request already there, for multiple entries in
> > >>>>>>>>>    "program-class")
> > >>>>>>>>>
> > >>>>>>>>>  - extend WebUI to show "ProgramDescription" info
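
For option d) in the list above, the per-class check could look like the
following sketch. It only covers the reflection test on an already loaded
class; enumerating and loading the classes of a jar is left out. The class
MainMethodCheck and its nested demo classes are hypothetical and not part of
Flink.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper, not part of Flink.
final class MainMethodCheck {

    /** Returns true if the class offers "public static void main(String[])". */
    static boolean hasMainMethod(Class<?> candidate) {
        try {
            // getMethod only finds public methods, including inherited ones.
            Method main = candidate.getMethod("main", String[].class);
            return Modifier.isStatic(main.getModifiers())
                    && main.getReturnType() == void.class;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Demo class with an entry point. */
    static final class WithMain {
        public static void main(String[] args) {
            // intentionally empty
        }
    }

    /** Demo class without an entry point. */
    static final class WithoutMain {
    }

    public static void main(String[] args) {
        System.out.println(hasMainMethod(WithMain.class));
        System.out.println(hasMainMethod(WithoutMain.class));
    }
}
```

As noted elsewhere in this thread, a class found this way may only have a
main method for testing, so such a scan is better suited to listing
candidates than to automatic execution.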
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> What do you think? I am not too sure about the "auto scan" of
> > >>>>>>>>> the jar file if no manifest entry is provided. We might get some
> > >>>>>>>>> "fat jars" and scanning might take some time.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> -Matthias
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On 05/19/2015 10:44 AM, Stephan Ewen wrote:
> > >>>>>>>>>> We actually had an interface like that before ("Program"). It
> > >>>>>>>>>> is still supported, but in all new programs we simply use the
> > >>>>>>>>>> Java main method. The advantage is that most IDEs can create
> > >>>>>>>>>> executable JARs automatically, setting the JAR manifest
> > >>>>>>>>>> attributes, etc.
> > >>>>>>>>>>
> > >>>>>>>>>> The "Program" interface still works, though. Most tool classes
> > >>>>>>>>>> (like "PackagedProgram") have a way to figure out whether the
> > >>>>>>>>>> code uses "main()" or implements "Program" and call the right
> > >>>>>>>>>> method.
> > >>>>>>>>>>
> > >>>>>>>>>> You can try and extend the program interface. If you want to
> > >>>>>>>>>> consistently support multiple programs in one JAR file, you may
> > >>>>>>>>>> need to adjust the util classes as well to deal with that.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Tue, May 19, 2015 at 10:10 AM, Matthias J. Sax <
> > >>>>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Supporting an interface like this seems to be a nice idea.
> > >>>>>>>>>>> Any other opinions on it?
> > >>>>>>>>>>>
> > >>>>>>>>>>> It seems to be some more work to get it done right. I don't
> > >>>>>>>>>>> want to start working on it before it's clear that it has a
> > >>>>>>>>>>> chance to be included in Flink.
> > >>>>>>>>>>>
> > >>>>>>>>>>> @Flavio: I moved the discussion to the dev mailing list (the
> > >>>>>>>>>>> user list is not appropriate for this discussion). Are you
> > >>>>>>>>>>> subscribed to it or should I cc you in each mail?
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> -Matthias
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On 05/19/2015 09:39 AM, Flavio Pompermaier wrote:
> > >>>>>>>>>>>> Nice feature Matthias!
> > >>>>>>>>>>>> My suggestion is to create a specific Flink interface to
> > >>>>>>>>>>>> also get the description of a job and standardize parameter
> > >>>>>>>>>>>> passing.
> > >>>>>>>>>>>> Then, somewhere (e.g. Manifest) you could specify the list
> > >>>>>>>>>>>> of packages (or also directly the classes) to inspect with
> > >>>>>>>>>>>> reflection to extract the list of available Flink jobs.
> > >>>>>>>>>>>> Something like:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> public interface FlinkJob {
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> /** The name to display in the job submission UI or shell */
> > >>>>>>>>>>>> //e.g. "My Flink HelloWorld"
> > >>>>>>>>>>>> String getDisplayName();
> > >>>>>>>>>>>> //e.g. "This program does this and that etc.."
> > >>>>>>>>>>>> String getDescription();
> > >>>>>>>>>>>> //e.g. <0,Integer,"An integer representing my first param">,
> > >>>>>>>>>>>> //     <1,String,"A string representing my second param">
> > >>>>>>>>>>>> List<Tuple3<Integer, TypeInfo, String>> paramDescription;
> > >>>>>>>>>>>> /** Set up the flink job in the passed ExecutionEnvironment */
> > >>>>>>>>>>>> ExecutionEnvironment config(ExecutionEnvironment env);
> > >>>>>>>>>>>> }
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> What do you think?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
> > >>>>>>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I like the idea that Flink's WebClient can show different
> > plans
> > >>>> for
> > >>>>>>>>>>>>> different jobs within a single jar file.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I prepared a prototype for this feature. You can find it
> > here:
> > >>>>>>>>>>>>> https://github.com/mjsax/flink/tree/multipleJobsWebUI
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> To test the feature, you need to prepare a jar file, that
> > >>>> contains
> > >>>>>> the
> > >>>>>>>>>>>>> code of multiple programs and specify each entry class in
> the
> > >>>>>> manifest
> > >>>>>>>>>>>>> file as comma separated values in "program-class" line.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Feedback is welcome. :)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
> > >>>>>>>>>>>>>> Thank you all for the support!
> > >>>>>>>>>>>>>> It will be a really nice feature if the web client could
> be
> > >> able
> > >>>>>> to
> > >>>>>>>>>>> show
> > >>>>>>>>>>>>>> me the list of Flink jobs within my jar..
> > >>>>>>>>>>>>>> it should be sufficient to mark them with a special
> > annotation
> > >>>> and
> > >>>>>>>>>>>>>> inspect the classes within the jar..
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <
> ms@mieo.de
> > >>>>>>>>>>>>>> <ma...@mieo.de>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>     Hi Flavio,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>     you also can put each job in a single class and use
> the
> > –c
> > >>>>>>>>>>> parameter
> > >>>>>>>>>>>>>>     to execute jobs separately:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>     /bin/flink run –c com.myflinkjobs.JobA
> > >>>>>>>>>>> /path/to/jar/multiplejobs.jar
> > >>>>>>>>>>>>>>     /bin/flink run –c com.myflinkjobs.JobB
> > >>>>>>>>>>> /path/to/jar/multiplejobs.jar
> > >>>>>>>>>>>>>>     …
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>     Cheers
> > >>>>>>>>>>>>>>     Malte
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>     Von: Robert Metzger <rmetzger@apache.org <mailto:
> > >>>>>>>>>>> rmetzger@apache.org
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>     Antworten an: <user@flink.apache.org <mailto:
> > >>>>>>>>> user@flink.apache.org
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>     Datum: Freitag, 8. Mai 2015 14:57
> > >>>>>>>>>>>>>>     An: "user@flink.apache.org <mailto:
> > user@flink.apache.org
> > >>> "
> > >>>>>>>>>>>>>>     <user@flink.apache.org <mailto:user@flink.apache.org
> >>
> > >>>>>>>>>>>>>>     Betreff: Re: Package multiple jobs in a single jar
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>     Hi Flavio,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>     the pom from our quickstart is a good
> > >>>>>>>>>>>>>>     reference:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>
> > >>>>
> > >>
> >
> https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
> > >>>>>>>>>>>>>>     <pompermaier@okkam.it <ma...@okkam.it>>
> > >> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>         Ok, get it.
> > >>>>>>>>>>>>>>         And is there a reference pom.xml for shading my
> > >>>>>> application
> > >>>>>>>>>>> into
> > >>>>>>>>>>>>>>         one fat-jar? which flink dependencies can I
> exclude?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske <
> > >>>>>>>>>>> fhueske@gmail.com
> > >>>>>>>>>>>>>>         <ma...@gmail.com>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>             I didn't say that the main should return the
> > >>>>>>>>>>>>>>             ExecutionEnvironment.
> > >>>>>>>>>>>>>>             You can define and execute as many programs
> in a
> > >>>> main
> > >>>>>>>>>>>>>>             function as you like.
> > >>>>>>>>>>>>>>             The program can be defined somewhere else,
> e.g.,
> > >> in
> > >>>> a
> > >>>>>>>>>>>>>>             function that receives an ExecutionEnvironment
> > and
> > >>>>>>>>> attaches
> > >>>>>>>>>>>>>>             a program such as
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>             public void
> buildMyProgram(ExecutionEnvironment
> > >>>> env) {
> > >>>>>>>>>>>>>>               DataSet<String> lines =
> env.readTextFile(...);
> > >>>>>>>>>>>>>>               // do something
> > >>>>>>>>>>>>>>               lines.writeAsText(...);
> > >>>>>>>>>>>>>>             }
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>             That method could be invoked from main():
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>             psv main() {
> > >>>>>>>>>>>>>>               ExecutionEnv env = ...
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>               if(...) {
> > >>>>>>>>>>>>>>                 buildMyProgram(env);
> > >>>>>>>>>>>>>>               }
> > >>>>>>>>>>>>>>               else {
> > >>>>>>>>>>>>>>                 buildSomeOtherProg(env);
> > >>>>>>>>>>>>>>               }
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>               env.execute();
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>               // run some more programs
> > >>>>>>>>>>>>>>             }
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
> > >>>>>>>>>>>>>>             <pompermaier@okkam.it <mailto:
> > >> pompermaier@okkam.it
> > >>>>>> :
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>                 Hi Fabian,
> > >>>>>>>>>>>>>>                 thanks for the response.
> > >>>>>>>>>>>>>>                 So my mains should be converted in a
> method
> > >>>>>> returning
> > >>>>>>>>>>>>>>                 the ExecutionEnvironment.
> > >>>>>>>>>>>>>>                 However it think that it will be very nice
> > to
> > >>>>>> have a
> > >>>>>>>>>>>>>>                 syntax like the one of the Hadoop
> > >> ProgramDriver
> > >>>> to
> > >>>>>>>>>>>>>>                 define jobs to invoke from a single root
> > >> class.
> > >>>>>>>>>>>>>>                 Do you think it could be useful?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>                 On Fri, May 8, 2015 at 12:42 PM, Fabian
> > Hueske
> > >>>>>>>>>>>>>>                 <fhueske@gmail.com <mailto:
> > fhueske@gmail.com
> > >>>>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>                     You easily have multiple Flink
> programs
> > >> in a
> > >>>>>>>>> single
> > >>>>>>>>>>>>>>                     JAR file.
> > >>>>>>>>>>>>>>                     A program is defined using an
> > >>>>>>>>> ExecutionEnvironment
> > >>>>>>>>>>>>>>                     and executed when you call
> > >>>>>>>>>>>>>>                     ExecutionEnvironment.exeucte().
> > >>>>>>>>>>>>>>                     Where and how you do that does not
> > matter.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>                     You can, for example, implement a main
> > >>>>>>>>>>>>>>                     function such as:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>                     public static void main(String... args) throws Exception {
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>                       if (today == Monday) {
> > >>>>>>>>>>>>>>                         ExecutionEnvironment env = ...
> > >>>>>>>>>>>>>>                         // define Monday prog
> > >>>>>>>>>>>>>>                         env.execute();
> > >>>>>>>>>>>>>>                       }
> > >>>>>>>>>>>>>>                       else {
> > >>>>>>>>>>>>>>                         ExecutionEnvironment env = ...
> > >>>>>>>>>>>>>>                         // define other prog
> > >>>>>>>>>>>>>>                         env.execute();
> > >>>>>>>>>>>>>>                       }
> > >>>>>>>>>>>>>>                     }
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>                     2015-05-08 11:41 GMT+02:00 Flavio Pompermaier
> > >>>>>>>>>>>>>>                     <pompermaier@okkam.it>:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>                         Hi to all,
> > >>>>>>>>>>>>>>                         is there any way to keep multiple jobs in a jar
> > >>>>>>>>>>>>>>                         and then choose at runtime the one to execute
> > >>>>>>>>>>>>>>                         (like what ProgramDriver does in Hadoop)?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>                         Best,
> > >>>>>>>>>>>>>>                         Flavio
> > >>>>>>>>>>>>>>

Re: Package multiple jobs in a single jar

Posted by Flavio Pompermaier <po...@okkam.it>.
I agree with Matthias. I didn't know about the ProgramDescription and Program
interfaces because they are not advertised anywhere.
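
For anyone else who has not come across it, here is a minimal sketch of what
implementing ProgramDescription next to a regular main() method could look
like. The interface below is a local stand-in with the single
getDescription() method mentioned in this thread, declared here only so the
snippet compiles without Flink on the classpath. WordCountJob and its
description text are made up for illustration.

```java
// Local stand-in (assumption) for the "ProgramDescription" interface
// discussed in this thread: a single method returning a usage string.
interface ProgramDescription {
    String getDescription();
}

// Hypothetical example job: still started through main(), but it can also
// "explain" itself through getDescription().
class WordCountJob implements ProgramDescription {

    @Override
    public String getDescription() {
        return "Counts words in a text file. Parameters: <input path> <output path>";
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            // Reuse the description as the usage message on wrong parameters.
            System.err.println(new WordCountJob().getDescription());
            return;
        }
        // ... define and execute the Flink program here ...
        System.out.println("Would run with input " + args[0] + " and output " + args[1]);
    }
}
```

The main() entry point is unchanged; the description is only extra metadata
that a CLI or web frontend could choose to display.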

On Tue, May 26, 2015 at 5:01 PM, Matthias J. Sax <
mjsax@informatik.hu-berlin.de> wrote:

> I see your point.
>
> However, right now only few people are aware of the "ProgramDescription"
> interface. If we want to "advertise" it, it should be used (at
> least) in a few examples. Otherwise, people will never use it, and the
> changes I plan to apply are kind of useless. I would even claim that
> the interface should be removed completely in this case...
>
>
> On 05/26/2015 03:31 PM, Maximilian Michels wrote:
> > Sorry, my bad. Yes, it is helpful to have a separate program and parameter
> > description in ProgramDescription. I'm not sure if it adds much value to
> > implement ProgramDescription in the examples. It introduces verbosity and
> > might give the impression that you have to implement ProgramDescription in
> > your Flink job.
> >
> > On Tue, May 26, 2015 at 12:00 PM, Matthias J. Sax <
> > mjsax@informatik.hu-berlin.de> wrote:
> >
> >> Hi Max,
> >>
> >> thanks for your feedback. I guess you confuse the interfaces "Program"
> >> and "ProgramDescription". With "Program", the main method is
> >> replaced by "getPlan(...)". However, "ProgramDescription" only adds the
> >> method "getDescription()", which returns a string that explains the usage
> >> of the program (i.e., short description, expected parameters).
> >>
> >> Thus, adding "ProgramDescription" to the examples does not change the
> >> examples -- the main method will still be used. It only adds the ability
> >> for a program to "explain" itself (i.e., give meta info). Furthermore,
> >> "ProgramDescription" is also not related to the new "ParameterTool".
> >>
> >> -Matthias
> >>
> >> On 05/26/2015 11:46 AM, Maximilian Michels wrote:
> >>> I don't think `getDisplayName()` is necessary either. The class name and
> >>> the description string should be fine. Adding ProgramDescription to the
> >>> examples is not necessary; as already pointed out, using the main method
> >>> is more convenient for most users. As far as I know, the idea of the
> >>> ParameterTool was to use it only in the user code and not automatically
> >>> handle parameters.
> >>>
> >>> Changing the interface would be quite API breaking but since most
> >>> programs use the main method, IMHO we could do it.
> >>>
> >>> On Fri, May 22, 2015 at 10:09 PM, Matthias J. Sax <
> >>> mjsax@informatik.hu-berlin.de> wrote:
> >>>
> >>>> Makes sense to me. :)
> >>>>
> >>>> One more thing: What about extending the "ProgramDescription" interface
> >>>> to have multiple methods as Flavio suggested (with the config(...)
> >>>> method that should be handled by the ParameterTool)?
> >>>>
> >>>>> public interface FlinkJob {
> >>>>>
> >>>>> /** The name to display in the job submission UI or shell */
> >>>>> //e.g. "My Flink HelloWorld"
> >>>>> String getDisplayName();
> >>>>> //e.g. "This program does this and that etc.."
> >>>>> String getDescription();
> >>>>> //e.g. <0,Integer,"An integer representing my first param">,
> >>>>> //     <1,String,"A string representing my second param">
> >>>>> List<Tuple3<Integer, TypeInfo, String>> getParamDescription();
> >>>>> /** Set up the flink job in the passed ExecutionEnvironment */
> >>>>> ExecutionEnvironment config(ExecutionEnvironment env);
> >>>>> }
> >>>>
> >>>> Right now, the interface is used only a couple of times in Flink's code
> >>>> base, so it would not be a problem to update those classes. However, it
> >>>> could break external code that uses the interface already (even if I
> >>>> doubt that the interface is well known and used often [or at all]).
> >>>>
> >>>> I personally don't think that "getDisplayName()" is too helpful.
> >>>> Splitting the program description and the parameter description seems to
> >>>> be useful. For example, if wrong parameters are provided, the parameter
> >>>> description can be included in the error message. If program+parameter
> >>>> description is given in a single string, this is not possible. But this
> >>>> is only a minor issue of course.
> >>>>
> >>>> Maybe we should also add the interface to the current Flink examples,
> >>>> to make people more aware of it. Is there any documentation on the web
> >>>> site?
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>>
> >>>> On 05/22/2015 09:43 PM, Robert Metzger wrote:
> >>>>> Thank you for working on this.
> >>>>> My responses are inline below:
> >>>>>
> >>>>> (Flavio)
> >>>>>
> >>>>>> My suggestion is to create a specific Flink interface to get also
> >>>>>> description of a job and standardize parameter passing.
> >>>>>
> >>>>>
> >>>>> I've recently merged the ParameterTool which is solving the "standardize
> >>>>> parameter passing" problem (at least it presents a best practice):
> >>>>>
> >>>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#parsing-command-line-arguments-and-passing-them-around-in-your-flink-application
> >>>>>
> >>>>> Regarding the description: Maybe we can use the "ProgramDescription"
> >>>>> interface for getting a string describing the program in the web frontend.
> >>>>>
> >>>>> (Matthias)
> >>>>>
> >>>>>> I don't want to start working on it, before it's clear that it has a
> >>>>>> chance to be
> >>>>>> included in Flink.
> >>>>>
> >>>>>
> >>>>> I think the changes discussed here won't change the current behavior, but
> >>>>> they add new functionality which can make the life of our users easier,
> >>>>> so I'll vote to include your changes (given they meet our quality
> >>>>> standards).
> >>>>>
> >>>>>
> >>>>>> If multiple classes implement the "Program" interface, an exception
> >>>>>> should be thrown (I think that would make sense). However, I am not sure
> >>>>>> what "good" behavior is if a single "Program"-class is found and an
> >>>>>> additional main-method class.
> >>>>>>   - should the "Program"-class be executed (i.e., "overwrite" main-method
> >>>>>>     class)
> >>>>>>   - or, better to throw an exception?
> >>>>>
> >>>>>
> >>>>> I would give a class implementing "Program" priority over a random main()
> >>>>> method in a random class.
> >>>>> Maybe printing a WARN log message informing the user that the "Program"
> >>>>> class has been chosen.
> >>>>>
> >>>>>
> >>>>>> If no "Program"-class is found, but a single main-method class, Flink
> >>>>>> could execute using the main method. But I am not sure either if this is
> >>>>>> "good" behavior. If multiple main-method classes are present, throwing
> >>>>>> an exception is the only way to go, I guess.
> >>>>>
> >>>>>
> >>>>> I think the best effort approach "one class with main() found" is good. In
> >>>>> case of multiple main methods, a helpful exception is the best approach in
> >>>>> my opinion.
> >>>>>
> >>>>>
> >>>>>> If the manifest contains a "program-class" or "Main-Class" entry,
> >>>>>> should we check the jar file right away if the specified class is there?
> >>>>>> Right now, no check is performed and an error occurs if the user tries
> >>>>>> to execute the job.
> >>>>>
> >>>>>
> >>>>> I'd say the current approach is sufficient. There is no need to have a
> >>>>> special code path which is doing the check.
> >>>>> I think the error message will be pretty similar in both cases and I fear
> >>>>> that this additional code could also introduce new bugs ;)
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Fri, May 22, 2015 at 9:06 PM, Matthias J. Sax <
> >>>>> mjsax@informatik.hu-berlin.de> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> two more thoughts to this discussion:
> >>>>>>
> >>>>>>  1) looking at the commit history of "CliFrontend", I found the
> >>>>>> following closed issue and the closing pull request
> >>>>>>     * https://issues.apache.org/jira/browse/FLINK-1095
> >>>>>>     * https://github.com/apache/flink/pull/238
> >>>>>> It stands in opposition to Flavio's request to have a job description.
> >>>>>> Any comment on this? Should a removed feature be re-introduced? If not, I
> >>>>>> would suggest to remove the "ProgramDescription" interface completely.
> >>>>>>
> >>>>>>  2) If the manifest contains a "program-class" or "Main-Class" entry,
> >>>>>> should we check the jar file right away if the specified class is there?
> >>>>>> Right now, no check is performed and an error occurs if the user tries
> >>>>>> to execute the job.
> >>>>>>
> >>>>>>
> >>>>>> -Matthias
> >>>>>>
> >>>>>>
> >>>>>> On 05/22/2015 12:06 PM, Matthias J. Sax wrote:
> >>>>>>> Thanks for your feedback.
> >>>>>>>
> >>>>>>> I agree on the main method "problem". For scanning and listing all stuff
> >>>>>>> that is found it's fine.
> >>>>>>>
> >>>>>>> The tricky question is the automatic invocation mechanism, if the "-c"
> >>>>>>> flag is not used, and no manifest program-class or Main-Class entry is
> >>>>>>> found.
> >>>>>>>
> >>>>>>> If multiple classes implement the "Program" interface, an exception
> >>>>>>> should be thrown (I think that would make sense). However, I am not sure
> >>>>>>> what "good" behavior is if a single "Program"-class is found and an
> >>>>>>> additional main-method class.
> >>>>>>>   - should the "Program"-class be executed (i.e., "overwrite" main-method
> >>>>>>>     class)
> >>>>>>>   - or, better to throw an exception?
> >>>>>>>
> >>>>>>> If no "Program"-class is found, but a single main-method class, Flink
> >>>>>>> could execute using the main method. But I am not sure either if this is
> >>>>>>> "good" behavior. If multiple main-method classes are present, throwing
> >>>>>>> an exception is the only way to go, I guess.
> >>>>>>>
> >>>>>>> To sum up: Should Flink consider main-method classes for automatic
> >>>>>>> invocation, or should it be required for main-method classes to either
> >>>>>>> list them in "program-class" or "Main-Class" manifest parameter (to
> >>>>>>> enable them for automatic invocation)?
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 05/22/2015 09:56 AM, Maximilian Michels wrote:
> >>>>>>>> Hi Matthias,
> >>>>>>>>
> >>>>>>>> Thank you for taking the time to analyze Flink's invocation behavior. I
> >>>>>>>> like your proposal. I'm not sure whether it is a good idea to scan the
> >>>>>>>> entire JAR for main methods. Sometimes, main methods are added solely for
> >>>>>>>> testing purposes and don't really serve any practical use. However, if
> >>>>>>>> you're already going through the JAR to find the ProgramDescription
> >>>>>>>> interface, then you might look for main methods as well. As long as it is
> >>>>>>>> just a listing without execution, that should be fine.
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>> Max
> >>>>>>>>
> >>>>>>>> On Thu, May 21, 2015 at 3:43 PM, Matthias J. Sax <
> >>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I had a look into the current workflow of Flink with regard to the
> >>>>>>>>> processing steps of a jar file.
> >>>>>>>>>
> >>>>>>>>> If I got it right it works as follows (not sure if this is documented
> >>>>>>>>> somewhere):
> >>>>>>>>>
> >>>>>>>>> 1) check if "-c" flag is used to set program entry point
> >>>>>>>>>    (if yes, goto 4)
> >>>>>>>>> 2) try to extract "program-class" property from manifest
> >>>>>>>>>    (if found, goto 4)
> >>>>>>>>> 3) try to extract "Main-Class" property from manifest
> >>>>>>>>>    -> if not found, throw exception (this happens also if no manifest
> >>>>>>>>>       file is found at all)
> >>>>>>>>>
> >>>>>>>>> 4) check if entry point class implements "Program" interface
> >>>>>>>>>    (if yes, goto 6)
> >>>>>>>>> 5) check if entry point class provides "public static void
> >>>>>>>>>    main(String[] args)" method
> >>>>>>>>>    -> if not, throw exception
> >>>>>>>>>
> >>>>>>>>> 6) execute program (i.e., show plan/info or really run it)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I also "discovered" the interface "ProgramDescription" with a single
> >>>>>>>>> method "String getDescription()". Even if some examples implement this
> >>>>>>>>> interface (and use it in the example itself), Flink basically ignores
> >>>>>>>>> it... From the CLI there is no way to get this info, and the WebUI does
> >>>>>>>>> actually get it if present, however, doesn't show it anywhere...
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I think it would be nice if we would extend the following functions:
> >>>>>>>>>
> >>>>>>>>>  - extend the possibility to specify multiple entry classes in
> >>>>>>>>> "program-class" or "Main-Class" -> in this case, the user needs to use
> >>>>>>>>> "-c" flag to pick program to run every time
> >>>>>>>>>
> >>>>>>>>>  - add a CLI option that allows the user to see what entry point classes
> >>>>>>>>> are available
> >>>>>>>>>    for this, consider
> >>>>>>>>>      a) "program-class" entry
> >>>>>>>>>      b) "Main-Class" entry
> >>>>>>>>>      c) if neither is found, scan jar-file for classes implementing
> >>>>>>>>>         "Program" interface
> >>>>>>>>>      d) if still not found, scan jar-file for classes with "main" method
> >>>>>>>>>
> >>>>>>>>>  - if user looks for entry point classes via CLI, check for
> >>>>>>>>> "ProgramDescription" interface and show info
> >>>>>>>>>
> >>>>>>>>>  - extend WebUI to show all available entry-classes (pull request
> >>>>>>>>> already there, for multiple entries in "program-class")
> >>>>>>>>>
> >>>>>>>>>  - extend WebUI to show "ProgramDescription" info
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> What do you think? I am not too sure about the "auto scan" of the jar
> >>>>>>>>> file if no manifest entry is provided. We might get some "fat jars" and
> >>>>>>>>> scanning might take some time.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 05/19/2015 10:44 AM, Stephan Ewen wrote:
> >>>>>>>>>> We actually had an interface like that before ("Program"). It is still
> >>>>>>>>>> supported, but in all new programs we simply use the Java main method.
> >>>>>>>>>> The advantage is that most IDEs can create executable JARs automatically,
> >>>>>>>>>> setting the JAR manifest attributes, etc.
> >>>>>>>>>>
> >>>>>>>>>> The "Program" interface still works, though. Most tool classes (like
> >>>>>>>>>> "PackagedProgram") have a way to figure out whether the code uses "main()"
> >>>>>>>>>> or implements "Program" and call the right method.
> >>>>>>>>>>
> >>>>>>>>>> You can try and extend the program interface. If you want to consistently
> >>>>>>>>>> support multiple programs in one JAR file, you may need to adjust the
> >>>>>>>>>> util classes as well to deal with that.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Tue, May 19, 2015 at 10:10 AM, Matthias J. Sax <
> >>>>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Supporting an interface like this seems to be a nice idea. Any other
> >>>>>>>>>>> opinions on it?
> >>>>>>>>>>>
> >>>>>>>>>>> It seems to be some more work to get it done right. I don't want to
> >>>>>>>>>>> start working on it before it's clear that it has a chance to be
> >>>>>>>>>>> included in Flink.
> >>>>>>>>>>>
> >>>>>>>>>>> @Flavio: I moved the discussion to dev mailing list (user list is not
> >>>>>>>>>>> appropriate for this discussion). Are you subscribed to it or should I
> >>>>>>>>>>> cc you in each mail?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -Matthias
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 05/19/2015 09:39 AM, Flavio Pompermaier wrote:
> >>>>>>>>>>>> Nice feature Matthias!
> >>>>>>>>>>>> My suggestion is to create a specific Flink interface to get also
> >>>>>>>>>>>> description of a job and standardize parameter passing.
> >>>>>>>>>>>> Then, somewhere (e.g. Manifest) you could specify the list of packages
> >>>>>>>>>>>> (or also directly the classes) to inspect with reflection to extract
> >>>>>>>>>>>> the list of available Flink jobs.
> >>>>>>>>>>>> Something like:
> >>>>>>>>>>>>
> >>>>>>>>>>>> public interface FlinkJob {
> >>>>>>>>>>>>
> >>>>>>>>>>>> /** The name to display in the job submission UI or shell */
> >>>>>>>>>>>> //e.g. "My Flink HelloWorld"
> >>>>>>>>>>>> String getDisplayName();
> >>>>>>>>>>>> //e.g. "This program does this and that etc.."
> >>>>>>>>>>>> String getDescription();
> >>>>>>>>>>>> //e.g. <0,Integer,"An integer representing my first param">,
> >>>>>>>>>>>> //     <1,String,"A string representing my second param">
> >>>>>>>>>>>> List<Tuple3<Integer, TypeInfo, String>> getParamDescription();
> >>>>>>>>>>>> /** Set up the flink job in the passed ExecutionEnvironment */
> >>>>>>>>>>>> ExecutionEnvironment config(ExecutionEnvironment env);
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
> >>>>>>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I like the idea that Flink's WebClient can show different plans for
> >>>>>>>>>>>>> different jobs within a single jar file.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I prepared a prototype for this feature. You can find it here:
> >>>>>>>>>>>>> https://github.com/mjsax/flink/tree/multipleJobsWebUI
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> To test the feature, you need to prepare a jar file that contains the
> >>>>>>>>>>>>> code of multiple programs and specify each entry class in the manifest
> >>>>>>>>>>>>> file as comma separated values in "program-class" line.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Feedback is welcome. :)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
> >>>>>>>>>>>>>> Thank you all for the support!
> >>>>>>>>>>>>>> It would be a really nice feature if the web client could show
> >>>>>>>>>>>>>> me the list of Flink jobs within my jar.
> >>>>>>>>>>>>>> It should be sufficient to mark them with a special annotation and
> >>>>>>>>>>>>>> inspect the classes within the jar.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <ms@mieo.de
> >>>>>>>>>>>>>> <ma...@mieo.de>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>     Hi Flavio,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>     you also can put each job in a single class and use the -c
> >>>>>>>>>>>>>>     parameter to execute jobs separately:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>     /bin/flink run -c com.myflinkjobs.JobA /path/to/jar/multiplejobs.jar
> >>>>>>>>>>>>>>     /bin/flink run -c com.myflinkjobs.JobB /path/to/jar/multiplejobs.jar
> >>>>>>>>>>>>>>     …
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>     Cheers
> >>>>>>>>>>>>>>     Malte
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>     From: Robert Metzger <rmetzger@apache.org>
> >>>>>>>>>>>>>>     Reply-To: <user@flink.apache.org>
> >>>>>>>>>>>>>>     Date: Friday, 8 May 2015 14:57
> >>>>>>>>>>>>>>     To: "user@flink.apache.org" <user@flink.apache.org>
> >>>>>>>>>>>>>>     Subject: Re: Package multiple jobs in a single jar
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>     Hi Flavio,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>     the pom from our quickstart is a good reference:
> >>>>>>>>>>>>>>     https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
> >>>>>>>>>>>>>>     <pompermaier@okkam.it <ma...@okkam.it>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>         Ok, got it.
> >>>>>>>>>>>>>>         And is there a reference pom.xml for shading my application
> >>>>>>>>>>>>>>         into one fat-jar? Which Flink dependencies can I exclude?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske
> >>>>>>>>>>>>>>         <fhueske@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>             I didn't say that the main should return the
> >>>>>>>>>>>>>>             ExecutionEnvironment.
> >>>>>>>>>>>>>>             You can define and execute as many programs in a main
> >>>>>>>>>>>>>>             function as you like.
> >>>>>>>>>>>>>>             The program can be defined somewhere else, e.g., in a
> >>>>>>>>>>>>>>             function that receives an ExecutionEnvironment and
> >>>>>>>>>>>>>>             attaches a program such as
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>             public static void buildMyProgram(ExecutionEnvironment env) {
> >>>>>>>>>>>>>>               DataSet<String> lines = env.readTextFile(...);
> >>>>>>>>>>>>>>               // do something
> >>>>>>>>>>>>>>               lines.writeAsText(...);
> >>>>>>>>>>>>>>             }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>             That method could be invoked from main():
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>             public static void main(String[] args) throws Exception {
> >>>>>>>>>>>>>>               ExecutionEnvironment env = ...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>               if(...) {
> >>>>>>>>>>>>>>                 buildMyProgram(env);
> >>>>>>>>>>>>>>               }
> >>>>>>>>>>>>>>               else {
> >>>>>>>>>>>>>>                 buildSomeOtherProg(env);
> >>>>>>>>>>>>>>               }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>               env.execute();
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>               // run some more programs
> >>>>>>>>>>>>>>             }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
> >>>>>>>>>>>>>>             <pompermaier@okkam.it>:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>                 Hi Fabian,
> >>>>>>>>>>>>>>                 thanks for the response.
> >>>>>>>>>>>>>>                 So my mains should be converted into a method
> >>>>>>>>>>>>>>                 returning the ExecutionEnvironment.
> >>>>>>>>>>>>>>                 However, I think it would be very nice to have a
> >>>>>>>>>>>>>>                 syntax like the one of the Hadoop ProgramDriver to
> >>>>>>>>>>>>>>                 define jobs to invoke from a single root class.
> >>>>>>>>>>>>>>                 Do you think it could be useful?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>                 On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
> >>>>>>>>>>>>>>                 <fhueske@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>                     You can easily have multiple Flink programs in a
> >>>>>>>>>>>>>>                     single JAR file.
> >>>>>>>>>>>>>>                     A program is defined using an ExecutionEnvironment
> >>>>>>>>>>>>>>                     and executed when you call
> >>>>>>>>>>>>>>                     ExecutionEnvironment.execute().
> >>>>>>>>>>>>>>                     Where and how you do that does not matter.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>                     You can, for example, implement a main function
> >>>>>>>>>>>>>>                     such as:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>                     public static void main(String... args) throws Exception {
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>                       if (today == Monday) {
> >>>>>>>>>>>>>>                         ExecutionEnvironment env = ...
> >>>>>>>>>>>>>>                         // define Monday prog
> >>>>>>>>>>>>>>                         env.execute();
> >>>>>>>>>>>>>>                       }
> >>>>>>>>>>>>>>                       else {
> >>>>>>>>>>>>>>                         ExecutionEnvironment env = ...
> >>>>>>>>>>>>>>                         // define other prog
> >>>>>>>>>>>>>>                         env.execute();
> >>>>>>>>>>>>>>                       }
> >>>>>>>>>>>>>>                     }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>                     2015-05-08 11:41 GMT+02:00 Flavio Pompermaier
> >>>>>>>>>>>>>>                     <pompermaier@okkam.it>:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>                         Hi to all,
> >>>>>>>>>>>>>>                         is there any way to keep multiple jobs in a jar
> >>>>>>>>>>>>>>                         and then choose at runtime the one to execute
> >>>>>>>>>>>>>>                         (like what ProgramDriver does in Hadoop)?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>                         Best,
> >>>>>>>>>>>>>>                         Flavio
> >>>>>>>>>>>>>>

Re: Package multiple jobs in a single jar

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
I see your point.

However, right now only few people are aware of the "ProgramDescription"
interface. If we want to "advertise" it, it should be used (at
least) in a few examples. Otherwise, people will never use it, and the
changes I plan to apply are kind of useless. I would even claim that
the interface should be removed completely in this case...
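
As a rough illustration of the entry-point lookup order summarized earlier in
this thread ("-c" flag first, then the "program-class" manifest entry, then
"Main-Class", otherwise an exception), the following sketch resolves an entry
class from an optional command-line value and a JAR manifest.
EntryPointResolver and its resolve() method are hypothetical names and not
Flink's actual implementation; only the java.util.jar classes are real.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper (assumption), sketching the lookup order only.
class EntryPointResolver {

    /**
     * @param cliClass class name given via "-c", or null if the flag was not used
     * @param manifest the JAR manifest, or null if the JAR has none
     * @return the name of the entry point class
     */
    static String resolve(String cliClass, Manifest manifest) {
        // 1) an explicit "-c" value always wins
        if (cliClass != null) {
            return cliClass;
        }
        if (manifest != null) {
            Attributes attrs = manifest.getMainAttributes();
            // 2) Flink-specific manifest entry
            String programClass = attrs.getValue("program-class");
            if (programClass != null) {
                return programClass;
            }
            // 3) standard Java manifest entry
            String mainClass = attrs.getValue("Main-Class");
            if (mainClass != null) {
                return mainClass;
            }
        }
        // no "-c", no usable manifest entry (or no manifest at all)
        throw new IllegalStateException(
            "No entry point found: use -c or set program-class / Main-Class in the manifest");
    }

    public static void main(String[] args) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("Main-Class", "com.myflinkjobs.JobA");
        System.out.println(resolve(null, manifest));
        System.out.println(resolve("com.myflinkjobs.JobB", manifest));
    }
}
```

Checking whether the resolved class implements "Program" or offers a main()
method would be a separate step after this lookup.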


On 05/26/2015 03:31 PM, Maximilian Michels wrote:
> Sorry, my bad. Yes, it is helpful to have a separate program and parameter
> description in ProgramDescription. I'm not sure if it adds much value to
> implement ProgramDescription in the examples. It introduces verbosity and
> might give the impression that you have to implement ProgramDescription in
> your Flink job.
> 
> On Tue, May 26, 2015 at 12:00 PM, Matthias J. Sax <
> mjsax@informatik.hu-berlin.de> wrote:
> 
>> Hi Max,
>>
>> thanks for your feedback. I guess you confuse the interfaces "Program"
>> and "ProgramDescription". With "Program", the main method is
>> replaced by "getPlan(...)". However, "ProgramDescription" only adds the
>> method "getDescription()", which returns a string that explains the usage
>> of the program (i.e., short description, expected parameters).
>>
>> Thus, adding "ProgramDescription" to the examples does not change the
>> examples -- the main method will still be used. It only adds the ability
>> for a program to "explain" itself (i.e., give meta info). Furthermore,
>> "ProgramDescription" is also not related to the new "ParameterTool".
>>
>> -Matthias
>>
>> On 05/26/2015 11:46 AM, Maximilian Michels wrote:
>>> I don't think `getDisplayName()` is necessary either. The class name and
>>> the description string should be fine. Adding ProgramDescription to the
>>> examples is not necessary; as already pointed out, using the main method
>>> is more convenient for most users. As far as I know, the idea of the
>>> ParameterTool was to use it only in the user code and not automatically
>>> handle parameters.
>>>
>>> Changing the interface would be quite API breaking but since most
>>> programs use the main method, IMHO we could do it.
>>>
>>> On Fri, May 22, 2015 at 10:09 PM, Matthias J. Sax <
>>> mjsax@informatik.hu-berlin.de> wrote:
>>>
>>>> Makes sense to me. :)
>>>>
>>>> One more thing: What about extending the "ProgramDescription" interface
>>>> to have multiple methods as Flavio suggested (with the config(...)
>>>> method that should be handled by the ParameterTool)?
>>>>
>>>>> public interface FlinkJob {
>>>>>
>>>>> /** The name to display in the job submission UI or shell */
>>>>> //e.g. "My Flink HelloWorld"
>>>>> String getDisplayName();
>>>>> //e.g. "This program does this and that etc.."
>>>>> String getDescription();
>>>>> //e.g. <0,Integer,"An integer representing my first param">,
>>>>> //     <1,String,"A string representing my second param">
>>>>> List<Tuple3<Integer, TypeInfo, String>> getParamDescription();
>>>>> /** Set up the flink job in the passed ExecutionEnvironment */
>>>>> ExecutionEnvironment config(ExecutionEnvironment env);
>>>>> }
>>>>
>>>> Right now, the interface is used only a couple of times in Flink's code
>>>> base, so it would not be a problem to update those classes. However, it
>>>> could break external code that uses the interface already (even if I
>>>> doubt that the interface is well known and used often [or at all]).
>>>>
>>>> I personally don't think that "getDisplayName()" is too helpful.
>>>> Splitting the program description and the parameter description seems to
>>>> be useful. For example, if wrong parameters are provided, the parameter
>>>> description can be included in the error message. If program+parameter
>>>> description is given in a single string, this is not possible. But this
>>>> is only a minor issue of course.
>>>>
>>>> Maybe we should also add the interface to the current Flink examples,
>>>> to make people more aware of it. Is there any documentation on the web
>>>> site?
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>>
>>>> On 05/22/2015 09:43 PM, Robert Metzger wrote:
>>>>> Thank you for working on this.
>>>>> My responses are inline below:
>>>>>
>>>>> (Flavio)
>>>>>
>>>>>> My suggestion is to create a specific Flink interface to get also
>>>>>> description of a job and standardize parameter passing.
>>>>>
>>>>>
>>>>> I've recently merged the ParameterTool which is solving the
>> "standardize
>>>>> parameter passing" problem (at least it presents a best practice) :
>>>>>
>>>>
>> http://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#parsing-command-line-arguments-and-passing-them-around-in-your-flink-application
>>>>>
>>>>> Regarding the description: Maybe we can use the "ProgramDescription"
>>>>> interface for getting a string describing the program in the web
>>>> frontend.
>>>>>
>>>>> (Matthias)
>>>>>
>>>>>> I don't want to start working on it, before it's clear that it has a
>>>>>> chance to be
>>>>>> included in Flink.
>>>>>
>>>>>
>>>>> I think the changes discussed here won't change the current behavior,
>> but
>>>>> they add new functionality which
>>>>> can make the life of our users easier, so I'll vote to include your
>>>> changes
>>>>> (given they meet our quality standards)
>>>>>
>>>>>
>>>>> If multiple classes implement "Program" interface an exception should
>> be
>>>>>> thrown (I think that would make sense). However, I am not sure what
>>>>>> "good" behavior is, if a single "Program"-class is found and an
>>>>>> additional main-method class.
>>>>>>   - should "Program"-class be executed (ie, "overwrite" main-method
>>>> class)
>>>>>>   - or, better to throw an exception?
>>>>>
>>>>>
>>>>> I would give a class implementing "Program" priority over a random
>> main()
>>>>> method in a random class.
>>>>> Maybe printing a WARN log message informing the user that the "Program"
>>>>> class has been chosen.
>>>>>
>>>>>
>>>>> If no "Program"-class is found, but a single main-method class, Flink
>>>>>> could execute using main method. But I am not sure either, if this is
>>>>>> "good" behavior. If multiple main-method classes are present, throwing
>>>>>> an exception is the only way to go, I guess.
>>>>>
>>>>>
>>>>> I think the best effort approach "one class with main() found" is good.
>>>> In
>>>>> case of multiple main methods, a helpful exception is the best approach
>>>> in
>>>>> my opinion.
>>>>>
>>>>>
>>>>>  If the manifest contains "program-class" or "Main-Class" entry,
>>>>>> should we check the jar file right away if the specified class is
>> there?
>>>>>> Right now, no check is performed and an error occurs if the user tries
>>>>>> to execute the job.
>>>>>
>>>>>
>>>>> I'd say the current approach is sufficient. There is no need to have a
>>>>> special code path which is doing the check.
>>>>> I think the error message will be pretty similar in both cases and I
>> fear
>>>>> that this additional code could also introduce new bugs ;)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, May 22, 2015 at 9:06 PM, Matthias J. Sax <
>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> two more thoughts to this discussion:
>>>>>>
>>>>>>  1) looking at the commit history of "CliFrontend", I found the
>>>>>> following closed issue and the closing pull request
>>>>>>     * https://issues.apache.org/jira/browse/FLINK-1095
>>>>>>     * https://github.com/apache/flink/pull/238
>>>>>> It stands in opposition to Flavio's request to have a job description.
>> Any
>>>>>> comment on this? Should a removed feature be re-introduced? If not, I
>>>>>> would suggest to remove the "ProgramDescription" interface completely.
>>>>>>
>>>>>>  2) If the manifest contains "program-class" or "Main-Class" entry,
>>>>>> should we check the jar file right away if the specified class is
>> there?
>>>>>> Right now, no check is performed and an error occurs if the user tries
>>>>>> to execute the job.
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>>
>>>>>> On 05/22/2015 12:06 PM, Matthias J. Sax wrote:
>>>>>>> Thanks for your feedback.
>>>>>>>
>>>>>>> I agree on the main method "problem". For scanning and listing all
>>>> stuff
>>>>>>> that is found it's fine.
>>>>>>>
>>>>>>> The tricky question is the automatic invocation mechanism, if "-c"
>> flag
>>>>>>> is not used, and no manifest program-class or Main-Class entry is
>>>> found.
>>>>>>>
>>>>>>> If multiple classes implement "Program" interface an exception should
>>>> be
>>>>>>> thrown (I think that would make sense). However, I am not sure what
>>>>>>> "good" behavior is, if a single "Program"-class is found and an
>>>>>>> additional main-method class.
>>>>>>>   - should "Program"-class be executed (ie, "overwrite" main-method
>>>>>> class)
>>>>>>>   - or, better to throw an exception?
>>>>>>>
>>>>>>> If no "Program"-class is found, but a single main-method class, Flink
>>>>>>> could execute using main method. But I am not sure either, if this is
>>>>>>> "good" behavior. If multiple main-method classes are present,
>> throwing
>>>>>>> an exception is the only way to go, I guess.
>>>>>>>
>>>>>>> To sum up: Should Flink consider main-method classes for automatic
>>>>>>> invocation, or should it be required for main-method classes to
>> either
>>>>>>> list them in "program-class" or "Main-Class" manifest parameter (to
>>>>>>> enable them for automatic invocation)?
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/22/2015 09:56 AM, Maximilian Michels wrote:
>>>>>>>> Hi Matthias,
>>>>>>>>
>>>>>>>> Thank you for taking the time to analyze Flink's invocation
>> behavior.
>>>> I
>>>>>>>> like your proposal. I'm not sure whether it is a good idea to scan
>> the
>>>>>>>> entire JAR for main methods. Sometimes, main methods are added
>> solely
>>>>>> for
>>>>>>>> testing purposes and don't really serve any practical use. However,
>> if
>>>>>>>> you're already going through the JAR to find the ProgramDescription
>>>>>>>> interface, then you might look for main methods as well. As long as
>> it
>>>>>> is
>>>>>>>> just a listing without execution, that should be fine.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Max
>>>>>>>>
>>>>>>>> On Thu, May 21, 2015 at 3:43 PM, Matthias J. Sax <
>>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I had a look into the current Workflow of Flink with regard to the
>>>>>>>>> processing steps of a jar file.
>>>>>>>>>
>>>>>>>>> If I got it right it works as follows (not sure if this is
>> documented
>>>>>>>>> somewhere):
>>>>>>>>>
>>>>>>>>> 1) check, if "-c" flag is used to set program entry point
>>>>>>>>>    if yes, goto 4
>>>>>>>>> 2) try to extract "program-class" property from manifest
>>>>>>>>>    (if found goto 4)
>>>>>>>>> 3) try to extract "Main-Class" property from manifest
>>>>>>>>>    -> if not found throw exception (this happens also, if no
>>>> manifest
>>>>>>>>> file is found at all)
>>>>>>>>>
>>>>>>>>> 4) check if entry point class implements "Program" interface
>>>>>>>>>    if yes, goto 6
>>>>>>>>> 5) check if entry point class provides "public static void
>>>>>> main(String[]
>>>>>>>>> args)" method
>>>>>>>>>    -> if not, throw exception
>>>>>>>>>
>>>>>>>>> 6) execute program (ie, show plan/info or really run it)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I also "discovered" the interface "ProgramDescription" with a
>> single
>>>>>>>>> method "String getDescription()". Even if some examples implement
>>>> this
>>>>>>>>> interface (and use it in the example itself), Flink basically
>> ignores
>>>>>>>>> it... From the CLI there is no way to get this info, and the WebUI
>>>> does
>>>>>>>>> actually get it if present, however, doesn't show it anywhere...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think it would be nice, if we would extend the following
>> functions:
>>>>>>>>>
>>>>>>>>>  - extend the possibility to specify multiple entry classes in
>>>>>>>>> "program-class" or "Main-Class" -> in this case, the user needs to
>>>> use
>>>>>>>>> "-c" flag to pick program to run every time
>>>>>>>>>
>>>>>>>>>  - add a CLI option that allows the user to see what entry point
>>>>>> classes
>>>>>>>>> are available
>>>>>>>>>    for this, consider
>>>>>>>>>      a) "program-class" entry
>>>>>>>>>      b) "Main-Class" entry
>>>>>>>>>      c) if neither is found, scan jar-file for classes implementing
>>>>>>>>> "Program" interface
>>>>>>>>>      d) if still not found, scan jar-file for classes with "main"
>>>>>> method
>>>>>>>>>
>>>>>>>>>  - if user looks for entry point classes via CLI, check for
>>>>>>>>> "ProgramDescription" interface and show info
>>>>>>>>>
>>>>>>>>>  - extend WebUI to show all available entry-classes (pull request
>>>>>>>>> already there, for multiple entries in "program-class")
>>>>>>>>>
>>>>>>>>>  - extend WebUI to show "ProgramDescription" info
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What do you think? I am not too sure about the "auto scan" of the
>> jar
>>>>>>>>> file if no manifest entry is provided. We might get some "fat jars"
>>>> and
>>>>>>>>> scanning might take some time.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/19/2015 10:44 AM, Stephan Ewen wrote:
>>>>>>>>>> We actually had an interface like that before ("Program"). It is
>>>> still
>>>>>>>>>> supported, but in all new programs we simply use the Java main
>>>> method.
>>>>>>>>> The
>>>>>>>>>> advantage is that
>>>>>>>>>> most IDEs can create executable JARs automatically, setting the
>> JAR
>>>>>>>>>> manifest attributes, etc.
>>>>>>>>>>
>>>>>>>>>> The "Program" interface still works, though. Most tool classes
>> (like
>>>>>>>>>> "PackagedProgram") have a way to figure out whether the code uses
>>>>>>>>> "main()"
>>>>>>>>>> or implements "Program"
>>>>>>>>>> and calls the right method.
>>>>>>>>>>
>>>>>>>>>> You can try and extend the program interface. If you want to
>>>>>> consistently
>>>>>>>>>> support multiple programs in one JAR file, you may need to adjust
>>>> the
>>>>>>>>> util
>>>>>>>>>> classes as
>>>>>>>>>> well to deal with that.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, May 19, 2015 at 10:10 AM, Matthias J. Sax <
>>>>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>>>>>>
>>>>>>>>>>> Supporting an interface like this seems to be a nice idea. Any
>>>> other
>>>>>>>>>>> opinions on it?
>>>>>>>>>>>
>>>>>>>>>>> It seems to be some more work to get it done right. I don't want
>> to
>>>>>>>>>>> start working on it, before it's clear that it has a chance to be
>>>>>>>>>>> included in Flink.
>>>>>>>>>>>
>>>>>>>>>>> @Flavio: I moved the discussion to dev mailing list (user list is
>>>> not
>>>>>>>>>>> appropriate for this discussion). Are you subscribed to it or
>>>> should
>>>>>> I
>>>>>>>>>>> cc you in each mail?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 05/19/2015 09:39 AM, Flavio Pompermaier wrote:
>>>>>>>>>>>> Nice feature Matthias!
>>>>>>>>>>>> My suggestion is to create a specific Flink interface to get
>> also
>>>>>>>>>>>> description of a job and standardize parameter passing.
>>>>>>>>>>>> Then, somewhere (e.g. Manifest) you could specify the list of
>>>>>> packages
>>>>>>>>>>> (or
>>>>>>>>>>>> also directly the classes) to inspect with reflection to extract
>>>> the
>>>>>>>>> list
>>>>>>>>>>>> of available Flink jobs.
>>>>>>>>>>>> Something like:
>>>>>>>>>>>>
>>>>>>>>>>>> public interface FlinkJob {
>>>>>>>>>>>>
>>>>>>>>>>>> /** The name to display in the job submission UI or shell */
>>>>>>>>>>>> //e.g. "My Flink HelloWorld"
>>>>>>>>>>>> String getDisplayName();
>>>>>>>>>>>>  //e.g. "This program does this and that etc.."
>>>>>>>>>>>> String getDescription();
>>>>>>>>>>>>  //e.g. <0,Integer,"An integer representing my first param">,
>>>>>>>>>>> <1,String,"A
>>>>>>>>>>>> string representing my second param">
>>>>>>>>>>>> List<Tuple3<Integer, TypeInfo, String>> paramDescription;
>>>>>>>>>>>>  /** Set up the flink job in the passed ExecutionEnvironment */
>>>>>>>>>>>> ExecutionEnvironment config(ExecutionEnvironment env);
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
>>>>>>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I like the idea that Flink's WebClient can show different plans
>>>> for
>>>>>>>>>>>>> different jobs within a single jar file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I prepared a prototype for this feature. You can find it here:
>>>>>>>>>>>>> https://github.com/mjsax/flink/tree/multipleJobsWebUI
>>>>>>>>>>>>>
>>>>>>>>>>>>> To test the feature, you need to prepare a jar file, that
>>>> contains
>>>>>> the
>>>>>>>>>>>>> code of multiple programs and specify each entry class in the
>>>>>> manifest
>>>>>>>>>>>>> file as comma separated values in "program-class" line.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Feedback is welcome. :)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
>>>>>>>>>>>>>> Thank you all for the support!
>>>>>>>>>>>>>> It will be a really nice feature if the web client could be
>> able
>>>>>> to
>>>>>>>>>>> show
>>>>>>>>>>>>>> me the list of Flink jobs within my jar..
>>>>>>>>>>>>>> it should be sufficient to mark them with a special annotation
>>>> and
>>>>>>>>>>>>>> inspect the classes within the jar..
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <ms@mieo.de
>>>>>>>>>>>>>> <ma...@mieo.de>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Hi Flavio,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     you can also put each job in a separate class and use the -c
>>>>>>>>>>> parameter
>>>>>>>>>>>>>>     to execute jobs separately:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     /bin/flink run -c com.myflinkjobs.JobA
>>>>>>>>>>> /path/to/jar/multiplejobs.jar
>>>>>>>>>>>>>>     /bin/flink run -c com.myflinkjobs.JobB
>>>>>>>>>>> /path/to/jar/multiplejobs.jar
>>>>>>>>>>>>>>     …
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Cheers
>>>>>>>>>>>>>>     Malte
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     From: Robert Metzger <rmetzger@apache.org>
>>>>>>>>>>>>>>     Reply-To: <user@flink.apache.org>
>>>>>>>>>>>>>>     Date: Friday, May 8, 2015 14:57
>>>>>>>>>>>>>>     To: "user@flink.apache.org" <user@flink.apache.org>
>>>>>>>>>>>>>>     Subject: Re: Package multiple jobs in a single jar
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Hi Flavio,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     the pom from our quickstart is a good
>>>>>>>>>>>>>>     reference:
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>
>> https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
>>>>>>>>>>>>>>     <pompermaier@okkam.it <ma...@okkam.it>>
>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         Ok, got it.
>>>>>>>>>>>>>>         And is there a reference pom.xml for shading my
>>>>>> application
>>>>>>>>>>> into
>>>>>>>>>>>>>>         one fat-jar? which flink dependencies can I exclude?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske <
>>>>>>>>>>> fhueske@gmail.com
>>>>>>>>>>>>>>         <ma...@gmail.com>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>             I didn't say that the main should return the
>>>>>>>>>>>>>>             ExecutionEnvironment.
>>>>>>>>>>>>>>             You can define and execute as many programs in a
>>>> main
>>>>>>>>>>>>>>             function as you like.
>>>>>>>>>>>>>>             The program can be defined somewhere else, e.g.,
>> in
>>>> a
>>>>>>>>>>>>>>             function that receives an ExecutionEnvironment and
>>>>>>>>> attaches
>>>>>>>>>>>>>>             a program such as
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>             public void buildMyProgram(ExecutionEnvironment
>>>> env) {
>>>>>>>>>>>>>>               DataSet<String> lines = env.readTextFile(...);
>>>>>>>>>>>>>>               // do something
>>>>>>>>>>>>>>               lines.writeAsText(...);
>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>             That method could be invoked from main():
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>             psv main() {
>>>>>>>>>>>>>>               ExecutionEnv env = ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>               if(...) {
>>>>>>>>>>>>>>                 buildMyProgram(env);
>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>               else {
>>>>>>>>>>>>>>                 buildSomeOtherProg(env);
>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>               env.execute();
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>               // run some more programs
>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
>>>>>>>>>>>>>>             <pompermaier@okkam.it <mailto:
>> pompermaier@okkam.it
>>>>>> :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                 Hi Fabian,
>>>>>>>>>>>>>>                 thanks for the response.
>>>>>>>>>>>>>>                 So my mains should be converted in a method
>>>>>> returning
>>>>>>>>>>>>>>                 the ExecutionEnvironment.
>>>>>>>>>>>>>>                 However, I think that it would be very nice to
>>>>>> have a
>>>>>>>>>>>>>>                 syntax like the one of the Hadoop
>> ProgramDriver
>>>> to
>>>>>>>>>>>>>>                 define jobs to invoke from a single root
>> class.
>>>>>>>>>>>>>>                 Do you think it could be useful?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                 On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
>>>>>>>>>>>>>>                 <fhueske@gmail.com <mailto:fhueske@gmail.com
>>>>
>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                     You can easily have multiple Flink programs
>> in a
>>>>>>>>> single
>>>>>>>>>>>>>>                     JAR file.
>>>>>>>>>>>>>>                     A program is defined using an
>>>>>>>>> ExecutionEnvironment
>>>>>>>>>>>>>>                     and executed when you call
>>>>>>>>>>>>>>                     ExecutionEnvironment.execute().
>>>>>>>>>>>>>>                     Where and how you do that does not matter.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                     You can for example implement a main
>>>> function
>>>>>>>>> such
>>>>>>>>>>>>> as:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                     public static void main(String... args) {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                       if (today == Monday) {
>>>>>>>>>>>>>>                         ExecutionEnvironment env = ...
>>>>>>>>>>>>>>                         // define Monday prog
>>>>>>>>>>>>>>                         env.execute()
>>>>>>>>>>>>>>                       }
>>>>>>>>>>>>>>                       else {
>>>>>>>>>>>>>>                         ExecutionEnvironment env = ...
>>>>>>>>>>>>>>                         // define other prog
>>>>>>>>>>>>>>                         env.execute()
>>>>>>>>>>>>>>                       }
>>>>>>>>>>>>>>                     }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                     2015-05-08 11:41 GMT+02:00 Flavio
>>>> Pompermaier
>>>>>>>>>>>>>>                     <pompermaier@okkam.it <mailto:
>>>>>>>>> pompermaier@okkam.it
>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                         Hi to all,
>>>>>>>>>>>>>>                         is there any way to keep multiple jobs
>>>> in
>>>>>> a
>>>>>>>>> jar
>>>>>>>>>>>>>>                         and then choose at runtime the one to
>>>>>> execute
>>>>>>>>>>>>>>                         (like what ProgramDriver does in
>>>> Hadoop)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                         Best,
>>>>>>>>>>>>>>                         Flavio
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> 
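
The manifest lookup discussed in the quoted thread above (the "-c" flag
first, then "program-class", then "Main-Class", with comma-separated values
for multiple programs) can be sketched roughly as follows. This is only an
illustration under stated assumptions: EntryClassLister and its methods are
hypothetical names, not Flink classes, and the steps that check for the
"Program" interface or a main method are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical helper (not part of Flink): finds candidate entry classes in a manifest. */
final class EntryClassLister {

    /** Manifest attribute named in the thread for Flink entry classes. */
    static final String PROGRAM_CLASS = "program-class";

    /** Returns the comma-separated classes of "program-class", else those of "Main-Class". */
    static List<String> listEntryClasses(Manifest manifest) {
        Attributes attrs = manifest.getMainAttributes();
        String value = attrs.getValue(PROGRAM_CLASS);
        if (value == null) {
            value = attrs.getValue(Attributes.Name.MAIN_CLASS);
        }
        List<String> result = new ArrayList<>();
        if (value != null) {
            for (String name : value.split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    result.add(trimmed);
                }
            }
        }
        return result;
    }

    /** An explicit "-c" value wins; otherwise the manifest must name exactly one class. */
    static String resolveEntryClass(String cFlagValue, Manifest manifest) {
        if (cFlagValue != null) {
            return cFlagValue;
        }
        List<String> candidates = listEntryClasses(manifest);
        if (candidates.isEmpty()) {
            throw new IllegalStateException("No entry class found in manifest");
        }
        if (candidates.size() > 1) {
            throw new IllegalStateException("Multiple entry classes, pick one with -c: " + candidates);
        }
        return candidates.get(0);
    }

    /** Parses manifest text; the text must end with a newline. */
    static Manifest parse(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a manifest line such as "program-class: com.example.JobA,
com.example.JobB" (made-up class names), listEntryClasses returns both
names, and resolveEntryClass only succeeds if a "-c" value is passed.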


Re: Package multiple jobs in a single jar

Posted by Maximilian Michels <mx...@apache.org>.
Sorry, my bad. Yes, it is helpful to have a separate program and parameter
description in ProgramDescription. I'm not sure if it adds much value to
implement ProgramDescription in the examples. It introduces verbosity and
might give the impression that you have to implement ProgramDescription in
your Flink job.
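
To make the "separate program and parameter description" idea concrete,
here is a minimal sketch. It assumes a hypothetical interface
SplitProgramDescription; this is not the existing ProgramDescription
interface, and WordCountInfo and UsageMessages are made-up names used only
for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical interface: program text and parameter text are kept separate. */
interface SplitProgramDescription {
    /** Short text describing what the program does. */
    String getDescription();

    /** One entry per expected positional parameter, in order. */
    List<String> getParameterDescriptions();
}

/** Made-up job metadata used only for this illustration. */
final class WordCountInfo implements SplitProgramDescription {
    @Override
    public String getDescription() {
        return "Counts the words in a text file.";
    }

    @Override
    public List<String> getParameterDescriptions() {
        return Arrays.asList("<input path>", "<output path>");
    }
}

final class UsageMessages {
    /** Builds an error message that shows only the parameter part of the description. */
    static String wrongParameterCount(SplitProgramDescription info, int given) {
        List<String> params = info.getParameterDescriptions();
        return "Expected " + params.size() + " parameter(s) but got " + given
                + ". Usage: " + String.join(" ", params);
    }
}
```

With a single combined description string, the error message could not
show the parameter part on its own.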

On Tue, May 26, 2015 at 12:00 PM, Matthias J. Sax <
mjsax@informatik.hu-berlin.de> wrote:

> Hi Max,
>
> thanks for your feedback. I guess you confuse the interfaces "Program"
> and "ProgramDescription". Using "Program" the use of main method is
> replaced by "getPlan(...)". However, "ProgramDescription" only adds
> method "getDescription()" which returns a string that explains the usage
> of the program (ie, short description, expected parameters).
>
> Thus, adding "ProgramDescription" to the examples does not change the
> examples -- the main method will still be used. It only adds the ability
> for a program to "explain" itself (ie, give meta info). Furthermore,
> "ProgramDescription" is also not related to the new "ParameterTool".
>
> -Matthias
>
> On 05/26/2015 11:46 AM, Maximilian Michels wrote:
> > I don't think `getDisplayName()` is necessary either. The class name and
> > the description string should be fine. Adding ProgramDescription to the
> > examples is not necessary; as already pointed out, using the main method
> is
> > more convenient for most users. As far as I know, the idea of the
> > ParameterTool was to use it only in the user code and not automatically
> > handle parameters.
> >
> > Changing the interface would be quite API breaking but since most
> programs
> > use the main method, IMHO we could do it.
> >
> > On Fri, May 22, 2015 at 10:09 PM, Matthias J. Sax <
> > mjsax@informatik.hu-berlin.de> wrote:
> >
> >> Makes sense to me. :)
> >>
> >> One more thing: What about extending the "ProgramDescription" interface
> >> to have multiple methods as Flavio suggested (with the config(...)
> >> method that should be handled by the ParameterTool)?
> >>
> >>> public interface FlinkJob {
> >>>
> >>> /** The name to display in the job submission UI or shell */
> >>> //e.g. "My Flink HelloWorld"
> >>> String getDisplayName();
> >>> //e.g. "This program does this and that etc.."
> >>> String getDescription();
> >>> //e.g. <0,Integer,"An integer representing my first param">,
> >> <1,String,"A string representing my second param">
> >>> List<Tuple3<Integer, TypeInfo, String>> paramDescription;
> >>> /** Set up the flink job in the passed ExecutionEnvironment */
> >>> ExecutionEnvironment config(ExecutionEnvironment env);
> >>> }
> >>
> >> Right now, the interface is used only a couple of times in Flink's code
> >> base, so it would not be a problem to update those classes. However, it
> >> could break external code that uses the interface already (even if I
> >> doubt that the interface is well known and used often [or at all]).
> >>
> >> I personally don't think that "getDisplayName()" is too helpful.
> >> Splitting the program description and the parameter description seems to
> >> be useful. For example, if wrong parameters are provided, the parameter
> >> description can be included in the error message. If program+parameter
> >> description is given in a single string, this is not possible. But this
> >> is only a minor issue of course.
> >>
> >> Maybe, we should also add the interface to the current Flink examples,
> >> to make people more aware of it. Is there any documentation on the web
> >> site?
> >>
> >>
> >> -Matthias
> >>
> >>
> >>
> >> On 05/22/2015 09:43 PM, Robert Metzger wrote:
> >>> Thank you for working on this.
> >>> My responses are inline below:
> >>>
> >>> (Flavio)
> >>>
> >>>> My suggestion is to create a specific Flink interface to get also
> >>>> description of a job and standardize parameter passing.
> >>>
> >>>
> >>> I've recently merged the ParameterTool which is solving the
> "standardize
> >>> parameter passing" problem (at least it presents a best practice) :
> >>>
> >>
> http://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#parsing-command-line-arguments-and-passing-them-around-in-your-flink-application
> >>>
> >>> Regarding the description: Maybe we can use the "ProgramDescription"
> >>> interface for getting a string describing the program in the web
> >> frontend.
> >>>
> >>> (Matthias)
> >>>
> >>>> I don't want to start working on it, before it's clear that it has a
> >>>> chance to be
> >>>> included in Flink.
> >>>
> >>>
> >>> I think the changes discussed here won't change the current behavior,
> but
> >>> they add new functionality which
> >>> can make the life of our users easier, so I'll vote to include your
> >> changes
> >>> (given they meet our quality standards)
> >>>
> >>>
> >>> If multiple classes implement "Program" interface an exception should
> be
> >>>> thrown (I think that would make sense). However, I am not sure what
> >>>> "good" behavior is, if a single "Program"-class is found and an
> >>>> additional main-method class.
> >>>>   - should "Program"-class be executed (ie, "overwrite" main-method
> >> class)
> >>>>   - or, better to throw an exception?
> >>>
> >>>
> >>> I would give a class implementing "Program" priority over a random
> main()
> >>> method in a random class.
> >>> Maybe printing a WARN log message informing the user that the "Program"
> >>> class has been chosen.
> >>>
> >>>
> >>> If no "Program"-class is found, but a single main-method class, Flink
> >>>> could execute using main method. But I am not sure either, if this is
> >>>> "good" behavior. If multiple main-method classes are present, throwing
> >>>> an exception is the only way to go, I guess.
> >>>
> >>>
> >>> I think the best effort approach "one class with main() found" is good.
> >> In
> >>> case of multiple main methods, a helpful exception is the best approach
> >> in
> >>> my opinion.
> >>>
> >>>
> >>>  If the manifest contains "program-class" or "Main-Class" entry,
> >>>> should we check the jar file right away if the specified class is
> there?
> >>>> Right now, no check is performed and an error occurs if the user tries
> >>>> to execute the job.
> >>>
> >>>
> >>> I'd say the current approach is sufficient. There is no need to have a
> >>> special code path which is doing the check.
> >>> I think the error message will be pretty similar in both cases and I
> fear
> >>> that this additional code could also introduce new bugs ;)
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, May 22, 2015 at 9:06 PM, Matthias J. Sax <
> >>> mjsax@informatik.hu-berlin.de> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> two more thoughts to this discussion:
> >>>>
> >>>>  1) looking at the commit history of "CliFrontend", I found the
> >>>> following closed issue and the closing pull request
> >>>>     * https://issues.apache.org/jira/browse/FLINK-1095
> >>>>     * https://github.com/apache/flink/pull/238
> >>>> It stands in opposition to Flavio's request to have a job description.
> Any
> >>>> comment on this? Should a removed feature be re-introduced? If not, I
> >>>> would suggest to remove the "ProgramDescription" interface completely.
> >>>>
> >>>>  2) If the manifest contains "program-class" or "Main-Class" entry,
> >>>> should we check the jar file right away if the specified class is
> there?
> >>>> Right now, no check is performed and an error occurs if the user tries
> >>>> to execute the job.
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>> On 05/22/2015 12:06 PM, Matthias J. Sax wrote:
> >>>>> Thanks for your feedback.
> >>>>>
> >>>>> I agree on the main method "problem". For scanning and listing all
> >> stuff
> >>>>> that is found it's fine.
> >>>>>
> >>>>> The tricky question is the automatic invocation mechanism, if "-c"
> flag
> >>>>> is not used, and no manifest program-class or Main-Class entry is
> >> found.
> >>>>>
> >>>>> If multiple classes implement "Program" interface an exception should
> >> be
> >>>>> thrown (I think that would make sense). However, I am not sure what
> >>>>> "good" behavior is, if a single "Program"-class is found and an
> >>>>> additional main-method class.
> >>>>>   - should "Program"-class be executed (ie, "overwrite" main-method
> >>>> class)
> >>>>>   - or, better to throw an exception?
> >>>>>
> >>>>> If no "Program"-class is found, but a single main-method class, Flink
> >>>>> could execute using main method. But I am not sure either, if this is
> >>>>> "good" behavior. If multiple main-method classes are present,
> throwing
> >>>>> an exception is the only way to go, I guess.
> >>>>>
> >>>>> To sum up: Should Flink consider main-method classes for automatic
> >>>>> invocation, or should it be required for main-method classes to
> either
> >>>>> list them in "program-class" or "Main-Class" manifest parameter (to
> >>>>> enable them for automatic invocation)?
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 05/22/2015 09:56 AM, Maximilian Michels wrote:
> >>>>>> Hi Matthias,
> >>>>>>
> >>>>>> Thank you for taking the time to analyze Flink's invocation
> behavior.
> >> I
> >>>>>> like your proposal. I'm not sure whether it is a good idea to scan
> the
> >>>>>> entire JAR for main methods. Sometimes, main methods are added
> solely
> >>>> for
> >>>>>> testing purposes and don't really serve any practical use. However,
> if
> >>>>>> you're already going through the JAR to find the ProgramDescription
> >>>>>> interface, then you might look for main methods as well. As long as
> it
> >>>> is
> >>>>>> just a listing without execution, that should be fine.
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Max
> >>>>>>
> >>>>>> On Thu, May 21, 2015 at 3:43 PM, Matthias J. Sax <
> >>>>>> mjsax@informatik.hu-berlin.de> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I had a look into the current Workflow of Flink with regard to the
> >>>>>>> processing steps of a jar file.
> >>>>>>>
> >>>>>>> If I got it right it works as follows (not sure if this is
> documented
> >>>>>>> somewhere):
> >>>>>>>
> >>>>>>> 1) check, if "-c" flag is used to set program entry point
> >>>>>>>    if yes, goto 4
> >>>>>>> 2) try to extract "program-class" property from manifest
> >>>>>>>    (if found goto 4)
> >>>>>>> 3) try to extract "Main-Class" property from manifest
> >>>>>>>    -> if not found throw exception (this happens also, if no
> >> manifest
> >>>>>>> file is found at all)
> >>>>>>>
> >>>>>>> 4) check if entry point class implements "Program" interface
> >>>>>>>    if yes, goto 6
> >>>>>>> 5) check if entry point class provides "public static void
> >>>> main(String[]
> >>>>>>> args)" method
> >>>>>>>    -> if not, throw exception
> >>>>>>>
> >>>>>>> 6) execute program (ie, show plan/info or really run it)
> >>>>>>>
> >>>>>>>
> >>>>>>> I also "discovered" the interface "ProgramDescription" with a
> single
> >>>>>>> method "String getDescription()". Even if some examples implement
> >> this
> >>>>>>> interface (and use it in the example itself), Flink basically
> ignores
> >>>>>>> it... From the CLI there is no way to get this info, and the WebUI
> >> does
> >>>>>>> actually get it if present, however, doesn't show it anywhere...
> >>>>>>>
> >>>>>>>
> >>>>>>> I think it would be nice, if we would extend the following
> functions:
> >>>>>>>
> >>>>>>>  - extend the possibility to specify multiple entry classes in
> >>>>>>> "program-class" or "Main-Class" -> in this case, the user needs to
> >> use
> >>>>>>> "-c" flag to pick program to run every time
> >>>>>>>
> >>>>>>>  - add a CLI option that allows the user to see what entry point
> >>>> classes
> >>>>>>> are available
> >>>>>>>    for this, consider
> >>>>>>>      a) "program-class" entry
> >>>>>>>      b) "Main-Class" entry
> >>>>>>>      c) if neither is found, scan jar-file for classes implementing
> >>>>>>> "Program" interface
> >>>>>>>      d) if still not found, scan jar-file for classes with "main"
> >>>> method
> >>>>>>>
> >>>>>>>  - if user looks for entry point classes via CLI, check for
> >>>>>>> "ProgramDescription" interface and show info
> >>>>>>>
> >>>>>>>  - extend WebUI to show all available entry-classes (pull request
> >>>>>>> already there, for multiple entries in "program-class")
> >>>>>>>
> >>>>>>>  - extend WebUI to show "ProgramDescription" info
> >>>>>>>
> >>>>>>>
> >>>>>>> What do you think? I am not too sure about the "auto scan" of the
> jar
> >>>>>>> file if no manifest entry is provided. We might get some "fat jars"
> >> and
> >>>>>>> scanning might take some time.
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 05/19/2015 10:44 AM, Stephan Ewen wrote:
> >>>>>>>> We actually had an interface like that before ("Program"). It is
> >> still
> >>>>>>>> supported, but in all new programs we simply use the Java main
> >> method.
> >>>>>>> The
> >>>>>>>> advantage is that
> >>>>>>>> most IDEs can create executable JARs automatically, setting the
> JAR
> >>>>>>>> manifest attributes, etc.
> >>>>>>>>
> >>>>>>>> The "Program" interface still works, though. Most tool classes
> (like
> >>>>>>>> "PackagedProgram") have a way to figure out whether the code uses
> >>>>>>> "main()"
> >>>>>>>> or implements "Program"
> >>>>>>>> and calls the right method.
> >>>>>>>>
> >>>>>>>> You can try and extend the program interface. If you want to
> >>>> consistently
> >>>>>>>> support multiple programs in one JAR file, you may need to adjust
> >> the
> >>>>>>> util
> >>>>>>>> classes as
> >>>>>>>> well to deal with that.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, May 19, 2015 at 10:10 AM, Matthias J. Sax <
> >>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
> >>>>>>>>
> >>>>>>>>> Supporting an interface like this seems to be a nice idea. Any
> >> other
> >>>>>>>>> opinions on it?
> >>>>>>>>>
> >>>>>>>>> It seems to be some more work to get it done right. I don't want
> to
> >>>>>>>>> start working on it, before it's clear that it has a chance to be
> >>>>>>>>> included in Flink.
> >>>>>>>>>
> >>>>>>>>> @Flavio: I moved the discussion to dev mailing list (user list is
> >> not
> >>>>>>>>> appropriate for this discussion). Are you subscribed to it or
> >> should
> >>>> I
> >>>>>>>>> cc you in each mail?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 05/19/2015 09:39 AM, Flavio Pompermaier wrote:
> >>>>>>>>>> Nice feature Matthias!
> >>>>>>>>>> My suggestion is to create a specific Flink interface to get
> also
> >>>>>>>>>> description of a job and standardize parameter passing.
> >>>>>>>>>> Then, somewhere (e.g. Manifest) you could specify the list of
> >>>> packages
> >>>>>>>>> (or
> >>>>>>>>>> also directly the classes) to inspect with reflection to extract
> >> the
> >>>>>>> list
> >>>>>>>>>> of available Flink jobs.
> >>>>>>>>>> Something like:
> >>>>>>>>>>
> >>>>>>>>>> public interface FlinkJob {
> >>>>>>>>>>
> >>>>>>>>>> /** The name to display in the job submission UI or shell */
> >>>>>>>>>> //e.g. "My Flink HelloWorld"
> >>>>>>>>>> String getDisplayName();
> >>>>>>>>>>  //e.g. "This program does this and that etc.."
> >>>>>>>>>> String getDescription();
> >>>>>>>>>>  //e.g. <0,Integer,"An integer representing my first param">,
> >>>>>>>>> <1,String,"An
> >>>>>>>>>> string representing my second param">
> >>>>>>>>>> List<Tuple3<Integer, TypeInfo, String>> paramDescription;
> >>>>>>>>>>  /** Set up the flink job in the passed ExecutionEnvironment */
> >>>>>>>>>> ExecutionEnvironment config(ExecutionEnvironment env);
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> What do you think?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
> >>>>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> I like the idea that Flink's WebClient can show different plans
> >> for
> >>>>>>>>>>> different jobs within a single jar file.
> >>>>>>>>>>>
> >>>>>>>>>>> I prepared a prototype for this feature. You can find it here:
> >>>>>>>>>>> https://github.com/mjsax/flink/tree/multipleJobsWebUI
> >>>>>>>>>>>
> >>>>>>>>>>> To test the feature, you need to prepare a jar file, that
> >> contains
> >>>> the
> >>>>>>>>>>> code of multiple programs and specify each entry class in the
> >>>> manifest
> >>>>>>>>>>> file as comma separated values in "program-class" line.
> >>>>>>>>>>>
> >>>>>>>>>>> Feedback is welcome. :)
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -Matthias
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
> >>>>>>>>>>>> Thank you all for the support!
> >>>>>>>>>>>> It will be a really nice feature if the web client could be
> able
> >>>> to
> >>>>>>>>> show
> >>>>>>>>>>>> me the list of Flink jobs within my jar..
> >>>>>>>>>>>> it should be sufficient to mark them with a special annotation
> >> and
> >>>>>>>>>>>> inspect the classes within the jar..
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <ms@mieo.de
> >>>>>>>>>>>> <ma...@mieo.de>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>     Hi Flavio,
> >>>>>>>>>>>>
> >>>>>>>>>>>>     you also can put each job in a single class and use the –c
> >>>>>>>>> parameter
> >>>>>>>>>>>>     to execute jobs separately:
> >>>>>>>>>>>>
> >>>>>>>>>>>>     /bin/flink run –c com.myflinkjobs.JobA
> >>>>>>>>> /path/to/jar/multiplejobs.jar
> >>>>>>>>>>>>     /bin/flink run –c com.myflinkjobs.JobB
> >>>>>>>>> /path/to/jar/multiplejobs.jar
> >>>>>>>>>>>>     …
> >>>>>>>>>>>>
> >>>>>>>>>>>>     Cheers
> >>>>>>>>>>>>     Malte
> >>>>>>>>>>>>
> >>>>>>>>>>>>     Von: Robert Metzger <rmetzger@apache.org <mailto:
> >>>>>>>>> rmetzger@apache.org
> >>>>>>>>>>>>>
> >>>>>>>>>>>>     Antworten an: <user@flink.apache.org <mailto:
> >>>>>>> user@flink.apache.org
> >>>>>>>>>>>
> >>>>>>>>>>>>     Datum: Freitag, 8. Mai 2015 14:57
> >>>>>>>>>>>>     An: "user@flink.apache.org <mailto:user@flink.apache.org
> >"
> >>>>>>>>>>>>     <user@flink.apache.org <ma...@flink.apache.org>>
> >>>>>>>>>>>>     Betreff: Re: Package multiple jobs in a single jar
> >>>>>>>>>>>>
> >>>>>>>>>>>>     Hi Flavio,
> >>>>>>>>>>>>
> >>>>>>>>>>>>     the pom from our quickstart is a good
> >>>>>>>>>>>>     reference:
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>
> >>
> https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
> >>>>>>>>>>>>     <pompermaier@okkam.it <ma...@okkam.it>>
> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>         Ok, get it.
> >>>>>>>>>>>>         And is there a reference pom.xml for shading my
> >>>> application
> >>>>>>>>> into
> >>>>>>>>>>>>         one fat-jar? which flink dependencies can I exclude?
> >>>>>>>>>>>>
> >>>>>>>>>>>>         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske <
> >>>>>>>>> fhueske@gmail.com
> >>>>>>>>>>>>         <ma...@gmail.com>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>             I didn't say that the main should return the
> >>>>>>>>>>>>             ExecutionEnvironment.
> >>>>>>>>>>>>             You can define and execute as many programs in a
> >> main
> >>>>>>>>>>>>             function as you like.
> >>>>>>>>>>>>             The program can be defined somewhere else, e.g.,
> in
> >> a
> >>>>>>>>>>>>             function that receives an ExecutionEnvironment and
> >>>>>>> attaches
> >>>>>>>>>>>>             a program such as
> >>>>>>>>>>>>
> >>>>>>>>>>>>             public void buildMyProgram(ExecutionEnvironment
> >> env) {
> >>>>>>>>>>>>               DataSet<String> lines = env.readTextFile(...);
> >>>>>>>>>>>>               // do something
> >>>>>>>>>>>>               lines.writeAsText(...);
> >>>>>>>>>>>>             }
> >>>>>>>>>>>>
> >>>>>>>>>>>>             That method could be invoked from main():
> >>>>>>>>>>>>
> >>>>>>>>>>>>             psv main() {
> >>>>>>>>>>>>               ExecutionEnv env = ...
> >>>>>>>>>>>>
> >>>>>>>>>>>>               if(...) {
> >>>>>>>>>>>>                 buildMyProgram(env);
> >>>>>>>>>>>>               }
> >>>>>>>>>>>>               else {
> >>>>>>>>>>>>                 buildSomeOtherProg(env);
> >>>>>>>>>>>>               }
> >>>>>>>>>>>>
> >>>>>>>>>>>>               env.execute();
> >>>>>>>>>>>>
> >>>>>>>>>>>>               // run some more programs
> >>>>>>>>>>>>             }
> >>>>>>>>>>>>
> >>>>>>>>>>>>             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
> >>>>>>>>>>>>             <pompermaier@okkam.it <mailto:
> pompermaier@okkam.it
> >>>> :
> >>>>>>>>>>>>
> >>>>>>>>>>>>                 Hi Fabian,
> >>>>>>>>>>>>                 thanks for the response.
> >>>>>>>>>>>>                 So my mains should be converted in a method
> >>>> returning
> >>>>>>>>>>>>                 the ExecutionEnvironment.
> >>>>>>>>>>>>                 However it think that it will be very nice to
> >>>> have a
> >>>>>>>>>>>>                 syntax like the one of the Hadoop
> ProgramDriver
> >> to
> >>>>>>>>>>>>                 define jobs to invoke from a single root
> class.
> >>>>>>>>>>>>                 Do you think it could be useful?
> >>>>>>>>>>>>
> >>>>>>>>>>>>                 On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
> >>>>>>>>>>>>                 <fhueske@gmail.com <mailto:fhueske@gmail.com
> >>
> >>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>                     You easily have multiple Flink programs
> in a
> >>>>>>> single
> >>>>>>>>>>>>                     JAR file.
> >>>>>>>>>>>>                     A program is defined using an
> >>>>>>> ExecutionEnvironment
> >>>>>>>>>>>>                     and executed when you call
> >>>>>>>>>>>>                     ExecutionEnvironment.exeucte().
> >>>>>>>>>>>>                     Where and how you do that does not matter.
> >>>>>>>>>>>>
> >>>>>>>>>>>>                     You can for example implement a main
> >> function
> >>>>>>> such
> >>>>>>>>>>> as:
> >>>>>>>>>>>>
> >>>>>>>>>>>>                     public static void main(String... args) {
> >>>>>>>>>>>>
> >>>>>>>>>>>>                       if (today == Monday) {
> >>>>>>>>>>>>                         ExecutionEnvironment env = ...
> >>>>>>>>>>>>                         // define Monday prog
> >>>>>>>>>>>>                         env.execute()
> >>>>>>>>>>>>                       }
> >>>>>>>>>>>>                       else {
> >>>>>>>>>>>>                         ExecutionEnvironment env = ...
> >>>>>>>>>>>>                         // define other prog
> >>>>>>>>>>>>                         env.execute()
> >>>>>>>>>>>>                       }
> >>>>>>>>>>>>                     }
> >>>>>>>>>>>>
> >>>>>>>>>>>>                     2015-05-08 11:41 GMT+02:00 Flavio
> >> Pompermaier
> >>>>>>>>>>>>                     <pompermaier@okkam.it <mailto:
> >>>>>>> pompermaier@okkam.it
> >>>>>>>>>>>>> :
> >>>>>>>>>>>>
> >>>>>>>>>>>>                         Hi to all,
> >>>>>>>>>>>>                         is there any way to keep multiple jobs
> >> in
> >>>> a
> >>>>>>> jar
> >>>>>>>>>>>>                         and then choose at runtime the one to
> >>>> execute
> >>>>>>>>>>>>                         (like what ProgramDriver does in
> >> Hadoop)?
> >>>>>>>>>>>>
> >>>>>>>>>>>>                         Best,
> >>>>>>>>>>>>                         Flavio
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>

Re: Package multiple jobs in a single jar

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Hi Max,

thanks for your feedback. I think you are confusing the interfaces "Program"
and "ProgramDescription". With "Program", the main method is replaced by
"getPlan(...)". "ProgramDescription", however, only adds the method
"getDescription()", which returns a string that explains the usage of the
program (i.e., a short description and the expected parameters).

Thus, adding "ProgramDescription" to the examples does not change the
examples: the main method will still be used. It only adds the ability
for a program to "explain" itself (i.e., to give meta info). Furthermore,
"ProgramDescription" is not related to the new "ParameterTool".
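
To make the distinction concrete, here is a minimal sketch. It does not use
the real Flink classes: "ProgramDescription" below is a hypothetical stand-in
with the same single method, and the names "DescriptionSketch", "WordCount",
"describe" and "hasMain" are made up for illustration. It only shows that a
class can keep its main method and additionally describe itself.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class DescriptionSketch {

    /** Hypothetical stand-in for "ProgramDescription": metadata only. */
    public interface ProgramDescription {
        String getDescription();
    }

    /** Example job: still started via main(), but also explains itself. */
    public static class WordCount implements ProgramDescription {
        @Override
        public String getDescription() {
            return "Counts words. Parameters: <input path> <output path>";
        }

        public static void main(String[] args) {
            // the job would still be defined and executed here
        }
    }

    /** Returns the description if the class offers one, otherwise null. */
    public static String describe(Class<?> entryClass) {
        if (!ProgramDescription.class.isAssignableFrom(entryClass)) {
            return null;
        }
        try {
            Object instance = entryClass.getDeclaredConstructor().newInstance();
            return ((ProgramDescription) instance).getDescription();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + entryClass, e);
        }
    }

    /** True if the class has a "public static void main(String[])" method. */
    public static boolean hasMain(Class<?> entryClass) {
        try {
            Method m = entryClass.getMethod("main", String[].class);
            return Modifier.isStatic(m.getModifiers())
                    && m.getReturnType() == void.class;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("has main: " + hasMain(WordCount.class));
        System.out.println("description: " + describe(WordCount.class));
    }
}
```

A CLI or WebUI listing could call something like "describe" for each entry
class and simply skip the classes for which it returns null.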

-Matthias

On 05/26/2015 11:46 AM, Maximilian Michels wrote:
> I don't think `getDisplayName()` is necessary either. The class name and
> the description string should be fine. Adding ProgramDescription to the
> examples is not necessary; as already pointed out, using the main method is
> more convenient for most users. As far as I know, the idea of the
> ParameterTool was to use it only in the user code and not automatically
> handle parameters.
> 
> Changing the interface would be quite API breaking but since most programs
> use the main method, IMHO we could do it.
> 
> On Fri, May 22, 2015 at 10:09 PM, Matthias J. Sax <
> mjsax@informatik.hu-berlin.de> wrote:
> 
>> Makes sense to me. :)
>>
>> One more thing: What about extending the "ProgramDescription" interface
>> to have multiple methods as Flavio suggested (with the config(...)
>> method that should be handled by the ParameterTool)?
>>
>>> public interface FlinkJob {
>>>
>>> /** The name to display in the job submission UI or shell */
>>> //e.g. "My Flink HelloWorld"
>>> String getDisplayName();
>>> //e.g. "This program does this and that etc.."
>>> String getDescription();
>>> //e.g. <0,Integer,"An integer representing my first param">,
>> <1,String,"An string representing my second param">
>>> List<Tuple3<Integer, TypeInfo, String>> paramDescription;
>>> /** Set up the flink job in the passed ExecutionEnvironment */
>>> ExecutionEnvironment config(ExecutionEnvironment env);
>>> }
>>
>> Right now, the interface is used only a couple of times in Flink's code
>> base, so it would not be a problem to update those classes. However, it
>> could break external code that uses the interface already (even if I
>> doubt that the interface is well known and used often [or at all]).
>>
>> I personally don't think that "getDisplayName()" is too helpful.
>> Splitting the program description and the parameter description seems to
>> be useful. For example, if wrong parameters are provided, the parameter
>> description can be included in the error message. If program+parameter
>> description is given in a single string, this is not possible. But this
>> is only a minor issue of course.
>>
>> Maybe we should also add the interface to the current Flink examples,
>> to make people more aware of it. Is there any documentation on the web
>> site?
>>
>>
>> -Matthias
>>
>>
>>
>> On 05/22/2015 09:43 PM, Robert Metzger wrote:
>>> Thank you for working on this.
>>> My responses are inline below:
>>>
>>> (Flavio)
>>>
>>>> My suggestion is to create a specific Flink interface to get also
>>>> description of a job and standardize parameter passing.
>>>
>>>
>>> I've recently merged the ParameterTool which is solving the "standardize
>>> parameter passing" problem (at least it presents a best practice) :
>>>
>> http://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#parsing-command-line-arguments-and-passing-them-around-in-your-flink-application
>>>
>>> Regarding the description: Maybe we can use the "ProgramDescription"
>>> interface for getting a string describing the program in the web
>> frontend.
>>>
>>> (Matthias)
>>>
>>>> I don't want to start working on it, before it's clear that it has a
>>>> chance to be
>>>> included in Flink.
>>>
>>>
>>> I think the changes discussed here won't change the current behavior, but
>>> they add new functionality which
>>> can make the life of our users easier, so I'll vote to include your
>> changes
>>> (given they meet our quality standards)
>>>
>>>
>>> If multiple classes implement "Program" interface an exception should be
>>>> thrown (I think that would make sense). However, I am not sure what
>>>> "good" behavior is, if a single "Program"-class is found and an
>>>> additional main-method class.
>>>>   - should "Program"-class be executed (ie, "overwrite" main-method
>> class)
>>>>   - or, better to throw an exception ?
>>>
>>>
>>> I would give a class implementing "Program" priority over a random main()
>>> method in a random class.
>>> Maybe printing a WARN log message informing the user that the "Program"
>>> class has been chosen.
>>>
>>>
>>> If no "Program"-class is found, but a single main-method class, Flink
>>>> could execute using main method. But I am not sure either, if this is
>>>> "good" behavior. If multiple main-method classes are present, throwing
>>>> an exception is the only way to go, I guess.
>>>
>>>
>>> I think the best effort approach "one class with main() found" is good.
>> In
>>> case of multiple main methods, a helpful exception is the best approach
>> in
>>> my opinion.
>>>
>>>
>>>  If the manifest contains "program-class" or "Main-Class" entry,
>>>> should we check the jar file right away if the specified class is there?
>>>> Right now, no check is performed and an error occurs if the user tries
>>>> to execute the job.
>>>
>>>
>>> I'd say the current approach is sufficient. There is no need to have a
>>> special code path which is doing the check.
>>> I think the error message will be pretty similar in both cases and I fear
>>> that this additional code could also introduce new bugs ;)
>>>
>>>
>>>
>>>
>>> On Fri, May 22, 2015 at 9:06 PM, Matthias J. Sax <
>>> mjsax@informatik.hu-berlin.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> two more thoughts to this discussion:
>>>>
>>>>  1) looking at the commit history of "CliFrontend", I found the
>>>> following closed issue and the closing pull request
>>>>     * https://issues.apache.org/jira/browse/FLINK-1095
>>>>     * https://github.com/apache/flink/pull/238
>>>> It stands in opposition to Flavio's request to have a job description. Any
>>>> comment on this? Should a removed feature be re-introduced? If not, I
>>>> would suggest to remove the "ProgramDescription" interface completely.
>>>>
>>>>  2) If the manifest contains "program-class" or "Main-Class" entry,
>>>> should we check the jar file right away if the specified class is there?
>>>> Right now, no check is performed and an error occurs if the user tries
>>>> to execute the job.
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>> On 05/22/2015 12:06 PM, Matthias J. Sax wrote:
>>>>> Thanks for your feedback.
>>>>>
>>>>> I agree on the main method "problem". For scanning and listing all
>> stuff
>>>>> that is found it's fine.
>>>>>
>>>>> The tricky question is the automatic invocation mechanism, if "-c" flag
>>>>> is not used, and no manifest program-class or Main-Class entry is
>> found.
>>>>>
>>>>> If multiple classes implement "Program" interface an exception should
>> be
>>>>> thrown (I think that would make sense). However, I am not sure what
>>>>> "good" behavior is, if a single "Program"-class is found and an
>>>>> additional main-method class.
>>>>>   - should "Program"-class be executed (ie, "overwrite" main-method
>>>> class)
>>>>>   - or, better to throw an exception ?
>>>>>
>>>>> If no "Program"-class is found, but a single main-method class, Flink
>>>>> could execute using main method. But I am not sure either, if this is
>>>>> "good" behavior. If multiple main-method classes are present, throwing
>>>>> an exception is the only way to go, I guess.
>>>>>
>>>>> To sum up: Should Flink consider main-method classes for automatic
>>>>> invocation, or should it be required for main-method classes to either
>>>>> list them in "program-class" or "Main-Class" manifest parameter (to
>>>>> enable them for automatic invocation)?
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 05/22/2015 09:56 AM, Maximilian Michels wrote:
>>>>>> Hi Matthias,
>>>>>>
>>>>>> Thank you for taking the time to analyze Flink's invocation behavior.
>> I
>>>>>> like your proposal. I'm not sure whether it is a good idea to scan the
>>>>>> entire JAR for main methods. Sometimes, main methods are added solely
>>>> for
>>>>>> testing purposes and don't really serve any practical use. However, if
>>>>>> you're already going through the JAR to find the ProgramDescription
>>>>>> interface, then you might look for main methods as well. As long as it
>>>> is
>>>>>> just a listing without execution, that should be fine.
>>>>>>
>>>>>> Best regards,
>>>>>> Max
>>>>>>
>>>>>> On Thu, May 21, 2015 at 3:43 PM, Matthias J. Sax <
>>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I had a look into the current workflow of Flink with regard to the
>>>>>>> processing steps of a jar file.
>>>>>>>
>>>>>>> If I got it right it works as follows (not sure if this is documented
>>>>>>> somewhere):
>>>>>>>
>>>>>>> 1) check, if "-c" flag is used to set program entry point
>>>>>>>    if yes, goto 4
>>>>>>> 2) try to extract "program-class" property from manifest
>>>>>>>    (if found goto 4)
>>>>>>> 3) try to extract "Main-Class" property from manifest
>>>>>>>    -> if not found, throw exception (this also happens if no
>> manifest
>>>>>>> file is found at all)
>>>>>>>
>>>>>>> 4) check if entry point class implements "Program" interface
>>>>>>>    if yes, goto 6
>>>>>>> 5) check if entry point class provided "public static void
>>>> main(String[]
>>>>>>> args)" method
>>>>>>>    -> if not, throw exception
>>>>>>>
>>>>>>> 6) execute program (ie, show plan/info or really run it)
>>>>>>>
>>>>>>>
>>>>>>> I also "discovered" the interface "ProgramDescription" with a single
>>>>>>> method "String getDescription()". Even if some examples implement
>> this
>>>>>>> interface (and use it in the example itself), Flink basically ignores
>>>>>>> it... From the CLI there is no way to get this info, and the WebUI
>> does
>>>>>>> actually get it if present, however, doesn't show it anywhere...
>>>>>>>
>>>>>>>
>>>>>>> I think it would be nice, if we would extend the following functions:
>>>>>>>
>>>>>>>  - extend the possibility to specify multiple entry classes in
>>>>>>> "program-class" or "Main-Class" -> in this case, the user needs to
>> use
>>>>>>> "-c" flag to pick program to run every time
>>>>>>>
>>>>>>>  - add a CLI option that allows the user to see what entry point
>>>> classes
>>>>>>> are available
>>>>>>>    for this, consider
>>>>>>>      a) "program-class" entry
>>>>>>>      b) "Main-Class" entry
>>>>>>>      c) if neither is found, scan jar-file for classes implementing
>>>>>>> "Program" interface
>>>>>>>      d) if still not found, scan jar-file for classes with "main"
>>>> method
>>>>>>>
>>>>>>>  - if user looks for entry point classes via CLI, check for
>>>>>>> "ProgramDescription" interface and show info
>>>>>>>
>>>>>>>  - extend WebUI to show all available entry-classes (pull request
>>>>>>> already there, for multiple entries in "program-class")
>>>>>>>
>>>>>>>  - extend WebUI to show "ProgramDescription" info
>>>>>>>
>>>>>>>
>>>>>>> What do you think? I am not too sure about the "auto scan" of the jar
>>>>>>> file if no manifest entry is provided. We might get some "fat jars"
>> and
>>>>>>> scanning might take some time.
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/19/2015 10:44 AM, Stephan Ewen wrote:
>>>>>>>> We actually had an interface like that before ("Program"). It is
>> still
>>>>>>>> supported, but in all new programs we simply use the Java main
>> method.
>>>>>>> The
>>>>>>>> advantage is that
>>>>>>>> most IDEs can create executable JARs automatically, setting the JAR
>>>>>>>> manifest attributes, etc.
>>>>>>>>
>>>>>>>> The "Program" interface still works, though. Most tool classes (like
>>>>>>>> "PackagedProgram") have a way to figure out whether the code uses
>>>>>>> "main()"
>>>>>>>> or implements "Program"
>>>>>>>> and calls the right method.
>>>>>>>>
>>>>>>>> You can try and extend the program interface. If you want to
>>>> consistently
>>>>>>>> support multiple programs in one JAR file, you may need to adjust
>> the
>>>>>>> util
>>>>>>>> classes as
>>>>>>>> well to deal with that.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, May 19, 2015 at 10:10 AM, Matthias J. Sax <
>>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>>>>
>>>>>>>>> Supporting an interface like this seems to be a nice idea. Any
>> other
>>>>>>>>> opinions on it?
>>>>>>>>>
>>>>>>>>> It seems to be some more work to get it done right. I don't want to
>>>>>>>>> start working on it, before it's clear that it has a chance to be
>>>>>>>>> included in Flink.
>>>>>>>>>
>>>>>>>>> @Flavio: I moved the discussion to dev mailing list (user list is
>> not
>>>>>>>>> appropriate for this discussion). Are you subscribed to it or
>> should
>>>> I
>>>>>>>>> cc you in each mail?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/19/2015 09:39 AM, Flavio Pompermaier wrote:
>>>>>>>>>> Nice feature Matthias!
>>>>>>>>>> My suggestion is to create a specific Flink interface to get also
>>>>>>>>>> description of a job and standardize parameter passing.
>>>>>>>>>> Then, somewhere (e.g. Manifest) you could specify the list of
>>>> packages
>>>>>>>>> (or
>>>>>>>>>> also directly the classes) to inspect with reflection to extract
>> the
>>>>>>> list
>>>>>>>>>> of available Flink jobs.
>>>>>>>>>> Something like:
>>>>>>>>>>
>>>>>>>>>> public interface FlinkJob {
>>>>>>>>>>
>>>>>>>>>> /** The name to display in the job submission UI or shell */
>>>>>>>>>> //e.g. "My Flink HelloWorld"
>>>>>>>>>> String getDisplayName();
>>>>>>>>>>  //e.g. "This program does this and that etc.."
>>>>>>>>>> String getDescription();
>>>>>>>>>>  //e.g. <0,Integer,"An integer representing my first param">,
>>>>>>>>> <1,String,"An
>>>>>>>>>> string representing my second param">
>>>>>>>>>> List<Tuple3<Integer, TypeInfo, String>> paramDescription;
>>>>>>>>>>  /** Set up the flink job in the passed ExecutionEnvironment */
>>>>>>>>>> ExecutionEnvironment config(ExecutionEnvironment env);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> What do you think?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
>>>>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I like the idea that Flink's WebClient can show different plans
>> for
>>>>>>>>>>> different jobs within a single jar file.
>>>>>>>>>>>
>>>>>>>>>>> I prepared a prototype for this feature. You can find it here:
>>>>>>>>>>> https://github.com/mjsax/flink/tree/multipleJobsWebUI
>>>>>>>>>>>
>>>>>>>>>>> To test the feature, you need to prepare a jar file, that
>> contains
>>>> the
>>>>>>>>>>> code of multiple programs and specify each entry class in the
>>>> manifest
>>>>>>>>>>> file as comma separated values in "program-class" line.
>>>>>>>>>>>
>>>>>>>>>>> Feedback is welcome. :)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
>>>>>>>>>>>> Thank you all for the support!
>>>>>>>>>>>> It will be a really nice feature if the web client could be able
>>>> to
>>>>>>>>> show
>>>>>>>>>>>> me the list of Flink jobs within my jar..
>>>>>>>>>>>> it should be sufficient to mark them with a special annotation
>> and
>>>>>>>>>>>> inspect the classes within the jar..
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <ms@mieo.de
>>>>>>>>>>>> <ma...@mieo.de>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>     Hi Flavio,
>>>>>>>>>>>>
>>>>>>>>>>>>     you also can put each job in a single class and use the –c
>>>>>>>>> parameter
>>>>>>>>>>>>     to execute jobs separately:
>>>>>>>>>>>>
>>>>>>>>>>>>     /bin/flink run –c com.myflinkjobs.JobA
>>>>>>>>> /path/to/jar/multiplejobs.jar
>>>>>>>>>>>>     /bin/flink run –c com.myflinkjobs.JobB
>>>>>>>>> /path/to/jar/multiplejobs.jar
>>>>>>>>>>>>     …
>>>>>>>>>>>>
>>>>>>>>>>>>     Cheers
>>>>>>>>>>>>     Malte
>>>>>>>>>>>>
>>>>>>>>>>>>     Von: Robert Metzger <rmetzger@apache.org <mailto:
>>>>>>>>> rmetzger@apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>     Antworten an: <user@flink.apache.org <mailto:
>>>>>>> user@flink.apache.org
>>>>>>>>>>>
>>>>>>>>>>>>     Datum: Freitag, 8. Mai 2015 14:57
>>>>>>>>>>>>     An: "user@flink.apache.org <ma...@flink.apache.org>"
>>>>>>>>>>>>     <user@flink.apache.org <ma...@flink.apache.org>>
>>>>>>>>>>>>     Betreff: Re: Package multiple jobs in a single jar
>>>>>>>>>>>>
>>>>>>>>>>>>     Hi Flavio,
>>>>>>>>>>>>
>>>>>>>>>>>>     the pom from our quickstart is a good
>>>>>>>>>>>>     reference:
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>
>> https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
>>>>>>>>>>>>     <pompermaier@okkam.it <ma...@okkam.it>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>         Ok, get it.
>>>>>>>>>>>>         And is there a reference pom.xml for shading my
>>>> application
>>>>>>>>> into
>>>>>>>>>>>>         one fat-jar? which flink dependencies can I exclude?
>>>>>>>>>>>>
>>>>>>>>>>>>         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske <
>>>>>>>>> fhueske@gmail.com
>>>>>>>>>>>>         <ma...@gmail.com>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>             I didn't say that the main should return the
>>>>>>>>>>>>             ExecutionEnvironment.
>>>>>>>>>>>>             You can define and execute as many programs in a
>> main
>>>>>>>>>>>>             function as you like.
>>>>>>>>>>>>             The program can be defined somewhere else, e.g., in
>> a
>>>>>>>>>>>>             function that receives an ExecutionEnvironment and
>>>>>>> attaches
>>>>>>>>>>>>             a program such as
>>>>>>>>>>>>
>>>>>>>>>>>>             public void buildMyProgram(ExecutionEnvironment
>> env) {
>>>>>>>>>>>>               DataSet<String> lines = env.readTextFile(...);
>>>>>>>>>>>>               // do something
>>>>>>>>>>>>               lines.writeAsText(...);
>>>>>>>>>>>>             }
>>>>>>>>>>>>
>>>>>>>>>>>>             That method could be invoked from main():
>>>>>>>>>>>>
>>>>>>>>>>>>             psv main() {
>>>>>>>>>>>>               ExecutionEnv env = ...
>>>>>>>>>>>>
>>>>>>>>>>>>               if(...) {
>>>>>>>>>>>>                 buildMyProgram(env);
>>>>>>>>>>>>               }
>>>>>>>>>>>>               else {
>>>>>>>>>>>>                 buildSomeOtherProg(env);
>>>>>>>>>>>>               }
>>>>>>>>>>>>
>>>>>>>>>>>>               env.execute();
>>>>>>>>>>>>
>>>>>>>>>>>>               // run some more programs
>>>>>>>>>>>>             }
>>>>>>>>>>>>
>>>>>>>>>>>>             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
>>>>>>>>>>>>             <pompermaier@okkam.it <mailto:pompermaier@okkam.it
>>>> :
>>>>>>>>>>>>
>>>>>>>>>>>>                 Hi Fabian,
>>>>>>>>>>>>                 thanks for the response.
>>>>>>>>>>>>                 So my mains should be converted in a method
>>>> returning
>>>>>>>>>>>>                 the ExecutionEnvironment.
>>>>>>>>>>>>                 However it think that it will be very nice to
>>>> have a
>>>>>>>>>>>>                 syntax like the one of the Hadoop ProgramDriver
>> to
>>>>>>>>>>>>                 define jobs to invoke from a single root class.
>>>>>>>>>>>>                 Do you think it could be useful?
>>>>>>>>>>>>
>>>>>>>>>>>>                 On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
>>>>>>>>>>>>                 <fhueske@gmail.com <ma...@gmail.com>>
>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>                     You easily have multiple Flink programs in a
>>>>>>> single
>>>>>>>>>>>>                     JAR file.
>>>>>>>>>>>>                     A program is defined using an
>>>>>>> ExecutionEnvironment
>>>>>>>>>>>>                     and executed when you call
>>>>>>>>>>>>                     ExecutionEnvironment.exeucte().
>>>>>>>>>>>>                     Where and how you do that does not matter.
>>>>>>>>>>>>
>>>>>>>>>>>>                     You can for example implement a main
>> function
>>>>>>> such
>>>>>>>>>>> as:
>>>>>>>>>>>>
>>>>>>>>>>>>                     public static void main(String... args) {
>>>>>>>>>>>>
>>>>>>>>>>>>                       if (today == Monday) {
>>>>>>>>>>>>                         ExecutionEnvironment env = ...
>>>>>>>>>>>>                         // define Monday prog
>>>>>>>>>>>>                         env.execute()
>>>>>>>>>>>>                       }
>>>>>>>>>>>>                       else {
>>>>>>>>>>>>                         ExecutionEnvironment env = ...
>>>>>>>>>>>>                         // define other prog
>>>>>>>>>>>>                         env.execute()
>>>>>>>>>>>>                       }
>>>>>>>>>>>>                     }
>>>>>>>>>>>>
>>>>>>>>>>>>                     2015-05-08 11:41 GMT+02:00 Flavio Pompermaier
>>>>>>>>>>>>                     <pompermaier@okkam.it <mailto:pompermaier@okkam.it>>:
>>>>>>>>>>>>
>>>>>>>>>>>>                         Hi to all,
>>>>>>>>>>>>                         is there any way to keep multiple jobs in a jar
>>>>>>>>>>>>                         and then choose at runtime the one to execute
>>>>>>>>>>>>                         (like what ProgramDriver does in Hadoop)?
>>>>>>>>>>>>
>>>>>>>>>>>>                         Best,
>>>>>>>>>>>>                         Flavio
>>>>>>>>>>>>


Re: Package multiple jobs in a single jar

Posted by Maximilian Michels <mx...@apache.org>.
I don't think `getDisplayName()` is necessary either. The class name and
the description string should be fine. Adding ProgramDescription to the
examples is not necessary; as already pointed out, using the main method is
more convenient for most users. As far as I know, the idea of the
ParameterTool was to use it only in the user code and not automatically
handle parameters.

Changing the interface would be quite API breaking but since most programs
use the main method, IMHO we could do it.
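
A minimal sketch of that idea (class name plus description string, no
getDisplayName()). It assumes a local stand-in for the single-method
ProgramDescription interface; JobLabel, labelFor and the two demo classes
are made-up names for illustration, not Flink API:

```java
/** Local stand-in mirroring the single-method interface discussed in this thread. */
interface ProgramDescription {
    String getDescription();
}

final class JobLabel {

    /** Builds the text a UI could show: class name, plus the description if one is offered. */
    static String labelFor(Object program) {
        String name = program.getClass().getName();
        if (program instanceof ProgramDescription) {
            return name + ": " + ((ProgramDescription) program).getDescription();
        }
        return name; // plain main-method classes simply show their class name
    }

    /** Hypothetical job that offers a description. */
    static final class DescribedJob implements ProgramDescription {
        @Override
        public String getDescription() {
            return "Counts words in a text file";
        }
    }

    /** Hypothetical job that only has a main method and no description. */
    static final class PlainJob {
        public static void main(String[] args) { }
    }
}
```

With this, a job that does not implement the interface still gets a usable
label, so nothing changes for users who only write a main method.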

On Fri, May 22, 2015 at 10:09 PM, Matthias J. Sax <
mjsax@informatik.hu-berlin.de> wrote:

> Makes sense to me. :)
>
> One more thing: What about extending the "ProgramDescription" interface
> to have multiple methods as Flavio suggested (with the config(...)
> method that should be handled by the ParameterTool)?
>
> > public interface FlinkJob {
> >
> > /** The name to display in the job submission UI or shell */
> > //e.g. "My Flink HelloWorld"
> > String getDisplayName();
> > //e.g. "This program does this and that etc.."
> > String getDescription();
> > //e.g. <0,Integer,"An integer representing my first param">, <1,String,"A string representing my second param">
> > List<Tuple3<Integer, TypeInfo, String>> paramDescription;
> > /** Set up the flink job in the passed ExecutionEnvironment */
> > ExecutionEnvironment config(ExecutionEnvironment env);
> > }
>
> Right now, the interface is used only a couple of times in Flink's code
> base, so it would not be a problem to update those classes. However, it
> could break external code that uses the interface already (even if I
> doubt that the interface is well known and used often [or at all]).
>
> I personally don't think that "getDisplayName()" is too helpful.
> Splitting the program description and the parameter description seems to
> be useful. For example, if wrong parameters are provided, the parameter
> description can be included in the error message. If program+parameter
> description is given in a single string, this is not possible. But this
> is only a minor issue of course.
>
> Maybe, we should also add the interface to the current Flink examples,
> to make people more aware of it. Is there any documentation on the web
> site?
>
>
> -Matthias
>
>
>
> On 05/22/2015 09:43 PM, Robert Metzger wrote:
> > Thank you for working on this.
> > My responses are inline below:
> >
> > (Flavio)
> >
> >> My suggestion is to create a specific Flink interface to get also
> >> description of a job and standardize parameter passing.
> >
> >
> > I've recently merged the ParameterTool which is solving the "standardize
> > parameter passing" problem (at least it presents a best practice):
> > http://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#parsing-command-line-arguments-and-passing-them-around-in-your-flink-application
> >
> > Regarding the description: Maybe we can use the "ProgramDescription"
> > interface for getting a string describing the program in the web
> > frontend.
> >
> > (Matthias)
> >
> >> I don't want to start working on it, before it's clear that it has a
> >> chance to be
> >> included in Flink.
> >
> >
> > I think the changes discussed here won't change the current behavior, but
> > they add new functionality which
> > can make the life of our users easier, so I'll vote to include your
> > changes (given they meet our quality standards)
> >
> >
> >> If multiple classes implement "Program" interface an exception should be
> >> thrown (I think that would make sense). However, I am not sure what
> >> "good" behavior is, if a single "Program"-class is found and an
> >> additional main-method class.
> >>   - should "Program"-class be executed (ie, "overwrite" main-method class)
> >>   - or, better to throw an exception ?
> >
> >
> > I would give a class implementing "Program" priority over a random main()
> > method in a random class.
> > Maybe printing a WARN log message informing the user that the "Program"
> > class has been chosen.
> >
> >
> >> If no "Program"-class is found, but a single main-method class, Flink
> >> could execute using main method. But I am not sure either, if this is
> >> "good" behavior. If multiple main-method classes are present, throwing
> >> an exception is the only way to go, I guess.
> >
> >
> > I think the best effort approach "one class with main() found" is good.
> > In case of multiple main methods, a helpful exception is the best
> > approach in my opinion.
> >
> >
> >>  If the manifest contains "program-class" or "Main-Class" entry,
> >> should we check the jar file right away if the specified class is there?
> >> Right now, no check is performed and an error occurs if the user tries
> >> to execute the job.
> >
> >
> > I'd say the current approach is sufficient. There is no need to have a
> > special code path which is doing the check.
> > I think the error message will be pretty similar in both cases and I fear
> > that this additional code could also introduce new bugs ;)
> >
> >
> >
> >
> > On Fri, May 22, 2015 at 9:06 PM, Matthias J. Sax <
> > mjsax@informatik.hu-berlin.de> wrote:
> >
> >> Hi,
> >>
> >> two more thoughts to this discussion:
> >>
> >>  1) looking at the commit history of "CliFrontend", I found the
> >> following closed issue and the closing pull request
> >>     * https://issues.apache.org/jira/browse/FLINK-1095
> >>     * https://github.com/apache/flink/pull/238
> >> It stands in opposition to Flavio's request to have a job description. Any
> >> comment on this? Should a removed feature be re-introduced? If not, I
> >> would suggest to remove the "ProgramDescription" interface completely.
> >>
> >>  2) If the manifest contains "program-class" or "Main-Class" entry,
> >> should we check the jar file right away if the specified class is there?
> >> Right now, no check is performed and an error occurs if the user tries
> >> to execute the job.
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 05/22/2015 12:06 PM, Matthias J. Sax wrote:
> >>> Thanks for your feedback.
> >>>
> >>> I agree on the main method "problem". For scanning and listing all
> >>> stuff that is found it's fine.
> >>>
> >>> The tricky question is the automatic invocation mechanism, if "-c" flag
> >>> is not used, and no manifest program-class or Main-Class entry is
> >>> found.
> >>>
> >>> If multiple classes implement "Program" interface an exception should
> >>> be thrown (I think that would make sense). However, I am not sure what
> >>> "good" behavior is, if a single "Program"-class is found and an
> >>> additional main-method class.
> >>>   - should "Program"-class be executed (ie, "overwrite" main-method
> >>>     class)
> >>>   - or, better to throw an exception ?
> >>>
> >>> If no "Program"-class is found, but a single main-method class, Flink
> >>> could execute using main method. But I am not sure either, if this is
> >>> "good" behavior. If multiple main-method classes are present, throwing
> >>> an exception is the only way to go, I guess.
> >>>
> >>> To sum up: Should Flink consider main-method classes for automatic
> >>> invocation, or should it be required for main-method classes to either
> >>> list them in "program-class" or "Main-Class" manifest parameter (to
> >>> enable them for automatic invocation)?
> >>>
> >>>
> >>> -Matthias
> >>>
> >>>
> >>>
> >>>
> >>> On 05/22/2015 09:56 AM, Maximilian Michels wrote:
> >>>> Hi Matthias,
> >>>>
> >>>> Thank you for taking the time to analyze Flink's invocation behavior.
> >>>> I like your proposal. I'm not sure whether it is a good idea to scan the
> >>>> entire JAR for main methods. Sometimes, main methods are added solely
> >>>> for testing purposes and don't really serve any practical use. However,
> >>>> if you're already going through the JAR to find the ProgramDescription
> >>>> interface, then you might look for main methods as well. As long as it
> >>>> is just a listing without execution, that should be fine.
> >>>>
> >>>> Best regards,
> >>>> Max
> >>>>
> >>>> On Thu, May 21, 2015 at 3:43 PM, Matthias J. Sax <
> >>>> mjsax@informatik.hu-berlin.de> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I had a look into the current workflow of Flink with regard to the
> >>>>> processing steps of a jar file.
> >>>>>
> >>>>> If I got it right it works as follows (not sure if this is documented
> >>>>> somewhere):
> >>>>>
> >>>>> 1) check, if "-c" flag is used to set program entry point
> >>>>>    if yes, goto 4
> >>>>> 2) try to extract "program-class" property from manifest
> >>>>>    (if found goto 4)
> >>>>> 3) try to extract "Main-Class" property from manifest
> >>>>>    -> if not found throw exception (this happens also, if no
> >>>>>       manifest file is found at all)
> >>>>>
> >>>>> 4) check if entry point class implements "Program" interface
> >>>>>    if yes, goto 6
> >>>>> 5) check if entry point class provides "public static void
> >>>>>    main(String[] args)" method
> >>>>>    -> if not, throw exception
> >>>>>
> >>>>> 6) execute program (ie, show plan/info or really run it)
> >>>>>
> >>>>>
> >>>>> I also "discovered" the interface "ProgramDescription" with a single
> >>>>> method "String getDescription()". Even if some examples implement
> >>>>> this interface (and use it in the example itself), Flink basically
> >>>>> ignores it... From the CLI there is no way to get this info, and the
> >>>>> WebUI does actually get it if present, however, doesn't show it
> >>>>> anywhere...
> >>>>>
> >>>>>
> >>>>> I think it would be nice, if we would extend the following functions:
> >>>>>
> >>>>>  - extend the possibility to specify multiple entry classes in
> >>>>> "program-class" or "Main-Class" -> in this case, the user needs to
> >>>>> use "-c" flag to pick program to run every time
> >>>>>
> >>>>>  - add a CLI option that allows the user to see what entry point
> >>>>> classes are available
> >>>>>    for this, consider
> >>>>>      a) "program-class" entry
> >>>>>      b) "Main-Class" entry
> >>>>>      c) if neither is found, scan jar-file for classes implementing
> >>>>> "Program" interface
> >>>>>      d) if still not found, scan jar-file for classes with "main"
> >>>>> method
> >>>>>
> >>>>>  - if user looks for entry point classes via CLI, check for
> >>>>> "ProgramDescription" interface and show info
> >>>>>
> >>>>>  - extend WebUI to show all available entry-classes (pull request
> >>>>> already there, for multiple entries in "program-class")
> >>>>>
> >>>>>  - extend WebUI to show "ProgramDescription" info
> >>>>>
> >>>>>
> >>>>> What do you think? I am not too sure about the "auto scan" of the jar
> >>>>> file if no manifest entry is provided. We might get some "fat jars"
> >>>>> and scanning might take some time.
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 05/19/2015 10:44 AM, Stephan Ewen wrote:
> >>>>>> We actually had an interface like that before ("Program"). It is
> >>>>>> still supported, but in all new programs we simply use the Java main
> >>>>>> method. The advantage is that
> >>>>>> most IDEs can create executable JARs automatically, setting the JAR
> >>>>>> manifest attributes, etc.
> >>>>>>
> >>>>>> The "Program" interface still works, though. Most tool classes (like
> >>>>>> "PackagedProgram") have a way to figure out whether the code uses
> >>>>>> "main()" or implements "Program"
> >>>>>> and call the right method.
> >>>>>>
> >>>>>> You can try and extend the program interface. If you want to
> >>>>>> consistently support multiple programs in one JAR file, you may need
> >>>>>> to adjust the util classes as
> >>>>>> well to deal with that.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, May 19, 2015 at 10:10 AM, Matthias J. Sax <
> >>>>>> mjsax@informatik.hu-berlin.de> wrote:
> >>>>>>
> >>>>>>> Supporting an interface like this seems to be a nice idea. Any
> >>>>>>> other opinions on it?
> >>>>>>>
> >>>>>>> It seems to be some more work to get it done right. I don't want to
> >>>>>>> start working on it, before it's clear that it has a chance to be
> >>>>>>> included in Flink.
> >>>>>>>
> >>>>>>> @Flavio: I moved the discussion to dev mailing list (user list is
> >>>>>>> not appropriate for this discussion). Are you subscribed to it or
> >>>>>>> should I cc you in each mail?
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>> On 05/19/2015 09:39 AM, Flavio Pompermaier wrote:
> >>>>>>>> Nice feature Matthias!
> >>>>>>>> My suggestion is to create a specific Flink interface to get also
> >>>>>>>> description of a job and standardize parameter passing.
> >>>>>>>> Then, somewhere (e.g. Manifest) you could specify the list of
> >>>>>>>> packages (or also directly the classes) to inspect with reflection
> >>>>>>>> to extract the list of available Flink jobs.
> >>>>>>>> Something like:
> >>>>>>>>
> >>>>>>>> public interface FlinkJob {
> >>>>>>>>
> >>>>>>>> /** The name to display in the job submission UI or shell */
> >>>>>>>> //e.g. "My Flink HelloWorld"
> >>>>>>>> String getDisplayName();
> >>>>>>>>  //e.g. "This program does this and that etc.."
> >>>>>>>> String getDescription();
> >>>>>>>>  //e.g. <0,Integer,"An integer representing my first param">,
> >>>>>>>>  //     <1,String,"A string representing my second param">
> >>>>>>>> List<Tuple3<Integer, TypeInfo, String>> paramDescription;
> >>>>>>>>  /** Set up the flink job in the passed ExecutionEnvironment */
> >>>>>>>> ExecutionEnvironment config(ExecutionEnvironment env);
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> What do you think?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
> >>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I like the idea that Flink's WebClient can show different plans
> >>>>>>>>> for different jobs within a single jar file.
> >>>>>>>>>
> >>>>>>>>> I prepared a prototype for this feature. You can find it here:
> >>>>>>>>> https://github.com/mjsax/flink/tree/multipleJobsWebUI
> >>>>>>>>>
> >>>>>>>>> To test the feature, you need to prepare a jar file, that
> >>>>>>>>> contains the code of multiple programs and specify each entry
> >>>>>>>>> class in the manifest file as comma separated values in
> >>>>>>>>> "program-class" line.
> >>>>>>>>>
> >>>>>>>>> Feedback is welcome. :)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
> >>>>>>>>>> Thank you all for the support!
> >>>>>>>>>> It will be a really nice feature if the web client could be able
> >>>>>>>>>> to show me the list of Flink jobs within my jar..
> >>>>>>>>>> it should be sufficient to mark them with a special annotation
> >>>>>>>>>> and inspect the classes within the jar..
> >>>>>>>>>>
> >>>>>>>>>> On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <ms@mieo.de
> >>>>>>>>>> <ma...@mieo.de>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>     Hi Flavio,
> >>>>>>>>>>
> >>>>>>>>>>     you also can put each job in a single class and use the -c
> >>>>>>>>>>     parameter to execute jobs separately:
> >>>>>>>>>>
> >>>>>>>>>>     /bin/flink run -c com.myflinkjobs.JobA /path/to/jar/multiplejobs.jar
> >>>>>>>>>>     /bin/flink run -c com.myflinkjobs.JobB /path/to/jar/multiplejobs.jar
> >>>>>>>>>>     …
> >>>>>>>>>>
> >>>>>>>>>>     Cheers
> >>>>>>>>>>     Malte
> >>>>>>>>>>
> >>>>>>>>>>     From: Robert Metzger <rmetzger@apache.org <mailto:rmetzger@apache.org>>
> >>>>>>>>>>     Reply-To: <user@flink.apache.org <mailto:user@flink.apache.org>>
> >>>>>>>>>>     Date: Friday, May 8, 2015 14:57
> >>>>>>>>>>     To: "user@flink.apache.org <ma...@flink.apache.org>"
> >>>>>>>>>>     <user@flink.apache.org <ma...@flink.apache.org>>
> >>>>>>>>>>     Subject: Re: Package multiple jobs in a single jar
> >>>>>>>>>>
> >>>>>>>>>>     Hi Flavio,
> >>>>>>>>>>
> >>>>>>>>>>     the pom from our quickstart is a good
> >>>>>>>>>>     reference:
> >>>>>>>>>>     https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
> >>>>>>>>>>     <pompermaier@okkam.it <ma...@okkam.it>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>         Ok, got it.
> >>>>>>>>>>         And is there a reference pom.xml for shading my
> >>>>>>>>>>         application into one fat-jar? Which Flink dependencies
> >>>>>>>>>>         can I exclude?
> >>>>>>>>>>
> >>>>>>>>>>         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske
> >>>>>>>>>>         <fhueske@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>             I didn't say that the main should return the
> >>>>>>>>>>             ExecutionEnvironment.
> >>>>>>>>>>             You can define and execute as many programs in a
> >>>>>>>>>>             main function as you like.
> >>>>>>>>>>             The program can be defined somewhere else, e.g., in
> >>>>>>>>>>             a function that receives an ExecutionEnvironment and
> >>>>>>>>>>             attaches a program such as
> >>>>>>>>>>
> >>>>>>>>>>             public void buildMyProgram(ExecutionEnvironment env) {
> >>>>>>>>>>               DataSet<String> lines = env.readTextFile(...);
> >>>>>>>>>>               // do something
> >>>>>>>>>>               lines.writeAsText(...);
> >>>>>>>>>>             }
> >>>>>>>>>>
> >>>>>>>>>>             That method could be invoked from main():
> >>>>>>>>>>
> >>>>>>>>>>             public static void main(String[] args) throws Exception {
> >>>>>>>>>>               ExecutionEnvironment env = ...
> >>>>>>>>>>
> >>>>>>>>>>               if(...) {
> >>>>>>>>>>                 buildMyProgram(env);
> >>>>>>>>>>               }
> >>>>>>>>>>               else {
> >>>>>>>>>>                 buildSomeOtherProg(env);
> >>>>>>>>>>               }
> >>>>>>>>>>
> >>>>>>>>>>               env.execute();
> >>>>>>>>>>
> >>>>>>>>>>               // run some more programs
> >>>>>>>>>>             }
> >>>>>>>>>>
> >>>>>>>>>>             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
> >>>>>>>>>>             <pompermaier@okkam.it <mailto:pompermaier@okkam.it>>:
> >>>>>>>>>>
> >>>>>>>>>>                 Hi Fabian,
> >>>>>>>>>>                 thanks for the response.
> >>>>>>>>>>                 So my mains should be converted into a method
> >>>>>>>>>>                 returning the ExecutionEnvironment.
> >>>>>>>>>>                 However, I think it would be very nice to have a
> >>>>>>>>>>                 syntax like that of the Hadoop ProgramDriver to
> >>>>>>>>>>                 define jobs to invoke from a single root class.
> >>>>>>>>>>                 Do you think it could be useful?
> >>>>>>>>>>
> >>>>>>>>>>                 On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
> >>>>>>>>>>                 <fhueske@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>                     You can easily have multiple Flink programs in a
> >>>>>>>>>>                     single JAR file.
> >>>>>>>>>>                     A program is defined using an ExecutionEnvironment
> >>>>>>>>>>                     and executed when you call
> >>>>>>>>>>                     ExecutionEnvironment.execute().
> >>>>>>>>>>                     Where and how you do that does not matter.
> >>>>>>>>>>
> >>>>>>>>>>                     You can for example implement a main function
> >>>>>>>>>>                     such as:
> >>>>>>>>>>
> >>>>>>>>>>                     public static void main(String... args) {
> >>>>>>>>>>
> >>>>>>>>>>                       if (today == Monday) {
> >>>>>>>>>>                         ExecutionEnvironment env = ...
> >>>>>>>>>>                         // define Monday prog
> >>>>>>>>>>                         env.execute()
> >>>>>>>>>>                       }
> >>>>>>>>>>                       else {
> >>>>>>>>>>                         ExecutionEnvironment env = ...
> >>>>>>>>>>                         // define other prog
> >>>>>>>>>>                         env.execute()
> >>>>>>>>>>                       }
> >>>>>>>>>>                     }
> >>>>>>>>>>
> >>>>>>>>>>                     2015-05-08 11:41 GMT+02:00 Flavio Pompermaier
> >>>>>>>>>>                     <pompermaier@okkam.it <mailto:pompermaier@okkam.it>>:
> >>>>>>>>>>
> >>>>>>>>>>                         Hi to all,
> >>>>>>>>>>                         is there any way to keep multiple jobs in a jar
> >>>>>>>>>>                         and then choose at runtime the one to execute
> >>>>>>>>>>                         (like what ProgramDriver does in Hadoop)?
> >>>>>>>>>>
> >>>>>>>>>>                         Best,
> >>>>>>>>>>                         Flavio
> >>>>>>>>>>

Re: Package multiple jobs in a single jar

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Makes sense to me. :)

One more thing: What about extending the "ProgramDescription" interface
to have multiple methods as Flavio suggested (with the config(...)
method that should be handled by the ParameterTool)?

> public interface FlinkJob {
> 
> /** The name to display in the job submission UI or shell */
> //e.g. "My Flink HelloWorld"
> String getDisplayName();
> //e.g. "This program does this and that etc.."
> String getDescription();
> //e.g. <0,Integer,"An integer representing my first param">, <1,String,"A string representing my second param">
> List<Tuple3<Integer, TypeInfo, String>> paramDescription;
> /** Set up the flink job in the passed ExecutionEnvironment */
> ExecutionEnvironment config(ExecutionEnvironment env);
> }

Right now, the interface is used only a couple of times in Flink's code
base, so it would not be a problem to update those classes. However, it
could break external code that uses the interface already (even if I
doubt that the interface is well known and used often [or at all]).

I personally don't think that "getDisplayName()" is too helpful.
Splitting the program description and the parameter description seems to
be useful. For example, if wrong parameters are provided, the parameter
description can be included in the error message. If program+parameter
description is given in a single string, this is not possible. But this
is only a minor issue of course.
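
To illustrate the point about error messages, here is a rough sketch, not
a proposal for the final interface. ParamDescriptions, Param, usage and
check are made-up names, and checking only the argument count is a
simplifying assumption:

```java
import java.util.List;

final class ParamDescriptions {

    /** One positional parameter: index, expected type and a human-readable description. */
    static final class Param {
        final int position;
        final Class<?> type;
        final String description;

        Param(int position, Class<?> type, String description) {
            this.position = position;
            this.type = type;
            this.description = description;
        }
    }

    /** Renders the parameter descriptions, one per line. */
    static String usage(List<Param> params) {
        StringBuilder sb = new StringBuilder();
        for (Param p : params) {
            sb.append("  <").append(p.position).append("> ")
              .append(p.type.getSimpleName()).append(": ")
              .append(p.description).append('\n');
        }
        return sb.toString();
    }

    /** Returns null if the argument count matches, else an error message that embeds the usage. */
    static String check(List<Param> params, String[] args) {
        if (args.length == params.size()) {
            return null;
        }
        return "Expected " + params.size() + " parameter(s) but got "
                + args.length + ":\n" + usage(params);
    }
}
```

Because the parameter descriptions are kept apart from the program
description, the error text can list exactly what was expected without
repeating what the program does.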

Maybe, we should also add the interface to the current Flink examples,
to make people more aware of it. Is there any documentation on the web site?


-Matthias



On 05/22/2015 09:43 PM, Robert Metzger wrote:
> Thank you for working on this.
> My responses are inline below:
> 
> (Flavio)
> 
>> My suggestion is to create a specific Flink interface to get also
>> description of a job and standardize parameter passing.
> 
> 
> I've recently merged the ParameterTool which is solving the "standardize
> parameter passing" problem (at least it presents a best practice) :
> http://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#parsing-command-line-arguments-and-passing-them-around-in-your-flink-application
> 
> Regarding the description: Maybe we can use the "ProgramDescription"
> interface for getting a string describing the program in the web frontend.
> 
> (Matthias)
> 
>> I don't want to start working on it, before it's clear that it has a
>> chance to be
>> included in Flink.
> 
> 
> I think the changes discussed here won't change the current behavior, but
> they add new functionality which
> can make the life of our users easier, so I'll vote to include your changes
> (given they meet our quality standards)
> 
> 
>> If multiple classes implement "Program" interface an exception should be
>> thrown (I think that would make sense). However, I am not sure what
>> "good" behavior is, if a single "Program"-class is found and an
>> additional main-method class.
>>   - should "Program"-class be executed (ie, "overwrite" main-method class)
>>   - or, better to throw an exception ?
> 
> 
> I would give a class implementing "Program" priority over a random main()
> method in a random class.
> Maybe printing a WARN log message informing the user that the "Program"
> class has been chosen.
> 
> 
>> If no "Program"-class is found, but a single main-method class, Flink
>> could execute using main method. But I am not sure either, if this is
>> "good" behavior. If multiple main-method classes are present, throwing
>> an exception is the only way to go, I guess.
> 
> 
> I think the best effort approach "one class with main() found" is good. In
> case of multiple main methods, a helpful exception is the best approach in
> my opinion.
> 
> 
>>  If the manifest contains "program-class" or "Main-Class" entry,
>> should we check the jar file right away if the specified class is there?
>> Right now, no check is performed and an error occurs if the user tries
>> to execute the job.
> 
> 
> I'd say the current approach is sufficient. There is no need to have a
> special code path which is doing the check.
> I think the error message will be pretty similar in both cases and I fear
> that this additional code could also introduce new bugs ;)
> 
> 
> 
> 
> On Fri, May 22, 2015 at 9:06 PM, Matthias J. Sax <
> mjsax@informatik.hu-berlin.de> wrote:
> 
>> Hi,
>>
>> two more thoughts to this discussion:
>>
>>  1) looking at the commit history of "CliFrontend", I found the
>> following closed issue and the closing pull request
>>     * https://issues.apache.org/jira/browse/FLINK-1095
>>     * https://github.com/apache/flink/pull/238
>> It stands in opposition to Flavio's request to have a job description. Any
>> comment on this? Should a removed feature be re-introduced? If not, I
>> would suggest to remove the "ProgramDescription" interface completely.
>>
>>  2) If the manifest contains "program-class" or "Main-Class" entry,
>> should we check the jar file right away if the specified class is there?
>> Right now, no check is performed and an error occurs if the user tries
>> to execute the job.
>>
>>
>> -Matthias
>>
>>
>> On 05/22/2015 12:06 PM, Matthias J. Sax wrote:
>>> Thanks for your feedback.
>>>
>>> I agree on the main method "problem". For scanning and listing all stuff
>>> that is found it's fine.
>>>
>>> The tricky question is the automatic invocation mechanism, if "-c" flag
>>> is not used, and no manifest program-class or Main-Class entry is found.
>>>
>>> If multiple classes implement "Program" interface an exception should be
>>> thrown (I think that would make sense). However, I am not sure what
>>> "good" behavior is, if a single "Program"-class is found and an
>>> additional main-method class.
>>>   - should "Program"-class be executed (ie, "overwrite" main-method
>>>     class)
>>>   - or, better to throw an exception ?
>>>
>>> If no "Program"-class is found, but a single main-method class, Flink
>>> could execute using main method. But I am not sure either, if this is
>>> "good" behavior. If multiple main-method classes are present, throwing
>>> an exception is the only way to go, I guess.
>>>
>>> To sum up: Should Flink consider main-method classes for automatic
>>> invocation, or should it be required for main-method classes to either
>>> list them in "program-class" or "Main-Class" manifest parameter (to
>>> enable them for automatic invocation)?
>>>
>>>
>>> -Matthias
>>>
>>>
>>>
>>>
>>> On 05/22/2015 09:56 AM, Maximilian Michels wrote:
>>>> Hi Matthias,
>>>>
>>>> Thank you for taking the time to analyze Flink's invocation behavior. I
>>>> like your proposal. I'm not sure whether it is a good idea to scan the
>>>> entire JAR for main methods. Sometimes, main methods are added solely
>>>> for testing purposes and don't really serve any practical use. However,
>>>> if you're already going through the JAR to find the ProgramDescription
>>>> interface, then you might look for main methods as well. As long as it
>>>> is just a listing without execution, that should be fine.
>>>>
>>>> Best regards,
>>>> Max
>>>>
>>>> On Thu, May 21, 2015 at 3:43 PM, Matthias J. Sax <
>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I had a look into the current workflow of Flink with regard to the
>>>>> processing steps of a jar file.
>>>>>
>>>>> If I got it right it works as follows (not sure if this is documented
>>>>> somewhere):
>>>>>
>>>>> 1) check, if "-c" flag is used to set program entry point
>>>>>    if yes, goto 4
>>>>> 2) try to extract "program-class" property from manifest
>>>>>    (if found goto 4)
>>>>> 3) try to extract "Main-Class" property from manifest
>>>>>    -> if not found throw exception (this happens also, if no manifest
>>>>>       file is found at all)
>>>>>
>>>>> 4) check if entry point class implements "Program" interface
>>>>>    if yes, goto 6
>>>>> 5) check if entry point class provides "public static void
>>>>>    main(String[] args)" method
>>>>>    -> if not, throw exception
>>>>>
>>>>> 6) execute program (ie, show plan/info or really run it)
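
The lookup order in steps 1) to 5) above can be sketched roughly as
follows. This is only an illustration of the described order, not Flink's
actual code: the Program interface here is a local marker standing in for
the real one, and EntryPointResolver, resolveClassName, invocationKind and
the demo classes are made-up names. Step 6) (running the program) is left
out.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Local marker standing in for the "Program" interface; not the real Flink type. */
interface Program { }

final class EntryPointResolver {

    /** Steps 1-3: take the "-c" value if given, else "program-class", else "Main-Class". */
    static String resolveClassName(String cliClass, Manifest manifest) {
        if (cliClass != null) {
            return cliClass; // 1) "-c" flag wins
        }
        if (manifest != null) {
            Attributes attrs = manifest.getMainAttributes();
            String name = attrs.getValue("program-class"); // 2)
            if (name == null) {
                name = attrs.getValue("Main-Class"); // 3)
            }
            if (name != null) {
                return name.trim();
            }
        }
        throw new IllegalArgumentException("No entry point given via -c or manifest");
    }

    /** Steps 4-5: decide whether the entry class is a "Program" or a main-method class. */
    static String invocationKind(Class<?> entry) {
        if (Program.class.isAssignableFrom(entry)) {
            return "program"; // 4)
        }
        try {
            Method main = entry.getMethod("main", String[].class); // public methods only
            if (Modifier.isStatic(main.getModifiers()) && main.getReturnType() == void.class) {
                return "main"; // 5)
            }
        } catch (NoSuchMethodException e) {
            // fall through to the exception below
        }
        throw new IllegalArgumentException(
                entry.getName() + " neither implements Program nor has a main method");
    }

    /** Hypothetical entry classes, only here to exercise the two branches. */
    static final class DemoProgram implements Program { }

    static final class DemoMain {
        public static void main(String[] args) { }
    }
}
```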
>>>>>
>>>>>
>>>>> I also "discovered" the interface "ProgramDescription" with a single
>>>>> method "String getDescription()". Even if some examples implement this
>>>>> interface (and use it in the example itself), Flink basically ignores
>>>>> it... From the CLI there is no way to get this info, and the WebUI does
>>>>> actually get it if present, however, doesn't show it anywhere...
>>>>>
>>>>>
>>>>> I think it would be nice, if we would extend the following functions:
>>>>>
>>>>>  - extend the possibility to specify multiple entry classes in
>>>>> "program-class" or "Main-Class" -> in this case, the user needs to use
>>>>> "-c" flag to pick program to run every time
>>>>>
>>>>>  - add a CLI option that allows the user to see what entry point
>>>>> classes are available
>>>>>    for this, consider
>>>>>      a) "program-class" entry
>>>>>      b) "Main-Class" entry
>>>>>      c) if neither is found, scan jar-file for classes implementing
>>>>> "Program" interface
>>>>>      d) if still not found, scan jar-file for classes with "main"
>>>>> method
>>>>>
>>>>>  - if user looks for entry point classes via CLI, check for
>>>>> "ProgramDescription" interface and show info
>>>>>
>>>>>  - extend WebUI to show all available entry-classes (pull request
>>>>> already there, for multiple entries in "program-class")
>>>>>
>>>>>  - extend WebUI to show "ProgramDescription" info
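
Options a) and b) of the listing idea could look roughly like the sketch
below. It assumes comma separated values as in the prototype's
"program-class" line and, as a simplification, treats "Main-Class" the
same way. The jar scans of c) and d) are not covered. EntryClassLister and
listEntryClasses are made-up names, not existing Flink code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class EntryClassLister {

    /** Lists entry classes from "program-class", falling back to "Main-Class". */
    static List<String> listEntryClasses(Manifest manifest) {
        Attributes attrs = manifest.getMainAttributes();
        String raw = attrs.getValue("program-class"); // a)
        if (raw == null) {
            raw = attrs.getValue("Main-Class"); // b)
        }
        List<String> result = new ArrayList<>();
        if (raw == null) {
            return result; // c) and d) would scan the jar here
        }
        for (String part : raw.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }
}
```

An empty result is the signal that no manifest entry exists, so a caller
can decide whether the potentially slow scan of a fat jar is worth it.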
>>>>>
>>>>>
>>>>> What do you think? I am not too sure about the "auto scan" of the jar
>>>>> file if no manifest entry is provided. We might get some "fat jars" and
>>>>> scanning might take some time.
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 05/19/2015 10:44 AM, Stephan Ewen wrote:
>>>>>> We actually had an interface like that before ("Program"). It is still
>>>>>> supported, but in all new programs we simply use the Java main method.
>>>>>> The advantage is that
>>>>>> most IDEs can create executable JARs automatically, setting the JAR
>>>>>> manifest attributes, etc.
>>>>>>
>>>>>> The "Program" interface still works, though. Most tool classes (like
>>>>>> "PackagedProgram") have a way to figure out whether the code uses
>>>>>> "main()" or implements "Program"
>>>>>> and call the right method.
>>>>>>
>>>>>> You can try and extend the program interface. If you want to
>>>>>> consistently support multiple programs in one JAR file, you may need
>>>>>> to adjust the util classes as
>>>>>> well to deal with that.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, May 19, 2015 at 10:10 AM, Matthias J. Sax <
>>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>>
>>>>>>> Supporting an interface like this seems to be a nice idea. Any other
>>>>>>> opinions on it?
>>>>>>>
>>>>>>> It seems to be some more work to get it done right. I don't want to
>>>>>>> start working on it, before it's clear that it has a chance to be
>>>>>>> included in Flink.
>>>>>>>
>>>>>>> @Flavio: I moved the discussion to the dev mailing list (the user list is
>>>>>>> not appropriate for this discussion). Are you subscribed to it or should
>>>>>>> I cc you in each mail?
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 05/19/2015 09:39 AM, Flavio Pompermaier wrote:
>>>>>>>> Nice feature Matthias!
>>>>>>>> My suggestion is to create a specific Flink interface to also get a
>>>>>>>> description of a job and standardize parameter passing.
>>>>>>>> Then, somewhere (e.g. Manifest) you could specify the list of packages
>>>>>>>> (or also directly the classes) to inspect with reflection to extract the
>>>>>>>> list of available Flink jobs.
>>>>>>>> Something like:
>>>>>>>>
>>>>>>>> public interface FlinkJob {
>>>>>>>>
>>>>>>>>   /** The name to display in the job submission UI or shell, e.g. "My Flink HelloWorld" */
>>>>>>>>   String getDisplayName();
>>>>>>>>
>>>>>>>>   /** e.g. "This program does this and that etc.." */
>>>>>>>>   String getDescription();
>>>>>>>>
>>>>>>>>   /**
>>>>>>>>    * e.g. <0,Integer,"An integer representing my first param">,
>>>>>>>>    * <1,String,"A string representing my second param">
>>>>>>>>    */
>>>>>>>>   List<Tuple3<Integer, TypeInfo, String>> getParamDescription();
>>>>>>>>
>>>>>>>>   /** Set up the flink job in the passed ExecutionEnvironment */
>>>>>>>>   ExecutionEnvironment config(ExecutionEnvironment env);
>>>>>>>> }
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
>>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I like the idea that Flink's WebClient can show different plans for
>>>>>>>>> different jobs within a single jar file.
>>>>>>>>>
>>>>>>>>> I prepared a prototype for this feature. You can find it here:
>>>>>>>>> https://github.com/mjsax/flink/tree/multipleJobsWebUI
>>>>>>>>>
>>>>>>>>> To test the feature, you need to prepare a jar file that contains the
>>>>>>>>> code of multiple programs and specify each entry class in the manifest
>>>>>>>>> file as comma-separated values in the "program-class" line.
>>>>>>>>>
>>>>>>>>> Feedback is welcome. :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
>>>>>>>>>> Thank you all for the support!
>>>>>>>>>> It would be a really nice feature if the web client were able to show
>>>>>>>>>> me the list of Flink jobs within my jar.
>>>>>>>>>> It should be sufficient to mark them with a special annotation and
>>>>>>>>>> inspect the classes within the jar.
>>>>>>>>>>
>>>>>>>>>> On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <ms@mieo.de
>>>>>>>>>> <ma...@mieo.de>> wrote:
>>>>>>>>>>
>>>>>>>>>>     Hi Flavio,
>>>>>>>>>>
>>>>>>>>>>     you can also put each job in a single class and use the -c
>>>>>>>>>>     parameter to execute jobs separately:
>>>>>>>>>>
>>>>>>>>>>     /bin/flink run -c com.myflinkjobs.JobA /path/to/jar/multiplejobs.jar
>>>>>>>>>>     /bin/flink run -c com.myflinkjobs.JobB /path/to/jar/multiplejobs.jar
>>>>>>>>>>     …
>>>>>>>>>>
>>>>>>>>>>     Cheers
>>>>>>>>>>     Malte
>>>>>>>>>>
>>>>>>>>>>     From: Robert Metzger <rmetzger@apache.org <mailto:rmetzger@apache.org>>
>>>>>>>>>>     Reply-To: <user@flink.apache.org <mailto:user@flink.apache.org>>
>>>>>>>>>>     Date: Friday, 8 May 2015 14:57
>>>>>>>>>>     To: "user@flink.apache.org <ma...@flink.apache.org>"
>>>>>>>>>>     <user@flink.apache.org <ma...@flink.apache.org>>
>>>>>>>>>>     Subject: Re: Package multiple jobs in a single jar
>>>>>>>>>>
>>>>>>>>>>     Hi Flavio,
>>>>>>>>>>
>>>>>>>>>>     the pom from our quickstart is a good reference:
>>>>>>>>>>     https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
>>>>>>>>>>     <pompermaier@okkam.it <ma...@okkam.it>> wrote:
>>>>>>>>>>
>>>>>>>>>>         Ok, got it.
>>>>>>>>>>         And is there a reference pom.xml for shading my application
>>>>>>>>>>         into one fat-jar? Which Flink dependencies can I exclude?
>>>>>>>>>>
>>>>>>>>>>         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske <fhueske@gmail.com
>>>>>>>>>>         <ma...@gmail.com>> wrote:
>>>>>>>>>>
>>>>>>>>>>             I didn't say that the main should return the
>>>>>>>>>>             ExecutionEnvironment.
>>>>>>>>>>             You can define and execute as many programs in a main
>>>>>>>>>>             function as you like.
>>>>>>>>>>             The program can be defined somewhere else, e.g., in a
>>>>>>>>>>             function that receives an ExecutionEnvironment and
>>>>>>>>>>             attaches a program such as
>>>>>>>>>>
>>>>>>>>>>             public void buildMyProgram(ExecutionEnvironment env) {
>>>>>>>>>>               DataSet<String> lines = env.readTextFile(...);
>>>>>>>>>>               // do something
>>>>>>>>>>               lines.writeAsText(...);
>>>>>>>>>>             }
>>>>>>>>>>
>>>>>>>>>>             That method could be invoked from main():
>>>>>>>>>>
>>>>>>>>>>             public static void main(String[] args) throws Exception {
>>>>>>>>>>               ExecutionEnvironment env = ...
>>>>>>>>>>
>>>>>>>>>>               if(...) {
>>>>>>>>>>                 buildMyProgram(env);
>>>>>>>>>>               }
>>>>>>>>>>               else {
>>>>>>>>>>                 buildSomeOtherProg(env);
>>>>>>>>>>               }
>>>>>>>>>>
>>>>>>>>>>               env.execute();
>>>>>>>>>>
>>>>>>>>>>               // run some more programs
>>>>>>>>>>             }
>>>>>>>>>>
>>>>>>>>>>             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
>>>>>>>>>>             <pompermaier@okkam.it <ma...@okkam.it>>:
>>>>>>>>>>
>>>>>>>>>>                 Hi Fabian,
>>>>>>>>>>                 thanks for the response.
>>>>>>>>>>                 So my mains should be converted into a method returning
>>>>>>>>>>                 the ExecutionEnvironment.
>>>>>>>>>>                 However, I think that it would be very nice to have a
>>>>>>>>>>                 syntax like the one of the Hadoop ProgramDriver to
>>>>>>>>>>                 define jobs to invoke from a single root class.
>>>>>>>>>>                 Do you think it could be useful?
>>>>>>>>>>
>>>>>>>>>>                 On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
>>>>>>>>>>                 <fhueske@gmail.com <ma...@gmail.com>> wrote:
>>>>>>>>>>
>>>>>>>>>>                     You can easily have multiple Flink programs in a
>>>>>>>>>>                     single JAR file.
>>>>>>>>>>                     A program is defined using an ExecutionEnvironment
>>>>>>>>>>                     and executed when you call
>>>>>>>>>>                     ExecutionEnvironment.execute().
>>>>>>>>>>                     Where and how you do that does not matter.
>>>>>>>>>>
>>>>>>>>>>                     You can for example implement a main function
>>>>>>>>>>                     such as:
>>>>>>>>>>
>>>>>>>>>>                     public static void main(String... args) {
>>>>>>>>>>
>>>>>>>>>>                       if (today == Monday) {
>>>>>>>>>>                         ExecutionEnvironment env = ...
>>>>>>>>>>                         // define Monday prog
>>>>>>>>>>                         env.execute()
>>>>>>>>>>                       }
>>>>>>>>>>                       else {
>>>>>>>>>>                         ExecutionEnvironment env = ...
>>>>>>>>>>                         // define other prog
>>>>>>>>>>                         env.execute()
>>>>>>>>>>                       }
>>>>>>>>>>                     }
>>>>>>>>>>
>>>>>>>>>>                     2015-05-08 11:41 GMT+02:00 Flavio Pompermaier
>>>>>>>>>>                     <pompermaier@okkam.it <mailto:pompermaier@okkam.it>>:
>>>>>>>>>>
>>>>>>>>>>                         Hi to all,
>>>>>>>>>>                         is there any way to keep multiple jobs in a jar
>>>>>>>>>>                         and then choose at runtime the one to execute
>>>>>>>>>>                         (like what ProgramDriver does in Hadoop)?
>>>>>>>>>>
>>>>>>>>>>                         Best,
>>>>>>>>>>                         Flavio
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
> 


Re: Package multiple jobs in a single jar

Posted by Robert Metzger <rm...@apache.org>.
Thank you for working on this.
My responses are inline below:

(Flavio)

> My suggestion is to create a specific Flink interface to get also
> description of a job and standardize parameter passing.


I've recently merged the ParameterTool, which addresses the "standardize
parameter passing" problem (or at least presents a best practice):
http://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#parsing-command-line-arguments-and-passing-them-around-in-your-flink-application

Regarding the description: Maybe we can use the "ProgramDescription"
interface for getting a string describing the program in the web frontend.
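
A minimal sketch of how a frontend might obtain that string, assuming the
entry class has a no-argument constructor. The "ProgramDescription" type
below is a local stand-in with the single method described in this thread,
not Flink's own class; "DescriptionLookup" and "WordCountJob" are
hypothetical names used only for illustration.

```java
import java.util.Optional;

/** Hypothetical helper a frontend could call before rendering a job entry. */
final class DescriptionLookup {

    /**
     * Returns the description of the entry class if it implements
     * ProgramDescription and can be instantiated without arguments.
     */
    static Optional<String> describe(Class<?> entryClass) {
        if (!ProgramDescription.class.isAssignableFrom(entryClass)) {
            return Optional.empty();
        }
        try {
            Object instance = entryClass.getDeclaredConstructor().newInstance();
            return Optional.ofNullable(((ProgramDescription) instance).getDescription());
        } catch (ReflectiveOperationException e) {
            // The class could not be instantiated; treat it as "no description".
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(WordCountJob.class).orElse("(no description)"));
        System.out.println(describe(String.class).orElse("(no description)"));
    }
}

/** Local stand-in for the interface discussed in this thread (assumption). */
interface ProgramDescription {
    String getDescription();
}

/** Hypothetical example job that provides a description. */
class WordCountJob implements ProgramDescription {
    @Override
    public String getDescription() {
        return "Counts words in a text file";
    }
}
```

Returning an Optional keeps the caller from having to distinguish "no
interface" from "could not instantiate"; a real frontend may want to report
the second case separately.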

(Matthias)

> I don't want to start working on it, before it's clear that it has a
> chance to be
> included in Flink.


I think the changes discussed here won't change the current behavior, but
they add new functionality which can make the life of our users easier,
so I'll vote to include your changes (provided they meet our quality
standards).


> If multiple classes implement "Program" interface an exception should be
> thrown (I think that would make sense). However, I am not sure what
> "good" behavior is, if a single "Program"-class is found and an
> additional main-method class.
>   - should "Program"-class be executed (ie, "overwrite" main-method class)
>   - or, better to throw an exception ?


I would give a class implementing "Program" priority over an arbitrary
main() method in some other class.
We could also print a WARN log message informing the user that the
"Program" class has been chosen.
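
That priority rule could look roughly like the following sketch. It is
plain Java with a local stand-in for the "Program" interface; the names
"ProgramPreference", "BatchJob", "OtherBatchJob" and "ToolWithMain" are
hypothetical, and the scan that produces the candidate list is assumed to
exist elsewhere. Throwing on multiple "Program" classes follows the
suggestion quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

/** Hypothetical selection step: a single "Program" class beats main() classes. */
final class ProgramPreference {
    private static final Logger LOG = Logger.getLogger(ProgramPreference.class.getName());

    /**
     * Returns the only "Program" implementation among the candidates, or
     * empty if there is none (the caller can then fall back to main() classes).
     */
    static Optional<Class<?>> preferProgram(List<Class<?>> candidates) {
        List<Class<?>> programs = new ArrayList<>();
        for (Class<?> c : candidates) {
            if (Program.class.isAssignableFrom(c)) {
                programs.add(c);
            }
        }
        if (programs.size() > 1) {
            throw new IllegalStateException(
                "Multiple \"Program\" classes found: " + programs + "; use -c to pick one");
        }
        if (programs.isEmpty()) {
            return Optional.empty();
        }
        Class<?> chosen = programs.get(0);
        if (candidates.size() > 1) {
            // Tell the user that other entry point candidates were ignored.
            LOG.warning("Chose \"Program\" class " + chosen.getName() + " over "
                + (candidates.size() - 1) + " other entry point candidate(s)");
        }
        return Optional.of(chosen);
    }

    public static void main(String[] args) {
        List<Class<?>> found = Arrays.<Class<?>>asList(ToolWithMain.class, BatchJob.class);
        System.out.println(preferProgram(found));
    }
}

/** Local stand-in for Flink's "Program" interface; its methods are omitted here. */
interface Program {}

class BatchJob implements Program {}

class OtherBatchJob implements Program {}

class ToolWithMain {
    public static void main(String[] args) {}
}
```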


> If no "Program"-class is found, but a single main-method class, Flink
> could execute using main method. But I am not sure either, if this is
> "good" behavior. If multiple main-method classes are present, throwing
> an exception is the only way to go, I guess.


I think the best-effort approach "one class with main() found" is good. In
case of multiple main methods, a helpful exception is the best approach in
my opinion.
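
As a sketch of what "best effort plus a helpful exception" could mean in
code: the helper below detects a "public static void main(String[])" method
via reflection and only returns a class if exactly one candidate has it.
"MainMethodFallback" and the sample classes are hypothetical, and the
message wording is just one possibility.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical fallback used when no "Program" class was found. */
final class MainMethodFallback {

    /** True if the class exposes a public static void main(String[]) method. */
    static boolean hasMainMethod(Class<?> c) {
        try {
            Method m = c.getMethod("main", String[].class); // public methods only
            return Modifier.isStatic(m.getModifiers()) && m.getReturnType() == void.class;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Returns the single main() class, or throws with a message listing the options. */
    static Class<?> singleMainClass(List<Class<?>> candidates) {
        List<String> names = new ArrayList<>();
        Class<?> found = null;
        for (Class<?> c : candidates) {
            if (hasMainMethod(c)) {
                names.add(c.getName());
                found = c;
            }
        }
        if (names.size() == 1) {
            return found;
        }
        if (names.isEmpty()) {
            throw new IllegalStateException(
                "No class with a main(String[]) method found; specify an entry class with -c");
        }
        throw new IllegalStateException("Found " + names.size()
            + " classes with a main method: " + String.join(", ", names)
            + ". Pick one with -c or name it in the manifest.");
    }

    public static void main(String[] args) {
        System.out.println(singleMainClass(Arrays.<Class<?>>asList(JobA.class, Helper.class)));
    }
}

class JobA {
    public static void main(String[] args) {}
}

class JobB {
    public static void main(String[] args) {}
}

class Helper {}
```

Listing the candidate class names in the exception is the part that makes it
"helpful": the user can copy one of them straight into the -c flag.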


> If the manifest contains "program-class" or "Main-Class" entry,
> should we check the jar file right away if the specified class is there?
> Right now, no check is performed and an error occurs if the user tries
> to execute the job.


I'd say the current approach is sufficient. There is no need for a
special code path that does the check.
I think the error message will be pretty similar in both cases, and I fear
that this additional code could also introduce new bugs ;)




On Fri, May 22, 2015 at 9:06 PM, Matthias J. Sax <
mjsax@informatik.hu-berlin.de> wrote:

> Hi,
>
> two more thoughts on this discussion:
>
>  1) looking at the commit history of "CliFrontend", I found the
> following closed issue and the closing pull request
>     * https://issues.apache.org/jira/browse/FLINK-1095
>     * https://github.com/apache/flink/pull/238
> It stands in opposition to Flavio's request to have a job description. Any
> comment on this? Should a removed feature be re-introduced? If not, I
> would suggest removing the "ProgramDescription" interface completely.
>
>  2) If the manifest contains "program-class" or "Main-Class" entry,
> should we check the jar file right away if the specified class is there?
> Right now, no check is performed and an error occurs if the user tries
> to execute the job.
>
>
> -Matthias
>
>
> On 05/22/2015 12:06 PM, Matthias J. Sax wrote:
> > Thanks for your feedback.
> >
> > I agree on the main method "problem". For scanning and listing all stuff
> > that is found it's fine.
> >
> > The tricky question is the automatic invocation mechanism, if "-c" flag
> > is not used, and no manifest program-class or Main-Class entry is found.
> >
> > If multiple classes implement "Program" interface an exception should be
> > thrown (I think that would make sense). However, I am not sure what
> > "good" behavior is, if a single "Program"-class is found and an
> > additional main-method class.
> >   - should "Program"-class be executed (ie, "overwrite" main-method class)
> >   - or, better to throw an exception ?
> >
> > If no "Program"-class is found, but a single main-method class, Flink
> > could execute using main method. But I am not sure either, if this is
> > "good" behavior. If multiple main-method classes are present, throwing
> > an exception is the only way to go, I guess.
> >
> > To sum up: Should Flink consider main-method classes for automatic
> > invocation, or should it be required for main-method classes to either
> > list them in "program-class" or "Main-Class" manifest parameter (to
> > enable them for automatic invocation)?
> >
> >
> > -Matthias
> >
> >
> >
> >
> > On 05/22/2015 09:56 AM, Maximilian Michels wrote:
> >> Hi Matthias,
> >>
> >> Thank you for taking the time to analyze Flink's invocation behavior. I
> >> like your proposal. I'm not sure whether it is a good idea to scan the
> >> entire JAR for main methods. Sometimes, main methods are added solely
> >> for testing purposes and don't really serve any practical use. However,
> >> if you're already going through the JAR to find the ProgramDescription
> >> interface, then you might look for main methods as well. As long as it
> >> is just a listing without execution, that should be fine.
> >>
> >> Best regards,
> >> Max
> >>
> >> On Thu, May 21, 2015 at 3:43 PM, Matthias J. Sax <
> >> mjsax@informatik.hu-berlin.de> wrote:
> >>
> >>> Hi,
> >>>
> >>> I had a look into the current workflow of Flink with regard to the
> >>> processing steps of a jar file.
> >>>
> >>> If I got it right it works as follows (not sure if this is documented
> >>> somewhere):
> >>>
> >>> 1) check, if "-c" flag is used to set program entry point
> >>>    if yes, goto 4
> >>> 2) try to extract "program-class" property from manifest
> >>>    (if found goto 4)
> >>> 3) try to extract "Main-Class" property from manifest
> >>>    -> if not found, throw an exception (this also happens if no
> >>> manifest file is found at all)
> >>>
> >>> 4) check if entry point class implements "Program" interface
> >>>    if yes, goto 6
> >>> 5) check if entry point class provides a "public static void
> >>> main(String[] args)" method
> >>>    -> if not, throw an exception
> >>>
> >>> 6) execute program (ie, show plan/info or really run it)
> >>>
> >>>
> >>> I also "discovered" the interface "ProgramDescription" with a single
> >>> method "String getDescription()". Even if some examples implement this
> >>> interface (and use it in the example itself), Flink basically ignores
> >>> it... From the CLI there is no way to get this info, and the WebUI does
> >>> actually get it if present, however, doesn't show it anywhere...
> >>>
> >>>
> >>> I think it would be nice, if we would extend the following functions:
> >>>
> >>>  - extend the possibility to specify multiple entry classes in
> >>> "program-class" or "Main-Class" -> in this case, the user needs to use
> >>> "-c" flag to pick program to run every time
> >>>
> >>>  - add a CLI option that allows the user to see what entry point
> >>> classes are available
> >>>    for this, consider
> >>>      a) "program-class" entry
> >>>      b) "Main-Class" entry
> >>>      c) if neither is found, scan jar-file for classes implementing
> >>> "Program" interface
> >>>      d) if still not found, scan jar-file for classes with "main"
> >>> method
> >>>
> >>>  - if user looks for entry point classes via CLI, check for
> >>> "ProgramDescription" interface and show info
> >>>
> >>>  - extend WebUI to show all available entry-classes (pull request
> >>> already there, for multiple entries in "program-class")
> >>>
> >>>  - extend WebUI to show "ProgramDescription" info
> >>>
> >>>
> >>> What do you think? I am not too sure about the "auto scan" of the jar
> >>> file if no manifest entry is provided. We might get some "fat jars" and
> >>> scanning might take some time.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>>
> >>>
> >>>
> >>> On 05/19/2015 10:44 AM, Stephan Ewen wrote:
> >>>> We actually had an interface like that before ("Program"). It is still
> >>>> supported, but in all new programs we simply use the Java main method.
> >>>> The advantage is that most IDEs can create executable JARs
> >>>> automatically, setting the JAR manifest attributes, etc.
> >>>>
> >>>> The "Program" interface still works, though. Most tool classes (like
> >>>> "PackagedProgram") have a way to figure out whether the code uses
> >>>> "main()" or implements "Program" and call the right method.
> >>>>
> >>>> You can try and extend the program interface. If you want to
> >>>> consistently support multiple programs in one JAR file, you may need
> >>>> to adjust the util classes as well to deal with that.
> >>>>
> >>>>
> >>>>
> >>>> On Tue, May 19, 2015 at 10:10 AM, Matthias J. Sax <
> >>>> mjsax@informatik.hu-berlin.de> wrote:
> >>>>
> >>>>> Supporting an interface like this seems to be a nice idea. Any other
> >>>>> opinions on it?
> >>>>>
> >>>>> It seems to be some more work to get it done right. I don't want to
> >>>>> start working on it, before it's clear that it has a chance to be
> >>>>> included in Flink.
> >>>>>
> >>>>> @Flavio: I moved the discussion to the dev mailing list (the user list
> >>>>> is not appropriate for this discussion). Are you subscribed to it or
> >>>>> should I cc you in each mail?
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>>
> >>>>> On 05/19/2015 09:39 AM, Flavio Pompermaier wrote:
> >>>>>> Nice feature Matthias!
> >>>>>> My suggestion is to create a specific Flink interface to also get a
> >>>>>> description of a job and standardize parameter passing.
> >>>>>> Then, somewhere (e.g. Manifest) you could specify the list of
> >>>>>> packages (or also directly the classes) to inspect with reflection
> >>>>>> to extract the list of available Flink jobs.
> >>>>>> Something like:
> >>>>>>
> >>>>>> public interface FlinkJob {
> >>>>>>
> >>>>>>   /** The name to display in the job submission UI or shell, e.g. "My Flink HelloWorld" */
> >>>>>>   String getDisplayName();
> >>>>>>
> >>>>>>   /** e.g. "This program does this and that etc.." */
> >>>>>>   String getDescription();
> >>>>>>
> >>>>>>   /**
> >>>>>>    * e.g. <0,Integer,"An integer representing my first param">,
> >>>>>>    * <1,String,"A string representing my second param">
> >>>>>>    */
> >>>>>>   List<Tuple3<Integer, TypeInfo, String>> getParamDescription();
> >>>>>>
> >>>>>>   /** Set up the flink job in the passed ExecutionEnvironment */
> >>>>>>   ExecutionEnvironment config(ExecutionEnvironment env);
> >>>>>> }
> >>>>>>
> >>>>>> What do you think?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
> >>>>>> mjsax@informatik.hu-berlin.de> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I like the idea that Flink's WebClient can show different plans for
> >>>>>>> different jobs within a single jar file.
> >>>>>>>
> >>>>>>> I prepared a prototype for this feature. You can find it here:
> >>>>>>> https://github.com/mjsax/flink/tree/multipleJobsWebUI
> >>>>>>>
> >>>>>>> To test the feature, you need to prepare a jar file that contains
> >>>>>>> the code of multiple programs and specify each entry class in the
> >>>>>>> manifest file as comma-separated values in the "program-class" line.
> >>>>>>>
> >>>>>>> Feedback is welcome. :)
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
> >>>>>>>> Thank you all for the support!
> >>>>>>>> It would be a really nice feature if the web client were able to
> >>>>>>>> show me the list of Flink jobs within my jar.
> >>>>>>>> It should be sufficient to mark them with a special annotation and
> >>>>>>>> inspect the classes within the jar.
> >>>>>>>>
> >>>>>>>> On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <ms@mieo.de
> >>>>>>>> <ma...@mieo.de>> wrote:
> >>>>>>>>
> >>>>>>>>     Hi Flavio,
> >>>>>>>>
> >>>>>>>>     you can also put each job in a single class and use the -c
> >>>>>>>>     parameter to execute jobs separately:
> >>>>>>>>
> >>>>>>>>     /bin/flink run -c com.myflinkjobs.JobA /path/to/jar/multiplejobs.jar
> >>>>>>>>     /bin/flink run -c com.myflinkjobs.JobB /path/to/jar/multiplejobs.jar
> >>>>>>>>     …
> >>>>>>>>
> >>>>>>>>     Cheers
> >>>>>>>>     Malte
> >>>>>>>>
> >>>>>>>>     From: Robert Metzger <rmetzger@apache.org <mailto:rmetzger@apache.org>>
> >>>>>>>>     Reply-To: <user@flink.apache.org <mailto:user@flink.apache.org>>
> >>>>>>>>     Date: Friday, 8 May 2015 14:57
> >>>>>>>>     To: "user@flink.apache.org <ma...@flink.apache.org>"
> >>>>>>>>     <user@flink.apache.org <ma...@flink.apache.org>>
> >>>>>>>>     Subject: Re: Package multiple jobs in a single jar
> >>>>>>>>
> >>>>>>>>     Hi Flavio,
> >>>>>>>>
> >>>>>>>>     the pom from our quickstart is a good reference:
> >>>>>>>>     https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
> >>>>>>>>     <pompermaier@okkam.it <ma...@okkam.it>> wrote:
> >>>>>>>>
> >>>>>>>>         Ok, got it.
> >>>>>>>>         And is there a reference pom.xml for shading my application
> >>>>>>>>         into one fat-jar? Which Flink dependencies can I exclude?
> >>>>>>>>
> >>>>>>>>         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske <fhueske@gmail.com
> >>>>>>>>         <ma...@gmail.com>> wrote:
> >>>>>>>>
> >>>>>>>>             I didn't say that the main should return the
> >>>>>>>>             ExecutionEnvironment.
> >>>>>>>>             You can define and execute as many programs in a main
> >>>>>>>>             function as you like.
> >>>>>>>>             The program can be defined somewhere else, e.g., in a
> >>>>>>>>             function that receives an ExecutionEnvironment and
> >>>>>>>>             attaches a program such as
> >>>>>>>>
> >>>>>>>>             public void buildMyProgram(ExecutionEnvironment env) {
> >>>>>>>>               DataSet<String> lines = env.readTextFile(...);
> >>>>>>>>               // do something
> >>>>>>>>               lines.writeAsText(...);
> >>>>>>>>             }
> >>>>>>>>
> >>>>>>>>             That method could be invoked from main():
> >>>>>>>>
> >>>>>>>>             public static void main(String[] args) throws Exception {
> >>>>>>>>               ExecutionEnvironment env = ...
> >>>>>>>>
> >>>>>>>>               if(...) {
> >>>>>>>>                 buildMyProgram(env);
> >>>>>>>>               }
> >>>>>>>>               else {
> >>>>>>>>                 buildSomeOtherProg(env);
> >>>>>>>>               }
> >>>>>>>>
> >>>>>>>>               env.execute();
> >>>>>>>>
> >>>>>>>>               // run some more programs
> >>>>>>>>             }
> >>>>>>>>
> >>>>>>>>             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
> >>>>>>>>             <pompermaier@okkam.it <ma...@okkam.it>>:
> >>>>>>>>
> >>>>>>>>                 Hi Fabian,
> >>>>>>>>                 thanks for the response.
> >>>>>>>>                 So my mains should be converted into a method
> >>>>>>>>                 returning the ExecutionEnvironment.
> >>>>>>>>                 However, I think that it would be very nice to have
> >>>>>>>>                 a syntax like the one of the Hadoop ProgramDriver to
> >>>>>>>>                 define jobs to invoke from a single root class.
> >>>>>>>>                 Do you think it could be useful?
> >>>>>>>>
> >>>>>>>>                 On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
> >>>>>>>>                 <fhueske@gmail.com <ma...@gmail.com>> wrote:
> >>>>>>>>
> >>>>>>>>                     You can easily have multiple Flink programs in a
> >>>>>>>>                     single JAR file.
> >>>>>>>>                     A program is defined using an ExecutionEnvironment
> >>>>>>>>                     and executed when you call
> >>>>>>>>                     ExecutionEnvironment.execute().
> >>>>>>>>                     Where and how you do that does not matter.
> >>>>>>>>
> >>>>>>>>                     You can for example implement a main function
> >>>>>>>>                     such as:
> >>>>>>>>
> >>>>>>>>                     public static void main(String... args) {
> >>>>>>>>
> >>>>>>>>                       if (today == Monday) {
> >>>>>>>>                         ExecutionEnvironment env = ...
> >>>>>>>>                         // define Monday prog
> >>>>>>>>                         env.execute()
> >>>>>>>>                       }
> >>>>>>>>                       else {
> >>>>>>>>                         ExecutionEnvironment env = ...
> >>>>>>>>                         // define other prog
> >>>>>>>>                         env.execute()
> >>>>>>>>                       }
> >>>>>>>>                     }
> >>>>>>>>
> >>>>>>>>                     2015-05-08 11:41 GMT+02:00 Flavio Pompermaier
> >>>>>>>>                     <pompermaier@okkam.it <mailto:pompermaier@okkam.it>>:
> >>>>>>>>
> >>>>>>>>                         Hi to all,
> >>>>>>>>                         is there any way to keep multiple jobs in a jar
> >>>>>>>>                         and then choose at runtime the one to execute
> >>>>>>>>                         (like what ProgramDriver does in Hadoop)?
> >>>>>>>>
> >>>>>>>>                         Best,
> >>>>>>>>                         Flavio
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
>
>

Re: Package multiple jobs in a single jar

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Hi,

two more thoughts on this discussion:

 1) looking at the commit history of "CliFrontend", I found the
following closed issue and the closing pull request
    * https://issues.apache.org/jira/browse/FLINK-1095
    * https://github.com/apache/flink/pull/238
It stands in opposition to Flavio's request to have a job description. Any
comment on this? Should a removed feature be re-introduced? If not, I
would suggest removing the "ProgramDescription" interface completely.

 2) If the manifest contains "program-class" or "Main-Class" entry,
should we check the jar file right away if the specified class is there?
Right now, no check is performed and an error occurs if the user tries
to execute the job.
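
For illustration, such an eager check could be sketched with the standard
java.util.jar API as below. This is not how Flink currently behaves; the
class name "ManifestEntryCheck" is hypothetical, and splitting the manifest
value on commas assumes the comma-separated "program-class" format from the
prototype.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical up-front check of the manifest's entry classes. */
final class ManifestEntryCheck {

    /**
     * Returns the entry classes named in the manifest ("program-class" first,
     * then "Main-Class") for which the jar contains no .class file.
     */
    static List<String> missingEntryClasses(String jarPath) throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return missing; // no manifest, nothing to verify
            }
            Attributes attrs = manifest.getMainAttributes();
            String value = attrs.getValue("program-class");
            if (value == null) {
                value = attrs.getValue("Main-Class");
            }
            if (value == null) {
                return missing; // neither attribute is set
            }
            for (String name : value.split(",")) {
                String className = name.trim();
                if (className.isEmpty()) {
                    continue;
                }
                // com.example.JobA is stored as com/example/JobA.class
                String entry = className.replace('.', '/') + ".class";
                if (jar.getJarEntry(entry) == null) {
                    missing.add(className);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ManifestEntryCheck <path-to-jar>");
            return;
        }
        System.out.println("Missing entry classes: " + missingEntryClasses(args[0]));
    }
}
```

The check only looks for the class file; it does not load the class, so it
would not detect a class that is present but has neither a main method nor
the "Program" interface.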


-Matthias


On 05/22/2015 12:06 PM, Matthias J. Sax wrote:
> Thanks for your feedback.
> 
> I agree on the main method "problem". For scanning and listing all stuff
> that is found it's fine.
> 
> The tricky question is the automatic invocation mechanism, if "-c" flag
> is not used, and no manifest program-class or Main-Class entry is found.
> 
> If multiple classes implement "Program" interface an exception should be
> thrown (I think that would make sense). However, I am not sure what
> "good" behavior is, if a single "Program"-class is found and an
> additional main-method class.
>   - should "Program"-class be executed (ie, "overwrite" main-method class)
>   - or, better to throw an exception ?
> 
> If no "Program"-class is found, but a single main-method class, Flink
> could execute using main method. But I am not sure either, if this is
> "good" behavior. If multiple main-method classes are present, throwing
> an exception is the only way to go, I guess.
> 
> To sum up: Should Flink consider main-method classes for automatic
> invocation, or should it be required for main-method classes to either
> list them in "program-class" or "Main-Class" manifest parameter (to
> enable them for automatic invocation)?
> 
> 
> -Matthias
> 
> 
> 
> 
> On 05/22/2015 09:56 AM, Maximilian Michels wrote:
>> Hi Matthias,
>>
>> Thank you for taking the time to analyze Flink's invocation behavior. I
>> like your proposal. I'm not sure whether it is a good idea to scan the
>> entire JAR for main methods. Sometimes, main methods are added solely for
>> testing purposes and don't really serve any practical use. However, if
>> you're already going through the JAR to find the ProgramDescription
>> interface, then you might look for main methods as well. As long as it is
>> just a listing without execution, that should be fine.
>>
>> Best regards,
>> Max
>>
>> On Thu, May 21, 2015 at 3:43 PM, Matthias J. Sax <
>> mjsax@informatik.hu-berlin.de> wrote:
>>
>>> Hi,
>>>
>>> I had a look into the current workflow of Flink with regard to the
>>> processing steps of a jar file.
>>>
>>> If I got it right it works as follows (not sure if this is documented
>>> somewhere):
>>>
>>> 1) check, if "-c" flag is used to set program entry point
>>>    if yes, goto 4
>>> 2) try to extract "program-class" property from manifest
>>>    (if found goto 4)
>>> 3) try to extract "Main-Class" property from manifest
>>>    -> if not found, throw an exception (this also happens if no manifest
>>> file is found at all)
>>>
>>> 4) check if entry point class implements "Program" interface
>>>    if yes, goto 6
>>> 5) check if entry point class provides a "public static void main(String[]
>>> args)" method
>>>    -> if not, throw an exception
>>>
>>> 6) execute program (ie, show plan/info or really run it)
>>>
>>>
>>> I also "discovered" the interface "ProgramDescription" with a single
>>> method "String getDescription()". Even if some examples implement this
>>> interface (and use it in the example itself), Flink basically ignores
>>> it... From the CLI there is no way to get this info, and the WebUI does
>>> actually get it if present, however, doesn't show it anywhere...
>>>
>>>
>>> I think it would be nice, if we would extend the following functions:
>>>
>>>  - extend the possibility to specify multiple entry classes in
>>> "program-class" or "Main-Class" -> in this case, the user needs to use
>>> "-c" flag to pick program to run every time
>>>
>>>  - add a CLI option that allows the user to see what entry point classes
>>> are available
>>>    for this, consider
>>>      a) "program-class" entry
>>>      b) "Main-Class" entry
>>>      c) if neither is found, scan jar-file for classes implementing
>>> "Program" interface
>>>      d) if still not found, scan jar-file for classes with "main" method
>>>
>>>  - if user looks for entry point classes via CLI, check for
>>> "ProgramDescription" interface and show info
>>>
>>>  - extend WebUI to show all available entry-classes (pull request
>>> already there, for multiple entries in "program-class")
>>>
>>>  - extend WebUI to show "ProgramDescription" info
>>>
>>>
>>> What do you think? I am not too sure about the "auto scan" of the jar
>>> file if no manifest entry is provided. We might get some "fat jars" and
>>> scanning might take some time.
>>>
>>>
>>> -Matthias
>>>
>>>
>>>
>>>
>>> On 05/19/2015 10:44 AM, Stephan Ewen wrote:
>>>> We actually had an interface like that before ("Program"). It is still
>>>> supported, but in all new programs we simply use the Java main method.
>>>> The advantage is that most IDEs can create executable JARs automatically,
>>>> setting the JAR manifest attributes, etc.
>>>>
>>>> The "Program" interface still works, though. Most tool classes (like
>>>> "PackagedProgram") have a way to figure out whether the code uses
>>>> "main()" or implements "Program" and call the right method.
>>>>
>>>> You can try and extend the program interface. If you want to consistently
>>>> support multiple programs in one JAR file, you may need to adjust the
>>>> util classes as well to deal with that.
>>>>
>>>>
>>>>
>>>> On Tue, May 19, 2015 at 10:10 AM, Matthias J. Sax <
>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>
>>>>> Supporting an interface like this seems to be a nice idea. Any other
>>>>> opinions on it?
>>>>>
>>>>> It seems to be some more work to get it done right. I don't want to
>>>>> start working on it, before it's clear that it has a chance to be
>>>>> included in Flink.
>>>>>
>>>>> @Flavio: I moved the discussion to dev mailing list (user list is not
>>>>> appropriate for this discussion). Are you subscribed to it or should I
>>>>> cc you in each mail?
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>> On 05/19/2015 09:39 AM, Flavio Pompermaier wrote:
>>>>>> Nice feature Matthias!
>>>>>> My suggestion is to create a specific Flink interface to also get the
>>>>>> description of a job and standardize parameter passing.
>>>>>> Then, somewhere (e.g. Manifest) you could specify the list of packages
>>>>>> (or also directly the classes) to inspect with reflection to extract the
>>>>>> list of available Flink jobs.
>>>>>> Something like:
>>>>>>
>>>>>> public interface FlinkJob {
>>>>>>
>>>>>>   /** The name to display in the job submission UI or shell, e.g. "My Flink HelloWorld" */
>>>>>>   String getDisplayName();
>>>>>>
>>>>>>   /** E.g. "This program does this and that etc.." */
>>>>>>   String getDescription();
>>>>>>
>>>>>>   /**
>>>>>>    * E.g. <0,Integer,"An integer representing my first param">,
>>>>>>    * <1,String,"A string representing my second param">
>>>>>>    */
>>>>>>   List<Tuple3<Integer, TypeInfo, String>> getParamDescription();
>>>>>>
>>>>>>   /** Set up the Flink job in the passed ExecutionEnvironment */
>>>>>>   ExecutionEnvironment config(ExecutionEnvironment env);
>>>>>> }
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
>>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I like the idea that Flink's WebClient can show different plans for
>>>>>>> different jobs within a single jar file.
>>>>>>>
>>>>>>> I prepared a prototype for this feature. You can find it here:
>>>>>>> https://github.com/mjsax/flink/tree/multipleJobsWebUI
>>>>>>>
>>>>>>> To test the feature, you need to prepare a jar file, that contains the
>>>>>>> code of multiple programs and specify each entry class in the manifest
>>>>>>> file as comma separated values in "program-class" line.
>>>>>>>
>>>>>>> Feedback is welcome. :)
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
>>>>>>>> Thank you all for the support!
>>>>>>>> It would be a really nice feature if the web client were able to show
>>>>>>>> me the list of Flink jobs within my jar.
>>>>>>>> It should be sufficient to mark them with a special annotation and
>>>>>>>> inspect the classes within the jar.
>>>>>>>>
>>>>>>>> On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <ms@mieo.de
>>>>>>>> <ma...@mieo.de>> wrote:
>>>>>>>>
>>>>>>>>     Hi Flavio,
>>>>>>>>
>>>>>>>>     you can also put each job in a single class and use the -c parameter
>>>>>>>>     to execute jobs separately:
>>>>>>>>
>>>>>>>>     /bin/flink run -c com.myflinkjobs.JobA /path/to/jar/multiplejobs.jar
>>>>>>>>     /bin/flink run -c com.myflinkjobs.JobB /path/to/jar/multiplejobs.jar
>>>>>>>>     …
>>>>>>>>
>>>>>>>>     Cheers
>>>>>>>>     Malte
>>>>>>>>
>>>>>>>>     From: Robert Metzger <rmetzger@apache.org>
>>>>>>>>     Reply-To: <user@flink.apache.org>
>>>>>>>>     Date: Friday, May 8, 2015 14:57
>>>>>>>>     To: "user@flink.apache.org" <user@flink.apache.org>
>>>>>>>>     Subject: Re: Package multiple jobs in a single jar
>>>>>>>>
>>>>>>>>     Hi Flavio,
>>>>>>>>
>>>>>>>>     the pom from our quickstart is a good reference:
>>>>>>>>     https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
>>>>>>>>     <pompermaier@okkam.it <ma...@okkam.it>> wrote:
>>>>>>>>
>>>>>>>>         OK, got it.
>>>>>>>>         And is there a reference pom.xml for shading my application into
>>>>>>>>         one fat-jar? Which Flink dependencies can I exclude?
>>>>>>>>
>>>>>>>>         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske <
>>>>> fhueske@gmail.com
>>>>>>>>         <ma...@gmail.com>> wrote:
>>>>>>>>
>>>>>>>>             I didn't say that the main should return the
>>>>>>>>             ExecutionEnvironment.
>>>>>>>>             You can define and execute as many programs in a main
>>>>>>>>             function as you like.
>>>>>>>>             The program can be defined somewhere else, e.g., in a
>>>>>>>>             function that receives an ExecutionEnvironment and attaches
>>>>>>>>             a program such as
>>>>>>>>
>>>>>>>>             public void buildMyProgram(ExecutionEnvironment env) {
>>>>>>>>               DataSet<String> lines = env.readTextFile(...);
>>>>>>>>               // do something
>>>>>>>>               lines.writeAsText(...);
>>>>>>>>             }
>>>>>>>>
>>>>>>>>             That method could be invoked from main():
>>>>>>>>
>>>>>>>>             public static void main(String[] args) throws Exception {
>>>>>>>>               ExecutionEnvironment env = ...
>>>>>>>>
>>>>>>>>               if(...) {
>>>>>>>>                 buildMyProgram(env);
>>>>>>>>               }
>>>>>>>>               else {
>>>>>>>>                 buildSomeOtherProg(env);
>>>>>>>>               }
>>>>>>>>
>>>>>>>>               env.execute();
>>>>>>>>
>>>>>>>>               // run some more programs
>>>>>>>>             }
>>>>>>>>
>>>>>>>>             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
>>>>>>>>             <pompermaier@okkam.it <ma...@okkam.it>>:
>>>>>>>>
>>>>>>>>                 Hi Fabian,
>>>>>>>>                 thanks for the response.
>>>>>>>>                 So my mains should be converted into methods returning
>>>>>>>>                 the ExecutionEnvironment.
>>>>>>>>                 However, I think that it would be very nice to have a
>>>>>>>>                 syntax like the one of the Hadoop ProgramDriver to
>>>>>>>>                 define jobs to invoke from a single root class.
>>>>>>>>                 Do you think it could be useful?
>>>>>>>>
>>>>>>>>                 On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
>>>>>>>>                 <fhueske@gmail.com <ma...@gmail.com>>
>>> wrote:
>>>>>>>>
>>>>>>>>                     You can easily have multiple Flink programs in a single
>>>>>>>>                     JAR file.
>>>>>>>>                     A program is defined using an ExecutionEnvironment
>>>>>>>>                     and executed when you call
>>>>>>>>                     ExecutionEnvironment.execute().
>>>>>>>>                     Where and how you do that does not matter.
>>>>>>>>
>>>>>>>>                     You can for example implement a main function such as:
>>>>>>>>
>>>>>>>>                     public static void main(String... args) {
>>>>>>>>
>>>>>>>>                       if (today == Monday) {
>>>>>>>>                         ExecutionEnvironment env = ...
>>>>>>>>                         // define Monday prog
>>>>>>>>                         env.execute();
>>>>>>>>                       }
>>>>>>>>                       else {
>>>>>>>>                         ExecutionEnvironment env = ...
>>>>>>>>                         // define other prog
>>>>>>>>                         env.execute();
>>>>>>>>                       }
>>>>>>>>                     }
>>>>>>>>
>>>>>>>>                     2015-05-08 11:41 GMT+02:00 Flavio Pompermaier
>>>>>>>>                     <pompermaier@okkam.it>:
>>>>>>>>
>>>>>>>>                         Hi to all,
>>>>>>>>                         is there any way to keep multiple jobs in a jar
>>>>>>>>                         and then choose at runtime the one to execute
>>>>>>>>                         (like what ProgramDriver does in Hadoop)?
>>>>>>>>
>>>>>>>>                         Best,
>>>>>>>>                         Flavio
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 


Re: Package multiple jobs in a single jar

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Thanks for your feedback.

I agree on the main-method "problem". Scanning for main methods is fine
as long as we only list what is found.

The tricky question is the automatic invocation mechanism, if "-c" flag
is not used, and no manifest program-class or Main-Class entry is found.
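
To make the manifest part concrete, here is a rough Java sketch of the
lookup order ("program-class" first, then "Main-Class"), reading the value
as a comma-separated list. This is only an illustration, not the code in
Flink; the class and method names are made up for this mail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical helper, not part of Flink. */
final class ManifestEntryClasses {

    /**
     * Returns the entry classes named in the manifest: "program-class" wins,
     * "Main-Class" is the fallback. The value is split on commas.
     * An empty list means that no entry class is declared.
     */
    static List<String> entryClasses(Manifest manifest) {
        List<String> classes = new ArrayList<>();
        if (manifest == null) {
            return classes; // jar without a manifest
        }
        Attributes attrs = manifest.getMainAttributes();
        String value = attrs.getValue("program-class");
        if (value == null) {
            value = attrs.getValue("Main-Class");
        }
        if (value == null) {
            return classes;
        }
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                classes.add(name);
            }
        }
        return classes;
    }
}
```

An empty result is exactly the case discussed in this mail: neither
manifest entry exists, so some other rule has to pick the entry point.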

If multiple classes implement the "Program" interface, an exception should
be thrown (I think that would make sense). However, I am not sure what
"good" behavior is if a single "Program" class is found together with an
additional main-method class.
  - should the "Program" class be executed (i.e., "override" the main-method class)
  - or is it better to throw an exception?

If no "Program" class is found, but a single main-method class, Flink
could execute it using the main method. But I am not sure either if this is
"good" behavior. If multiple main-method classes are present, throwing
an exception is the only way to go, I guess.

To sum up: Should Flink consider main-method classes for automatic
invocation, or should main-method classes have to be listed in the
"program-class" or "Main-Class" manifest attribute (to enable them for
automatic invocation)?
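
The two options can be written down as a small decision function. The
following Java sketch is only meant to make the question precise; it is
not existing Flink code, and all names (EntryPointResolver,
MainMethodPolicy, resolve) are hypothetical. It assumes a jar scan has
already produced two disjoint lists of class names, and it treats the
"one Program class plus main-method classes" case as ambiguous when
main-method classes are considered, which is just one of the two
behaviors asked about above.

```java
import java.util.List;

/** Illustrative only: none of these names exist in Flink. */
final class EntryPointResolver {

    /** Whether classes that merely have a main method may be picked automatically. */
    enum MainMethodPolicy { CONSIDER, REQUIRE_MANIFEST }

    /**
     * Picks the entry class when neither "-c" nor a manifest entry is given.
     * Both lists are assumed to be disjoint results of a jar scan.
     */
    static String resolve(List<String> programClasses,
                          List<String> mainMethodClasses,
                          MainMethodPolicy policy) {
        if (programClasses.size() > 1) {
            throw new IllegalStateException(
                "Multiple Program classes found, use -c to pick one: " + programClasses);
        }
        boolean considerMain = policy == MainMethodPolicy.CONSIDER;
        if (programClasses.size() == 1) {
            if (considerMain && !mainMethodClasses.isEmpty()) {
                // Assumption of this sketch: the mix is treated as ambiguous.
                throw new IllegalStateException(
                    "Program class and main-method class(es) found, use -c to pick one");
            }
            return programClasses.get(0);
        }
        if (considerMain && mainMethodClasses.size() == 1) {
            return mainMethodClasses.get(0);
        }
        throw new IllegalStateException(
            "No unique entry point found, use -c or a manifest entry");
    }
}
```

With REQUIRE_MANIFEST, a single "Program" class always wins and
main-method classes are never picked automatically; with CONSIDER, a
lone main-method class is enough to run the jar.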


-Matthias




On 05/22/2015 09:56 AM, Maximilian Michels wrote:
> Hi Matthias,
> 
> Thank you for taking the time to analyze Flink's invocation behavior. I
> like your proposal. I'm not sure whether it is a good idea to scan the
> entire JAR for main methods. Sometimes, main methods are added solely for
> testing purposes and don't really serve any practical use. However, if
> you're already going through the JAR to find the ProgramDescription
> interface, then you might look for main methods as well. As long as it is
> just a listing without execution, that should be fine.
> 
> Best regards,
> Max


Re: Package multiple jobs in a single jar

Posted by Maximilian Michels <mx...@apache.org>.
Hi Matthias,

Thank you for taking the time to analyze Flink's invocation behavior. I
like your proposal. I'm not sure whether it is a good idea to scan the
entire JAR for main methods. Sometimes, main methods are added solely for
testing purposes and don't really serve any practical use. However, if
you're already going through the JAR to find the ProgramDescription
interface, then you might look for main methods as well. As long as it is
just a listing without execution, that should be fine.
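
For the listing-only case, a minimal Java sketch could look like the
following. It loads each class from the JAR without initializing it, so
no user code (not even static initializers) runs during the scan. The
names MainMethodScanner and listMainClasses are hypothetical and not
part of Flink; classes that cannot be loaded are simply skipped.

```java
import java.io.File;
import java.io.IOException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper, not part of Flink. */
final class MainMethodScanner {

    /** Lists classes in the jar that expose "public static void main(String[])". */
    static List<String> listMainClasses(File jar) throws IOException {
        List<String> result = new ArrayList<>();
        URL[] urls = { jar.toURI().toURL() };
        try (JarFile jarFile = new JarFile(jar);
             URLClassLoader loader =
                 new URLClassLoader(urls, MainMethodScanner.class.getClassLoader())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (!name.endsWith(".class")) {
                    continue;
                }
                String className = name
                    .substring(0, name.length() - ".class".length())
                    .replace('/', '.');
                try {
                    // initialize = false: load the class, but run none of its code
                    Class<?> clazz = Class.forName(className, false, loader);
                    // getMethod also finds a public main inherited from a superclass
                    Method main = clazz.getMethod("main", String[].class);
                    int mods = main.getModifiers();
                    if (Modifier.isPublic(mods) && Modifier.isStatic(mods)
                            && main.getReturnType() == void.class) {
                        result.add(className);
                    }
                } catch (NoSuchMethodException | ClassNotFoundException | LinkageError e) {
                    // no main method, or class not loadable (missing dependency): skip
                }
            }
        }
        return result;
    }
}
```

Every class in the JAR is loaded once, so on a large fat jar this is
where the scan time would go.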

Best regards,
Max

On Thu, May 21, 2015 at 3:43 PM, Matthias J. Sax <
mjsax@informatik.hu-berlin.de> wrote:

> Hi,
>
> I had a look into the current workflow of Flink with regard to the
> processing steps of a jar file.
>
> If I got it right it works as follows (not sure if this is documented
> somewhere):
>
> 1) check if the "-c" flag is used to set the program entry point
>    if yes, goto 4
> 2) try to extract the "program-class" property from the manifest
>    (if found, goto 4)
> 3) try to extract the "Main-Class" property from the manifest
>    -> if not found, throw an exception (this also happens if no manifest
> file is found at all)
>
> 4) check if the entry point class implements the "Program" interface
>    if yes, goto 6
> 5) check if the entry point class provides a "public static void main(String[]
> args)" method
>    -> if not, throw an exception
>
> 6) execute the program (i.e., show plan/info or really run it)
>
>
> I also "discovered" the interface "ProgramDescription" with a single
> method "String getDescription()". Even though some examples implement this
> interface (and use it in the example itself), Flink basically ignores
> it. From the CLI there is no way to get this info, and the WebUI
> retrieves it if present but doesn't show it anywhere.
>
>
> I think it would be nice, if we would extend the following functions:
>
>  - extend the possibility to specify multiple entry classes in
> "program-class" or "Main-Class" -> in this case, the user needs to use
> "-c" flag to pick program to run every time
>
>  - add a CLI option that allows the user to see what entry point classes
> are available
>    for this, consider
>      a) "program-class" entry
>      b) "Main-Class" entry
>      c) if neither is found, scan jar-file for classes implementing
> "Program" interface
>      d) if still not found, scan jar-file for classes with "main" method
>
>  - if the user looks for entry point classes via the CLI, check for the
> "ProgramDescription" interface and show its info
>
>  - extend WebUI to show all available entry-classes (pull request
> already there, for multiple entries in "program-class")
>
>  - extend WebUI to show "ProgramDescription" info
>
>
> What do you think? I am not too sure about the "auto scan" of the jar
> file if no manifest entry is provided. We might get some "fat jars" and
> scanning might take some time.
>
>
> -Matthias
>
>
>
>
> On 05/19/2015 10:44 AM, Stephan Ewen wrote:
> > We actually had an interface like that before ("Program"). It is still
> > supported, but in all new programs we simply use the Java main method.
> > The advantage is that most IDEs can create executable JARs automatically,
> > setting the JAR manifest attributes, etc.
> >
> > The "Program" interface still works, though. Most tool classes (like
> > "PackagedProgram") have a way to figure out whether the code uses
> > "main()" or implements "Program" and call the right method.
> >
> > You can try and extend the program interface. If you want to consistently
> > support multiple programs in one JAR file, you may need to adjust the
> > util classes as well to deal with that.
> >
> >
> >
> > On Tue, May 19, 2015 at 10:10 AM, Matthias J. Sax <
> > mjsax@informatik.hu-berlin.de> wrote:
> >
> >> Supporting an interface like this seems to be a nice idea. Any other
> >> opinions on it?
> >>
> >> It seems to be some more work to get it done right. I don't want to
> >> start working on it, before it's clear that it has a chance to be
> >> included in Flink.
> >>
> >> @Flavio: I moved the discussion to dev mailing list (user list is not
> >> appropriate for this discussion). Are you subscribed to it or should I
> >> cc you in each mail?
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 05/19/2015 09:39 AM, Flavio Pompermaier wrote:
> >>> Nice feature Matthias!
> >>> My suggestion is to create a specific Flink interface to also get the
> >>> description of a job and standardize parameter passing.
> >>> Then, somewhere (e.g. Manifest) you could specify the list of packages
> >>> (or also directly the classes) to inspect with reflection to extract the
> >>> list of available Flink jobs.
> >>> Something like:
> >>>
> >>> public interface FlinkJob {
> >>>
> >>>   /** The name to display in the job submission UI or shell, e.g. "My Flink HelloWorld" */
> >>>   String getDisplayName();
> >>>
> >>>   /** E.g. "This program does this and that etc.." */
> >>>   String getDescription();
> >>>
> >>>   /**
> >>>    * E.g. <0,Integer,"An integer representing my first param">,
> >>>    * <1,String,"A string representing my second param">
> >>>    */
> >>>   List<Tuple3<Integer, TypeInfo, String>> getParamDescription();
> >>>
> >>>   /** Set up the Flink job in the passed ExecutionEnvironment */
> >>>   ExecutionEnvironment config(ExecutionEnvironment env);
> >>> }
> >>>
> >>> What do you think?
> >>>
> >>>
> >>>
> >>> On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
> >>> mjsax@informatik.hu-berlin.de> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I like the idea that Flink's WebClient can show different plans for
> >>>> different jobs within a single jar file.
> >>>>
> >>>> I prepared a prototype for this feature. You can find it here:
> >>>> https://github.com/mjsax/flink/tree/multipleJobsWebUI
> >>>>
> >>>> To test the feature, you need to prepare a jar file, that contains the
> >>>> code of multiple programs and specify each entry class in the manifest
> >>>> file as comma separated values in "program-class" line.
> >>>>
> >>>> Feedback is welcome. :)
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
> >>>>> Thank you all for the support!
> >>>>> It would be a really nice feature if the web client were able to show
> >>>>> me the list of Flink jobs within my jar.
> >>>>> It should be sufficient to mark them with a special annotation and
> >>>>> inspect the classes within the jar.
> >>>>>
> >>>>> On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <ms@mieo.de
> >>>>> <ma...@mieo.de>> wrote:
> >>>>>
> >>>>>     Hi Flavio,
> >>>>>
> >>>>>     you can also put each job in a single class and use the -c parameter
> >>>>>     to execute jobs separately:
> >>>>>
> >>>>>     /bin/flink run -c com.myflinkjobs.JobA /path/to/jar/multiplejobs.jar
> >>>>>     /bin/flink run -c com.myflinkjobs.JobB /path/to/jar/multiplejobs.jar
> >>>>>     …
> >>>>>
> >>>>>     Cheers
> >>>>>     Malte
> >>>>>
> >>>>>     From: Robert Metzger <rmetzger@apache.org>
> >>>>>     Reply-To: <user@flink.apache.org>
> >>>>>     Date: Friday, May 8, 2015 14:57
> >>>>>     To: "user@flink.apache.org" <user@flink.apache.org>
> >>>>>     Subject: Re: Package multiple jobs in a single jar
> >>>>>
> >>>>>     Hi Flavio,
> >>>>>
> >>>>>     the pom from our quickstart is a good reference:
> >>>>>     https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
> >>>>>     <pompermaier@okkam.it <ma...@okkam.it>> wrote:
> >>>>>
> >>>>>         OK, got it.
> >>>>>         And is there a reference pom.xml for shading my application into
> >>>>>         one fat-jar? Which Flink dependencies can I exclude?
> >>>>>
> >>>>>         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske <
> >> fhueske@gmail.com
> >>>>>         <ma...@gmail.com>> wrote:
> >>>>>
> >>>>>             I didn't say that the main should return the
> >>>>>             ExecutionEnvironment.
> >>>>>             You can define and execute as many programs in a main
> >>>>>             function as you like.
> >>>>>             The program can be defined somewhere else, e.g., in a
> >>>>>             function that receives an ExecutionEnvironment and attaches
> >>>>>             a program such as
> >>>>>
> >>>>>             public void buildMyProgram(ExecutionEnvironment env) {
> >>>>>               DataSet<String> lines = env.readTextFile(...);
> >>>>>               // do something
> >>>>>               lines.writeAsText(...);
> >>>>>             }
> >>>>>
> >>>>>             That method could be invoked from main():
> >>>>>
> >>>>>             public static void main(String[] args) throws Exception {
> >>>>>               ExecutionEnvironment env = ...
> >>>>>
> >>>>>               if(...) {
> >>>>>                 buildMyProgram(env);
> >>>>>               }
> >>>>>               else {
> >>>>>                 buildSomeOtherProg(env);
> >>>>>               }
> >>>>>
> >>>>>               env.execute();
> >>>>>
> >>>>>               // run some more programs
> >>>>>             }
> >>>>>
> >>>>>             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
> >>>>>             <pompermaier@okkam.it <ma...@okkam.it>>:
> >>>>>
> >>>>>                 Hi Fabian,
> >>>>>                 thanks for the response.
> >>>>>                 So my main methods should be converted into methods
> >>>>>                 returning the ExecutionEnvironment.
> >>>>>                 However, I think it would be very nice to have a
> >>>>>                 syntax like that of the Hadoop ProgramDriver to
> >>>>>                 define jobs to invoke from a single root class.
> >>>>>                 Do you think it could be useful?
> >>>>>
> >>>>>                 On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
> >>>>>                 <fhueske@gmail.com <ma...@gmail.com>> wrote:
> >>>>>
> >>>>>                     You can easily have multiple Flink programs in a
> >>>>>                     single JAR file.
> >>>>>                     A program is defined using an ExecutionEnvironment
> >>>>>                     and executed when you call
> >>>>>                     ExecutionEnvironment.execute().
> >>>>>                     Where and how you do that does not matter.
> >>>>>
> >>>>>                     You can, for example, implement a main function
> >>>>>                     such as:
> >>>>>
> >>>>>                     public static void main(String... args) throws Exception {
> >>>>>
> >>>>>                       if (today == Monday) {
> >>>>>                         ExecutionEnvironment env = ...
> >>>>>                         // define Monday prog
> >>>>>                         env.execute();
> >>>>>                       }
> >>>>>                       else {
> >>>>>                         ExecutionEnvironment env = ...
> >>>>>                         // define other prog
> >>>>>                         env.execute();
> >>>>>                       }
> >>>>>                     }
> >>>>>
> >>>>>                     2015-05-08 11:41 GMT+02:00 Flavio Pompermaier
> >>>>>                     <pompermaier@okkam.it <mailto:pompermaier@okkam.it>>:
> >>>>>
> >>>>>                         Hi to all,
> >>>>>                         is there any way to keep multiple jobs in a jar
> >>>>>                         and then choose at runtime the one to execute
> >>>>>                         (like what ProgramDriver does in Hadoop)?
> >>>>>
> >>>>>                         Best,
> >>>>>                         Flavio
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>

Re: Package multiple jobs in a single jar

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Hi,

I had a look at Flink's current workflow with regard to the
processing steps of a jar file.

If I got it right, it works as follows (not sure if this is documented
somewhere):

1) check if the "-c" flag is used to set the program entry point
   -> if yes, goto 4
2) try to extract the "program-class" property from the manifest
   -> if found, goto 4
3) try to extract the "Main-Class" property from the manifest
   -> if not found, throw an exception (this also happens if no manifest
      file is found at all)

4) check if the entry point class implements the "Program" interface
   -> if yes, goto 6
5) check if the entry point class provides a
   "public static void main(String[] args)" method
   -> if not, throw an exception

6) execute the program (i.e., show plan/info or really run it)
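
To make steps 1 to 3 above a bit more concrete, here is a minimal sketch of
the lookup order. It is not Flink's actual code: the class and method names
(EntryPointResolver, resolveEntryClassName) are made up for illustration,
and only the two manifest keys are taken from the description above.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical helper for illustration; not part of Flink. */
class EntryPointResolver {

    /**
     * Resolves the entry class name in the order described above.
     *
     * @param cliEntryClass value of the "-c" flag, or null if it was not given
     * @param manifest      manifest of the jar, or null if the jar has none
     */
    static String resolveEntryClassName(String cliEntryClass, Manifest manifest) {
        // Step 1: an explicit "-c" value always wins.
        if (cliEntryClass != null) {
            return cliEntryClass;
        }
        if (manifest != null) {
            Attributes attrs = manifest.getMainAttributes();
            // Step 2: prefer the "program-class" property.
            String name = attrs.getValue("program-class");
            // Step 3: fall back to the standard "Main-Class" property.
            if (name == null) {
                name = attrs.getValue("Main-Class");
            }
            if (name != null) {
                return name.trim();
            }
        }
        // Neither property (or no manifest at all): give up.
        throw new IllegalArgumentException(
            "No entry point given via -c, program-class, or Main-Class");
    }
}
```

The sketch only resolves the class name; checking whether that class
implements "Program" or has a main method (steps 4 and 5) would happen
afterwards on the loaded class.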


I also "discovered" the interface "ProgramDescription" with a single
method "String getDescription()". Even though some examples implement this
interface (and use it in the example itself), Flink basically ignores
it. From the CLI there is no way to get this info; the WebUI does
actually fetch it if present, but doesn't show it anywhere.


I think it would be nice to extend the functionality as follows:

 - allow specifying multiple entry classes in "program-class" or
   "Main-Class" -> in this case, the user needs to use the "-c" flag
   every time to pick the program to run

 - add a CLI option that allows the user to see which entry point classes
   are available; for this, consider
     a) the "program-class" entry
     b) the "Main-Class" entry
     c) if neither is found, scan the jar file for classes implementing
        the "Program" interface
     d) if still none is found, scan the jar file for classes with a
        "main" method

 - if the user looks for entry point classes via the CLI, check for the
   "ProgramDescription" interface and show its info

 - extend the WebUI to show all available entry classes (pull request
   already there, for multiple entries in "program-class")

 - extend the WebUI to show "ProgramDescription" info
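
As a rough illustration of the first and third points of the list above,
the following sketch splits a comma-separated manifest value into class
names and fetches the description of a class if it offers one. Everything
here is an assumption for illustration: ProgramDescription is a local
stand-in with the same single method, and EntryClassLister and WordCountJob
are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

/** Local stand-in with the single method mentioned above. */
interface ProgramDescription {
    String getDescription();
}

/** Hypothetical helper for illustration; not part of Flink. */
class EntryClassLister {

    /** Splits a comma-separated manifest value into trimmed class names. */
    static List<String> parseEntryClasses(String manifestValue) {
        List<String> names = new ArrayList<>();
        if (manifestValue == null) {
            return names;
        }
        for (String part : manifestValue.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Returns the description, or null if the class does not provide one. */
    static String describe(Class<?> entryClass) {
        if (!ProgramDescription.class.isAssignableFrom(entryClass)) {
            return null;
        }
        try {
            // Assumes the entry class has a no-argument constructor.
            Object instance = entryClass.getDeclaredConstructor().newInstance();
            return ((ProgramDescription) instance).getDescription();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Cannot instantiate " + entryClass.getName(), e);
        }
    }
}

/** Made-up example job that provides a description. */
class WordCountJob implements ProgramDescription {
    public String getDescription() {
        return "Counts words";
    }
}
```

A CLI listing option could loop over the parsed names, load each class, and
print the name together with the result of describe() when it is not null.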


What do you think? I am not too sure about the "auto scan" of the jar
file if no manifest entry is provided. We might get some "fat jars", and
scanning them might take some time.
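
To show where the cost of such an "auto scan" would come from, here is a
minimal sketch of a scan for classes with a main method. It is only an
assumption of how this could be done (JarMainScanner is a made-up name):
every .class entry of the jar has to be loaded and inspected, which is what
gets slow for fat jars.

```java
import java.io.File;
import java.io.IOException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical scanner for illustration; not part of Flink. */
class JarMainScanner {

    /** True if the class has a "public static void main(String[])" method. */
    static boolean hasMainMethod(Class<?> cls) {
        try {
            Method m = cls.getMethod("main", String[].class);
            return Modifier.isStatic(m.getModifiers())
                && m.getReturnType() == void.class;
        } catch (NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    /** Loads every class in the jar and collects those with a main method. */
    static List<String> findMainClasses(File jar) throws IOException {
        List<String> result = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar);
             URLClassLoader loader =
                 new URLClassLoader(new URL[] { jar.toURI().toURL() })) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String entryName = entries.nextElement().getName();
                if (!entryName.endsWith(".class")) {
                    continue;
                }
                String className = entryName
                    .substring(0, entryName.length() - ".class".length())
                    .replace('/', '.');
                try {
                    // Load without running static initializers.
                    Class<?> cls = Class.forName(className, false, loader);
                    if (hasMainMethod(cls)) {
                        result.add(className);
                    }
                } catch (ClassNotFoundException | LinkageError e) {
                    // Skip classes that cannot be loaded (missing dependencies etc.).
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("Usage: JarMainScanner <path-to-jar>");
            return;
        }
        for (String name : findMainClasses(new File(args[0]))) {
            System.out.println(name);
        }
    }
}
```

Scanning for implementations of "Program" would work the same way, with an
isAssignableFrom check instead of the main method lookup.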


-Matthias




On 05/19/2015 10:44 AM, Stephan Ewen wrote:
> We actually had an interface like that before ("Program"). It is still
> supported, but in all new programs we simply use the Java main method. The
> advantage is that most IDEs can create executable JARs automatically,
> setting the JAR manifest attributes, etc.
>
> The "Program" interface still works, though. Most tool classes (like
> "PackagedProgram") have a way to figure out whether the code uses "main()"
> or implements "Program", and call the right method.
>
> You can try to extend the "Program" interface. If you want to consistently
> support multiple programs in one JAR file, you may need to adjust the util
> classes as well to deal with that.
> 
> 
> 


Re: Package multiple jobs in a single jar

Posted by Stephan Ewen <se...@apache.org>.
We actually had an interface like that before ("Program"). It is still
supported, but in all new programs we simply use the Java main method. The
advantage is that most IDEs can create executable JARs automatically,
setting the JAR manifest attributes, etc.

The "Program" interface still works, though. Most tool classes (like
"PackagedProgram") have a way to figure out whether the code uses "main()"
or implements "Program", and call the right method.

You can try to extend the "Program" interface. If you want to consistently
support multiple programs in one JAR file, you may need to adjust the util
classes as well to deal with that.
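
The "main() or Program" decision can be sketched roughly as below. This is
an assumption for illustration and not the code of "PackagedProgram": the
Program interface here is a simplified local stand-in (it returns a String
instead of a real plan object), and EntryPointInvoker, PlanJob, and MainJob
are made-up names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Simplified local stand-in; the real interface returns a plan object. */
interface Program {
    String getPlan(String... args);
}

/** Hypothetical dispatcher for illustration; not part of Flink. */
class EntryPointInvoker {

    /** Returns "program" or "main", or throws if the class offers neither. */
    static String detectKind(Class<?> entryClass) {
        if (Program.class.isAssignableFrom(entryClass)) {
            return "program";
        }
        try {
            Method main = entryClass.getMethod("main", String[].class);
            if (Modifier.isStatic(main.getModifiers())) {
                return "main";
            }
        } catch (NoSuchMethodException e) {
            // fall through to the exception below
        }
        throw new IllegalArgumentException(entryClass.getName()
            + " neither implements Program nor has a main method");
    }

    /** Calls the right method; returns the plan, or null for main(). */
    static Object invoke(Class<?> entryClass, String... args) {
        try {
            if ("program".equals(detectKind(entryClass))) {
                Program p = (Program) entryClass.getDeclaredConstructor().newInstance();
                return p.getPlan(args);
            }
            entryClass.getMethod("main", String[].class).invoke(null, (Object) args);
            return null;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot run " + entryClass.getName(), e);
        }
    }
}

/** Made-up job using the interface. */
class PlanJob implements Program {
    public String getPlan(String... args) {
        return "plan with " + args.length + " args";
    }
}

/** Made-up job using a plain main method. */
class MainJob {
    static int runs = 0;

    public static void main(String[] args) {
        runs++;
    }
}
```

Supporting several programs per JAR would mean running this check once per
entry class instead of once per JAR.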



On Tue, May 19, 2015 at 10:10 AM, Matthias J. Sax <
mjsax@informatik.hu-berlin.de> wrote:

> Supporting an interface like this seems to be a nice idea. Any other
> opinions on it?
>
> Getting it right seems to require some more work. I don't want to
> start working on it before it's clear that it has a chance of being
> included in Flink.
>
> @Flavio: I moved the discussion to the dev mailing list (the user list is
> not appropriate for this discussion). Are you subscribed to it, or should I
> cc you on each mail?
>
>
> -Matthias
>
>

Re: Package multiple jobs in a single jar

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Supporting an interface like this seems to be a nice idea. Any other
opinions on it?

Getting it right seems to require some more work. I don't want to
start working on it before it's clear that it has a chance of being
included in Flink.

@Flavio: I moved the discussion to the dev mailing list (the user list is
not appropriate for this discussion). Are you subscribed to it, or should I
cc you on each mail?


-Matthias


On 05/19/2015 09:39 AM, Flavio Pompermaier wrote:
> Nice feature Matthias!
> My suggestion is to create a specific Flink interface that also provides
> a description of a job and standardizes parameter passing.
> Then, somewhere (e.g. in the manifest) you could specify the list of
> packages (or directly the classes) to inspect with reflection in order to
> extract the list of available Flink jobs.
> Something like:
>
> public interface FlinkJob {
>
>   /** The name to display in the job submission UI or shell, e.g. "My Flink HelloWorld" */
>   String getDisplayName();
>
>   /** E.g. "This program does this and that etc." */
>   String getDescription();
>
>   /**
>    * E.g. <0, Integer, "An integer representing my first param">,
>    * <1, String, "A string representing my second param">
>    */
>   List<Tuple3<Integer, TypeInfo, String>> getParamDescription();
>
>   /** Set up the Flink job in the passed ExecutionEnvironment */
>   ExecutionEnvironment config(ExecutionEnvironment env);
> }
> 
> What do you think?
> 
> 
> 


Re: Package multiple jobs in a single jar

Posted by Flavio Pompermaier <po...@okkam.it>.
Nice feature Matthias!
My suggestion is to create a specific Flink interface that also provides
a description of a job and standardizes parameter passing.
Then, somewhere (e.g. in the manifest) you could specify the list of
packages (or directly the classes) to inspect with reflection in order to
extract the list of available Flink jobs.
Something like:

public interface FlinkJob {

  /** The name to display in the job submission UI or shell, e.g. "My Flink HelloWorld" */
  String getDisplayName();

  /** E.g. "This program does this and that etc." */
  String getDescription();

  /**
   * E.g. <0, Integer, "An integer representing my first param">,
   * <1, String, "A string representing my second param">
   */
  List<Tuple3<Integer, TypeInfo, String>> getParamDescription();

  /** Set up the Flink job in the passed ExecutionEnvironment */
  ExecutionEnvironment config(ExecutionEnvironment env);
}

What do you think?
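
To sketch how a shell or UI could use such an interface, here is a small
driver in the spirit of Hadoop's ProgramDriver. It is only an illustration
under assumptions: FlinkJob is reduced to a local stand-in without Flink
types (run(String...) takes the place of config(ExecutionEnvironment)), and
JobDriver and HelloWorldJob are made-up names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Reduced local stand-in for the proposed interface (no Flink types). */
interface FlinkJob {
    String getDisplayName();

    String getDescription();

    /** Stand-in for config(ExecutionEnvironment); just returns a result string. */
    String run(String... args);
}

/** Hypothetical driver that registers jobs and runs one by name. */
class JobDriver {
    // LinkedHashMap keeps the registration order for the listing.
    private final Map<String, FlinkJob> jobs = new LinkedHashMap<>();

    void register(FlinkJob job) {
        jobs.put(job.getDisplayName(), job);
    }

    /** One "name: description" line per registered job. */
    List<String> listJobs() {
        List<String> lines = new ArrayList<>();
        for (FlinkJob job : jobs.values()) {
            lines.add(job.getDisplayName() + ": " + job.getDescription());
        }
        return lines;
    }

    String run(String displayName, String... args) {
        FlinkJob job = jobs.get(displayName);
        if (job == null) {
            throw new IllegalArgumentException("Unknown job: " + displayName);
        }
        return job.run(args);
    }
}

/** Made-up example job. */
class HelloWorldJob implements FlinkJob {
    public String getDisplayName() {
        return "My Flink HelloWorld";
    }

    public String getDescription() {
        return "Prints a greeting";
    }

    public String run(String... args) {
        return "Hello " + (args.length > 0 ? args[0] : "World");
    }
}
```

With real Flink types, run() would instead attach the job to the passed
ExecutionEnvironment, and the registry could be filled by reflection from
the classes named in the manifest.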



On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
mjsax@informatik.hu-berlin.de> wrote:

> Hi,
>
> I like the idea that Flink's WebClient can show different plans for
> different jobs within a single jar file.
>
> I prepared a prototype for this feature. You can find it here:
> https://github.com/mjsax/flink/tree/multipleJobsWebUI
>
> To test the feature, you need to prepare a jar file, that contains the
> code of multiple programs and specify each entry class in the manifest
> file as comma separated values in "program-class" line.
>
> Feedback is welcome. :)
>
>
> -Matthias
>
>
> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
> > Thank you all for the support!
> > It would be a really nice feature if the web client could show
> > me the list of Flink jobs within my jar.
> > It should be sufficient to mark them with a special annotation and
> > inspect the classes within the jar.
> >
> > On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <ms@mieo.de
> > <ma...@mieo.de>> wrote:
> >
> >     Hi Flavio,
> >
> >     you can also put each job in its own class and use the -c parameter
> >     to execute the jobs separately:
> >
> >     /bin/flink run -c com.myflinkjobs.JobA /path/to/jar/multiplejobs.jar
> >     /bin/flink run -c com.myflinkjobs.JobB /path/to/jar/multiplejobs.jar
> >     …
> >
> >     Cheers
> >     Malte
> >
> >     From: Robert Metzger <rmetzger@apache.org <mailto:rmetzger@apache.org>>
> >     Reply-To: <user@flink.apache.org <ma...@flink.apache.org>>
> >     Date: Friday, May 8, 2015 14:57
> >     To: "user@flink.apache.org <ma...@flink.apache.org>"
> >     <user@flink.apache.org <ma...@flink.apache.org>>
> >     Subject: Re: Package multiple jobs in a single jar
> >
> >     Hi Flavio,
> >
> >     the pom from our quickstart is a good
> >     reference:
> https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
> >
> >
> >
> >
> >     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
> >     <pompermaier@okkam.it <ma...@okkam.it>> wrote:
> >
> >         OK, got it.
> >         And is there a reference pom.xml for shading my application into
> >         one fat-jar? Which Flink dependencies can I exclude?
> >
> >         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske <fhueske@gmail.com
> >         <ma...@gmail.com>> wrote:
> >
> >             I didn't say that the main should return the
> >             ExecutionEnvironment.
> >             You can define and execute as many programs in a main
> >             function as you like.
> >             The program can be defined somewhere else, e.g., in a
> >             function that receives an ExecutionEnvironment and attaches
> >             a program to it, such as
> >
> >             public static void buildMyProgram(ExecutionEnvironment env) {
> >               DataSet<String> lines = env.readTextFile(...);
> >               // do something
> >               lines.writeAsText(...);
> >             }
> >
> >             That method could be invoked from main():
> >
> >             public static void main(String[] args) throws Exception {
> >               ExecutionEnvironment env = ...
> >
> >               if (...) {
> >                 buildMyProgram(env);
> >               }
> >               else {
> >                 buildSomeOtherProg(env);
> >               }
> >
> >               env.execute();
> >
> >               // run some more programs
> >             }
> >
> >             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
> >             <pompermaier@okkam.it <ma...@okkam.it>>:
> >
> >                 Hi Fabian,
> >                 thanks for the response.
> >                 So my main methods should be converted into methods
> >                 returning the ExecutionEnvironment.
> >                 However, I think it would be very nice to have a
> >                 syntax like that of the Hadoop ProgramDriver to
> >                 define jobs to invoke from a single root class.
> >                 Do you think it could be useful?
> >
> >                 On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
> >                 <fhueske@gmail.com <ma...@gmail.com>> wrote:
> >
> >                     You can easily have multiple Flink programs in a single
> >                     JAR file.
> >                     A program is defined using an ExecutionEnvironment
> >                     and executed when you call
> >                     ExecutionEnvironment.execute().
> >                     Where and how you do that does not matter.
> >
> >                     You can for example implement a main function such as:
> >
> >                     public static void main(String... args) throws Exception {
> >
> >                       if (today == Monday) {
> >                         ExecutionEnvironment env = ...
> >                         // define Monday prog
> >                         env.execute();
> >                       }
> >                       else {
> >                         ExecutionEnvironment env = ...
> >                         // define other prog
> >                         env.execute();
> >                       }
> >                     }
> >
> >                     2015-05-08 11:41 GMT+02:00 Flavio Pompermaier
> >                     <pompermaier@okkam.it <mailto:pompermaier@okkam.it>>:
> >
> >                         Hi to all,
> >                         is there any way to keep multiple jobs in a jar
> >                         and then choose at runtime the one to execute
> >                         (like what ProgramDriver does in Hadoop)?
> >
> >                         Best,
> >                         Flavio
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>

Re: Package multiple jobs in a single jar

Posted by Flavio Pompermaier <po...@okkam.it>.
Nice feature Matthias!
My suggestion is to create a dedicated Flink interface that also provides a
description of the job and standardizes parameter passing.
Then, somewhere (e.g. in the manifest), you could specify the list of packages
(or the classes directly) to inspect via reflection to extract the list
of available Flink jobs.
Something like:

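import java.util.List;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;

public interface FlinkJob {

  /** The name to display in the job submission UI or shell, e.g. "My Flink HelloWorld". */
  String getDisplayName();

  /** A description of the job, e.g. "This program does this and that etc.." */
  String getDescription();

  /**
   * Position, type and description of each parameter, e.g.
   * <0, Integer, "An integer representing my first param">,
   * <1, String, "A string representing my second param">
   */
  List<Tuple3<Integer, TypeInformation<?>, String>> getParamDescription();

  /** Set up the Flink job in the passed ExecutionEnvironment. */
  ExecutionEnvironment config(ExecutionEnvironment env);
}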
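
A minimal sketch of how such a lookup could work, using only the JDK so it
runs without Flink on the classpath. It assumes the comma-separated
"program-class" manifest line from Matthias' prototype. JobCatalog, JobInfo,
JobA and JobB are hypothetical names made up for illustration, and JobInfo is
only a stripped-down stand-in for the interface above, not an existing Flink
API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Manifest;

public class JobCatalog {

  /** Hypothetical stand-in for the proposed FlinkJob interface (Flink types omitted). */
  public interface JobInfo {
    String getDisplayName();
    String getDescription();
  }

  /** Hypothetical example job. */
  public static class JobA implements JobInfo {
    public String getDisplayName() { return "Job A"; }
    public String getDescription() { return "First example job"; }
  }

  /** Hypothetical example job. */
  public static class JobB implements JobInfo {
    public String getDisplayName() { return "Job B"; }
    public String getDescription() { return "Second example job"; }
  }

  /** Reads the comma-separated class names from the "program-class" manifest attribute. */
  public static List<String> readJobClassNames(String manifestText) throws IOException {
    Manifest manifest = new Manifest(
        new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
    String value = manifest.getMainAttributes().getValue("program-class");
    List<String> names = new ArrayList<>();
    if (value != null) {
      for (String name : value.split(",")) {
        if (!name.trim().isEmpty()) {
          names.add(name.trim());
        }
      }
    }
    return names;
  }

  /** Instantiates every listed class that implements JobInfo via its no-arg constructor. */
  public static List<JobInfo> loadJobs(List<String> classNames)
      throws ReflectiveOperationException {
    List<JobInfo> jobs = new ArrayList<>();
    for (String className : classNames) {
      Class<?> clazz = Class.forName(className);
      if (JobInfo.class.isAssignableFrom(clazz)) {
        jobs.add((JobInfo) clazz.getDeclaredConstructor().newInstance());
      }
    }
    return jobs;
  }

  public static void main(String[] args) throws Exception {
    // In a real jar the manifest would come from META-INF/MANIFEST.MF;
    // here it is built in memory so the sketch is self-contained.
    String manifestText = "Manifest-Version: 1.0\n"
        + "program-class: " + JobA.class.getName() + "," + JobB.class.getName() + "\n";
    for (JobInfo job : loadJobs(readJobClassNames(manifestText))) {
      System.out.println(job.getDisplayName() + ": " + job.getDescription());
    }
  }
}
```

Listing the classes explicitly keeps the lookup cheap. Scanning whole packages
for an annotation would need a walk over the jar entries instead.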

What do you think?



On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
mjsax@informatik.hu-berlin.de> wrote:

> Hi,
>
> I like the idea that Flink's WebClient can show different plans for
> different jobs within a single jar file.
>
> I prepared a prototype for this feature. You can find it here:
> https://github.com/mjsax/flink/tree/multipleJobsWebUI
>
> To test the feature, you need to prepare a jar file that contains the
> code of multiple programs and specify each entry class in the manifest
> file as comma-separated values in the "program-class" line.
>
> Feedback is welcome. :)
>
>
> -Matthias
>
>
> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
> > Thank you all for the support!
> > It would be a really nice feature if the web client could show
> > me the list of Flink jobs within my jar.
> > It should be sufficient to mark them with a special annotation and
> > inspect the classes within the jar.
> >
> > On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <ms@mieo.de
> > <ma...@mieo.de>> wrote:
> >
> >     Hi Flavio,
> >
> >     you can also put each job in its own class and use the -c parameter
> >     to execute jobs separately:
> >
> >     /bin/flink run -c com.myflinkjobs.JobA /path/to/jar/multiplejobs.jar
> >     /bin/flink run -c com.myflinkjobs.JobB /path/to/jar/multiplejobs.jar
> >     …
> >
> >     Cheers
> >     Malte
> >
> >     From: Robert Metzger <rmetzger@apache.org <mailto:rmetzger@apache.org>>
> >     Reply-To: <user@flink.apache.org <ma...@flink.apache.org>>
> >     Date: Friday, May 8, 2015 14:57
> >     To: "user@flink.apache.org <ma...@flink.apache.org>"
> >     <user@flink.apache.org <ma...@flink.apache.org>>
> >     Subject: Re: Package multiple jobs in a single jar
> >
> >     Hi Flavio,
> >
> >     the pom from our quickstart is a good
> >     reference:
> >     https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
> >
> >
> >
> >
> >     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
> >     <pompermaier@okkam.it <ma...@okkam.it>> wrote:
> >
> >         OK, got it.
> >         Is there a reference pom.xml for shading my application into
> >         one fat jar? Which Flink dependencies can I exclude?
> >
> >         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske <fhueske@gmail.com
> >         <ma...@gmail.com>> wrote:
> >
> >             I didn't say that the main should return the
> >             ExecutionEnvironment.
> >             You can define and execute as many programs in a main
> >             function as you like.
> >             The program can be defined somewhere else, e.g., in a
> >             function that receives an ExecutionEnvironment and attaches
> >             a program such as
> >
> >             public static void buildMyProgram(ExecutionEnvironment env) {
> >               DataSet<String> lines = env.readTextFile(...);
> >               // do something
> >               lines.writeAsText(...);
> >             }
> >
> >             That method could be invoked from main():
> >
> >             public static void main(String[] args) throws Exception {
> >               ExecutionEnvironment env = ...
> >
> >               if (...) {
> >                 buildMyProgram(env);
> >               }
> >               else {
> >                 buildSomeOtherProg(env);
> >               }
> >
> >               env.execute();
> >
> >               // run some more programs
> >             }
> >
> >             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
> >             <pompermaier@okkam.it <ma...@okkam.it>>:
> >
> >                 Hi Fabian,
> >                 thanks for the response.
> >                 So my main methods should be converted into methods
> >                 returning the ExecutionEnvironment.
> >                 However, I think it would be very nice to have a
> >                 syntax like that of the Hadoop ProgramDriver to
> >                 define jobs to invoke from a single root class.
> >                 Do you think it could be useful?
> >
> >                 On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
> >                 <fhueske@gmail.com <ma...@gmail.com>> wrote:
> >
> >                     You can easily have multiple Flink programs in a single
> >                     JAR file.
> >                     A program is defined using an ExecutionEnvironment
> >                     and executed when you call
> >                     ExecutionEnvironment.execute().
> >                     Where and how you do that does not matter.
> >
> >                     You can for example implement a main function such as:
> >
> >                     public static void main(String... args) throws Exception {
> >
> >                       if (today == Monday) {
> >                         ExecutionEnvironment env = ...
> >                         // define Monday prog
> >                         env.execute();
> >                       }
> >                       else {
> >                         ExecutionEnvironment env = ...
> >                         // define other prog
> >                         env.execute();
> >                       }
> >                     }
> >
> >                     2015-05-08 11:41 GMT+02:00 Flavio Pompermaier
> >                     <pompermaier@okkam.it <mailto:pompermaier@okkam.it>>:
> >
> >                         Hi to all,
> >                         is there any way to keep multiple jobs in a jar
> >                         and then choose at runtime the one to execute
> >                         (like what ProgramDriver does in Hadoop)?
> >
> >                         Best,
> >                         Flavio
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>