Posted to dev@flink.apache.org by Flavio Pompermaier <po...@okkam.it> on 2020/03/18 13:45:49 UTC

PackagedProgram and ProgramDescription

Hi all,
what do you think about using this job-submission sprint to also address
the problem discussed in https://issues.apache.org/jira/browse/FLINK-10862?

Best,
Flavio

Re: PackagedProgram and ProgramDescription

Posted by Chesnay Schepler <ch...@apache.org>.
Let's try this again; the formatting went haywire for some reason...

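// Imports assume Flink 1.x and jcabi-manifests on the classpath.
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

import com.jcabi.manifests.Manifests;
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.java.utils.ParameterTool;

public class MetaDataUtils {

    // Combines the job arguments with selected jar manifest entries.
    public static ExecutionConfig.GlobalJobParameters createMetaData(ParameterTool parameterTool) {
        Map<String, String> metaData = new HashMap<>(parameterTool.toMap());
        setFromManifest(metaData, "Commit-ID");
        setFromManifest(metaData, "Commit-Message");
        setFromManifest(metaData, "Commit-Time");

        return new MetaData(metaData);
    }

    private static void setFromManifest(Map<String, String> metaData, String property) {
        metaData.put(property, guard(() -> Manifests.read(property)));
    }

    // Falls back to "unknown" when the manifest has no such attribute.
    private static String guard(Supplier<String> supplier) {
        try {
            return supplier.get();
        } catch (IllegalArgumentException iae) {
            return "unknown";
        }
    }

    // Exposes the collected entries through toMap().
    private static class MetaData extends ExecutionConfig.GlobalJobParameters {
        private final Map<String, String> data;

        private MetaData(Map<String, String> data) {
            this.data = data;
        }

        @Override
        public Map<String, String> toMap() {
            return data;
        }
    }
}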

On 15/07/2020 15:25, Chesnay Schepler wrote:
> For completeness sake, here's an example of what we're doing to add 
> the job arguments and some manifest entries to the global job parameters:
> (Manifests is a class from jcabi-manifests)
>
> public class MetaDataUtils {
>
>     public static ExecutionConfig.GlobalJobParameters createMetaData(ParameterTool parameterTool) {
>         Map<String, String> metaData = new HashMap<>(parameterTool.toMap());
>         setFromManifest(metaData, "Commit-ID");
>         setFromManifest(metaData, "Commit-Message");
>         setFromManifest(metaData, "Commit-Time");
>         return new MetaData(metaData);
>     }
>
>     private static void setFromManifest(Map<String, String> metaData, String property) {
>         metaData.put(property, guard(() -> Manifests.read(property)));
>     }
>
>     private static String guard(Supplier<String> supplier) {
>         try {
>             return supplier.get();
>         } catch (IllegalArgumentException iae) {
>             return "unknown";
>         }
>     }
>
>     private static class MetaData extends ExecutionConfig.GlobalJobParameters {
>         private final Map<String, String> data;
>
>         private MetaData(Map<String, String> data) {
>             this.data = data;
>         }
>
>         @Override
>         public Map<String, String> toMap() {
>             return data;
>         }
>     }
> }


Re: PackagedProgram and ProgramDescription

Posted by Chesnay Schepler <ch...@apache.org>.
For completeness' sake, here's an example of what we're doing to add the
job arguments and some manifest entries to the global job parameters
(Manifests is a class from jcabi-manifests):

public class MetaDataUtils {

    public static ExecutionConfig.GlobalJobParameters createMetaData(ParameterTool parameterTool) {
        Map<String, String> metaData = new HashMap<>(parameterTool.toMap());
        setFromManifest(metaData, "Commit-ID");
        setFromManifest(metaData, "Commit-Message");
        setFromManifest(metaData, "Commit-Time");
        return new MetaData(metaData);
    }

    private static void setFromManifest(Map<String, String> metaData, String property) {
        metaData.put(property, guard(() -> Manifests.read(property)));
    }

    private static String guard(Supplier<String> supplier) {
        try {
            return supplier.get();
        } catch (IllegalArgumentException iae) {
            return "unknown";
        }
    }

    private static class MetaData extends ExecutionConfig.GlobalJobParameters {
        private final Map<String, String> data;

        private MetaData(Map<String, String> data) {
            this.data = data;
        }

        @Override
        public Map<String, String> toMap() {
            return data;
        }
    }
}


On 15/07/2020 15:01, Flavio Pompermaier wrote:
> Thanks Chesnay for the tip.
> I'll try to investigate the usage of GlobalJobParameters.
>
> On Wed, Jul 15, 2020 at 2:51 PM Chesnay Schepler <ch...@apache.org> wrote:
>
>> The more we strive towards a model where an application can submit
>> multiple jobs it will become increasingly important to be able to attach
>> meta data to a job/application to have any idea what is going on.
>>
>> But I don't think the PackagedProgram/ProgramDescription is the way to
>> go; and I'd envision rather something like a meta data object that is
>> attached to the environment/execute calls. But we have to figure out how
>> to do this in a way that also works for the SQL APIs.
>>
>> What we have done internally is to encode such information in the
>> GlobalJobParameters which are then available in the WebUI. We have
>> things like commit IDs encoded into the jar manifest, that we extract at
>> submission time and put them into the parameters.
>> My guess would be that such approach can work sufficiently for all
>> dataset/datastream/table API users.
>>
>> On 15/07/2020 14:05, Flavio Pompermaier wrote:
>>> Ok, it's not a problem for me if the community is not interested in
>> pushing
>>> this thing forward.
>>> When we develop a Job is super useful for us to have the job describing
>>> itself somehow (what it does and which parameters it requires).
>>> If this is not in Flink I have to implement it somewhere else but I can't
>>> think that this is not a common situation.
>>> However I think I can live with it :D
>>>
>>> Best,
>>> Flavio
>>>
>>> On Wed, Jul 15, 2020 at 12:01 PM Aljoscha Krettek <al...@apache.org>
>>> wrote:
>>>
>>>> I think no-one is interested to push this personally right now. We would
>>>> need a champion that is interested and pushes this forward.
>>>>
>>>> Best,
>>>> Aljoscha
>>>>
>>>> On Mon, Mar 30, 2020, at 12:45, Flavio Pompermaier wrote:
>>>>> I would personally like to see a way of describing a Flink job/pipeline
>>>>> (including its parameters and types) in order to enable better UIs,
>> then
>>>>> the important thing is to make things consistent and aligned with the
>> new
>>>>> client developments and exploit this new dev sprint to fix such issues.
>>>>>
>>>>> On Mon, Mar 30, 2020 at 11:38 AM Aljoscha Krettek <aljoscha@apache.org
>>>>> wrote:
>>>>>
>>>>>> On 18.03.20 14:45, Flavio Pompermaier wrote:
>>>>>>> what do you think if we exploit this job-submission sprint to address
>>>>>> also
>>>>>>> the problem discussed in
>>>>>> https://issues.apache.org/jira/browse/FLINK-10862?
>>>>>>
>>>>>> That's a good idea! What should we do? It seems that most committers
>> on
>>>>>> the issue were in favour of deprecating/removing ProgramDescription.
>>>>>>


Re: PackagedProgram and ProgramDescription

Posted by Flavio Pompermaier <po...@okkam.it>.
Thanks Chesnay for the tip.
I'll try to investigate the usage of GlobalJobParameters.

On Wed, Jul 15, 2020 at 2:51 PM Chesnay Schepler <ch...@apache.org> wrote:

> The more we strive towards a model where an application can submit
> multiple jobs it will become increasingly important to be able to attach
> meta data to a job/application to have any idea what is going on.
>
> But I don't think the PackagedProgram/ProgramDescription is the way to
> go; and I'd envision rather something like a meta data object that is
> attached to the environment/execute calls. But we have to figure out how
> to do this in a way that also works for the SQL APIs.
>
> What we have done internally is to encode such information in the
> GlobalJobParameters which are then available in the WebUI. We have
> things like commit IDs encoded into the jar manifest, that we extract at
> submission time and put them into the parameters.
> My guess would be that such approach can work sufficiently for all
> dataset/datastream/table API users.
>
> On 15/07/2020 14:05, Flavio Pompermaier wrote:
> > Ok, it's not a problem for me if the community is not interested in
> pushing
> > this thing forward.
> > When we develop a Job is super useful for us to have the job describing
> > itself somehow (what it does and which parameters it requires).
> > If this is not in Flink I have to implement it somewhere else but I can't
> > think that this is not a common situation.
> > However I think I can live with it :D
> >
> > Best,
> > Flavio
> >
> > On Wed, Jul 15, 2020 at 12:01 PM Aljoscha Krettek <al...@apache.org>
> > wrote:
> >
> >> I think no-one is interested to push this personally right now. We would
> >> need a champion that is interested and pushes this forward.
> >>
> >> Best,
> >> Aljoscha
> >>
> >> On Mon, Mar 30, 2020, at 12:45, Flavio Pompermaier wrote:
> >>> I would personally like to see a way of describing a Flink job/pipeline
> >>> (including its parameters and types) in order to enable better UIs,
> then
> >>> the important thing is to make things consistent and aligned with the
> new
> >>> client developments and exploit this new dev sprint to fix such issues.
> >>>
> >>> On Mon, Mar 30, 2020 at 11:38 AM Aljoscha Krettek <aljoscha@apache.org
> >
> >>> wrote:
> >>>
> >>>> On 18.03.20 14:45, Flavio Pompermaier wrote:
> >>>>> what do you think if we exploit this job-submission sprint to address
> >>>> also
> >>>>> the problem discussed in
> >>>> https://issues.apache.org/jira/browse/FLINK-10862?
> >>>>
> >>>> That's a good idea! What should we do? It seems that most committers
> on
> >>>> the issue were in favour of deprecating/removing ProgramDescription.
> >>>>
>

Re: PackagedProgram and ProgramDescription

Posted by Chesnay Schepler <ch...@apache.org>.
The more we move towards a model where an application can submit
multiple jobs, the more important it becomes to be able to attach
metadata to a job/application in order to have any idea what is going on.

But I don't think PackagedProgram/ProgramDescription is the way to
go; I'd rather envision something like a metadata object that is
attached to the environment/execute calls. But we have to figure out how
to do this in a way that also works for the SQL APIs.

What we have done internally is to encode such information in the
GlobalJobParameters, which are then available in the WebUI. We have
things like commit IDs encoded into the jar manifest, which we extract at
submission time and put into the parameters.
My guess would be that such an approach can work sufficiently well for all
DataSet/DataStream/Table API users.
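
As a rough illustration of the "extract from the jar manifest and put into the parameters" step described above, here is a minimal sketch that uses only the JDK's java.util.jar.Manifest instead of jcabi-manifests. The class ManifestMetaData and its methods are hypothetical names made up for this example, and the manifest is parsed from a string only to keep the sketch self-contained; it is not the code used internally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.Manifest;

// Hypothetical helper for illustration; not part of Flink or jcabi-manifests.
final class ManifestMetaData {

    static final String UNKNOWN = "unknown";

    // Returns the main-section attribute, or "unknown" if it is absent.
    static String readOrUnknown(Manifest manifest, String attribute) {
        String value = manifest.getMainAttributes().getValue(attribute);
        return value != null ? value : UNKNOWN;
    }

    // Copies the job arguments, then adds the requested manifest attributes.
    static Map<String, String> createMetaData(
            Map<String, String> jobArguments, Manifest manifest, String... attributes) {
        Map<String, String> metaData = new LinkedHashMap<>(jobArguments);
        for (String attribute : attributes) {
            metaData.put(attribute, readOrUnknown(manifest, attribute));
        }
        return metaData;
    }

    // Parses manifest text; a real job would load META-INF/MANIFEST.MF from its jar.
    static Manifest parse(String manifestText) {
        try {
            return new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Manifest manifest = parse("Manifest-Version: 1.0\nCommit-ID: abc123\n");
        Map<String, String> jobArguments = new LinkedHashMap<>();
        jobArguments.put("input", "/tmp/in");
        // Commit-Time is missing from the manifest above, so it falls back to "unknown".
        System.out.println(createMetaData(jobArguments, manifest, "Commit-ID", "Commit-Time"));
    }
}
```

In a Flink job, a map like this would be wrapped in a GlobalJobParameters subclass that returns it from toMap(), and registered with env.getConfig().setGlobalJobParameters(...) before the job is executed.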

On 15/07/2020 14:05, Flavio Pompermaier wrote:
> Ok, it's not a problem for me if the community is not interested in pushing
> this thing forward.
> When we develop a Job is super useful for us to have the job describing
> itself somehow (what it does and which parameters it requires).
> If this is not in Flink I have to implement it somewhere else but I can't
> think that this is not a common situation.
> However I think I can live with it :D
>
> Best,
> Flavio
>
> On Wed, Jul 15, 2020 at 12:01 PM Aljoscha Krettek <al...@apache.org>
> wrote:
>
>> I think no-one is interested to push this personally right now. We would
>> need a champion that is interested and pushes this forward.
>>
>> Best,
>> Aljoscha
>>
>> On Mon, Mar 30, 2020, at 12:45, Flavio Pompermaier wrote:
>>> I would personally like to see a way of describing a Flink job/pipeline
>>> (including its parameters and types) in order to enable better UIs, then
>>> the important thing is to make things consistent and aligned with the new
>>> client developments and exploit this new dev sprint to fix such issues.
>>>
>>> On Mon, Mar 30, 2020 at 11:38 AM Aljoscha Krettek <al...@apache.org>
>>> wrote:
>>>
>>>> On 18.03.20 14:45, Flavio Pompermaier wrote:
>>>>> what do you think if we exploit this job-submission sprint to address
>>>> also
>>>>> the problem discussed in
>>>> https://issues.apache.org/jira/browse/FLINK-10862?
>>>>
>>>> That's a good idea! What should we do? It seems that most committers on
>>>> the issue were in favour of deprecating/removing ProgramDescription.
>>>>


Re: PackagedProgram and ProgramDescription

Posted by Flavio Pompermaier <po...@okkam.it>.
Ok, it's not a problem for me if the community is not interested in pushing
this forward.
When we develop a job, it is super useful for us to have the job describe
itself somehow (what it does and which parameters it requires).
If this is not in Flink I have to implement it somewhere else, but I can't
believe that this is not a common situation.
However, I think I can live with it :D

Best,
Flavio

On Wed, Jul 15, 2020 at 12:01 PM Aljoscha Krettek <al...@apache.org>
wrote:

> I think no-one is interested to push this personally right now. We would
> need a champion that is interested and pushes this forward.
>
> Best,
> Aljoscha
>
> On Mon, Mar 30, 2020, at 12:45, Flavio Pompermaier wrote:
> > I would personally like to see a way of describing a Flink job/pipeline
> > (including its parameters and types) in order to enable better UIs, then
> > the important thing is to make things consistent and aligned with the new
> > client developments and exploit this new dev sprint to fix such issues.
> >
> > On Mon, Mar 30, 2020 at 11:38 AM Aljoscha Krettek <al...@apache.org>
> > wrote:
> >
> > > On 18.03.20 14:45, Flavio Pompermaier wrote:
> > > > what do you think if we exploit this job-submission sprint to address
> > > also
> > > > the problem discussed in
> > > https://issues.apache.org/jira/browse/FLINK-10862?
> > >
> > > That's a good idea! What should we do? It seems that most committers on
> > > the issue were in favour of deprecating/removing ProgramDescription.
> > >
> >

Re: PackagedProgram and ProgramDescription

Posted by Aljoscha Krettek <al...@apache.org>.
I think no one is personally interested in pushing this right now. We would need a champion who is interested and pushes this forward.

Best,
Aljoscha

On Mon, Mar 30, 2020, at 12:45, Flavio Pompermaier wrote:
> I would personally like to see a way of describing a Flink job/pipeline
> (including its parameters and types) in order to enable better UIs, then
> the important thing is to make things consistent and aligned with the new
> client developments and exploit this new dev sprint to fix such issues.
> 
> On Mon, Mar 30, 2020 at 11:38 AM Aljoscha Krettek <al...@apache.org>
> wrote:
> 
> > On 18.03.20 14:45, Flavio Pompermaier wrote:
> > > what do you think if we exploit this job-submission sprint to address
> > also
> > > the problem discussed in
> > https://issues.apache.org/jira/browse/FLINK-10862?
> >
> > That's a good idea! What should we do? It seems that most committers on
> > the issue were in favour of deprecating/removing ProgramDescription.
> >
>

Re: PackagedProgram and ProgramDescription

Posted by Flavio Pompermaier <po...@okkam.it>.
I would personally like to see a way of describing a Flink job/pipeline
(including its parameters and types) in order to enable better UIs. Beyond
that, the important thing is to make things consistent and aligned with the
new client developments, and to use this new dev sprint to fix such issues.

On Mon, Mar 30, 2020 at 11:38 AM Aljoscha Krettek <al...@apache.org>
wrote:

> On 18.03.20 14:45, Flavio Pompermaier wrote:
> > what do you think if we exploit this job-submission sprint to address
> also
> > the problem discussed in
> https://issues.apache.org/jira/browse/FLINK-10862?
>
> That's a good idea! What should we do? It seems that most committers on
> the issue were in favour of deprecating/removing ProgramDescription.
>

Re: PackagedProgram and ProgramDescription

Posted by Aljoscha Krettek <al...@apache.org>.
On 18.03.20 14:45, Flavio Pompermaier wrote:
> what do you think if we exploit this job-submission sprint to address also
> the problem discussed in https://issues.apache.org/jira/browse/FLINK-10862?

That's a good idea! What should we do? It seems that most committers on 
the issue were in favour of deprecating/removing ProgramDescription.