Posted to user@flink.apache.org by Flavio Pompermaier <po...@okkam.it> on 2020/08/07 10:54:58 UTC

Flink job percentage

Hi all,
one of our customers has asked to see the completion percentage of a Flink
batch job. Is there an existing heuristic I can use to compute it? Will this
still be possible once the DataSet API has been migrated to DataStream?

Thanks in advance,
Flavio

Re: Flink job percentage

Posted by Flavio Pompermaier <po...@okkam.it>.

OK, understood. Unfortunately the generated documentation cannot show the
key type of the status-count map, which is Map<ExecutionState, Integer>, so I
thought that the job status and the execution status were equivalent.
And what about the heuristic? Does it make sense?

On Thu, Nov 5, 2020 at 11:33 AM Chesnay Schepler <ch...@apache.org> wrote:

> Admittedly, it can be out-of-sync if someone forgets to regenerate the
> documentation, but they cannot be mixed up.

Re: Flink job percentage

Posted by Chesnay Schepler <ch...@apache.org>.

Admittedly, it can be out-of-sync if someone forgets to regenerate the 
documentation, but they cannot be mixed up.

On 11/5/2020 11:31 AM, Chesnay Schepler wrote:
> The "mismatch" is due to you mixing job and vertex states.
>
> These are the states a job can be in (based on
> org.apache.flink.api.common.JobStatus):
>
>     [ "CREATED", "RUNNING", "FAILING", "FAILED", "CANCELLING",
>     "CANCELED", "FINISHED", "RESTARTING", "SUSPENDED", "RECONCILING" ]
>
> These are the states a vertex can be in (based on
> org.apache.flink.runtime.execution.ExecutionState):
>
>     [ "CREATED", "SCHEDULED", "DEPLOYING", "RUNNING", "FINISHED",
>     "CANCELING", "CANCELED", "FAILED", "RECONCILING" ]
>
> Naturally, for your code you only want to check for the latter.
>
> The documentation is hence correct. FYI, we directly access the
> corresponding enums to generate this list, so it _cannot_ be out-of-sync.

Re: Flink job percentage

Posted by Chesnay Schepler <ch...@apache.org>.

No, because that would break the API and any log-parsing infrastructure 
relying on it.
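
The concern is easiest to see from the client side. The sketch below is only
an illustration: VertexState is a hypothetical local enum standing in for a
client's copy of the state names, and RenameBreakage and parseOrNull are
made-up names, not Flink API. Strict parsing with Enum.valueOf accepts the
current spelling and rejects the renamed one.

```java
final class RenameBreakage {
    /** Hypothetical client-side copy of some vertex states, using the current spelling. */
    enum VertexState { RUNNING, CANCELING, CANCELED }

    /** Parses a state name strictly; returns null when the name is unknown. */
    static VertexState parseOrNull(String name) {
        try {
            return VertexState.valueOf(name);
        } catch (IllegalArgumentException e) {
            return null; // a renamed constant ends up here
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("CANCELING"));  // current spelling is recognized
        System.out.println(parseOrNull("CANCELLING")); // spelling after a hypothetical rename is not
    }
}
```

The same applies to anything that matches the literal string in logs or in
REST responses.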

On 11/5/2020 2:56 PM, Flavio Pompermaier wrote:
> Just another question: should I open a JIRA to rename
> ExecutionState.CANCELING to CANCELLING (the enum's Javadoc indeed
> says CANCELLING)?

Re: Flink job percentage

Posted by Flavio Pompermaier <po...@okkam.it>.

Just another question: should I open a JIRA to rename
ExecutionState.CANCELING to CANCELLING (the enum's Javadoc indeed says
CANCELLING)?

On Thu, Nov 5, 2020 at 11:31 AM Chesnay Schepler <ch...@apache.org> wrote:

> The "mismatch" is due to you mixing job and vertex states.
>
> These are the states a job can be in (based on
> org.apache.flink.api.common.JobStatus):
>
> [ "CREATED", "RUNNING", "FAILING", "FAILED", "CANCELLING", "CANCELED",
> "FINISHED", "RESTARTING", "SUSPENDED", "RECONCILING" ]
>
> These are the states a vertex can be in (based on
> org.apache.flink.runtime.execution.ExecutionState):
>
> [ "CREATED", "SCHEDULED", "DEPLOYING", "RUNNING", "FINISHED", "CANCELING",
> "CANCELED", "FAILED", "RECONCILING" ]
>
> Naturally, for your code you only want to check for the latter.
>
> The documentation is hence correct. FYI, we directly access the
> corresponding enums to generate this list, so it _cannot_ be out-of-sync.

Re: Flink job percentage

Posted by Chesnay Schepler <ch...@apache.org>.

The "mismatch" is due to you mixing job and vertex states.

These are the states a job can be in (based on
org.apache.flink.api.common.JobStatus):

    [ "CREATED", "RUNNING", "FAILING", "FAILED", "CANCELLING",
    "CANCELED", "FINISHED", "RESTARTING", "SUSPENDED", "RECONCILING" ]

These are the states a vertex can be in (based on
org.apache.flink.runtime.execution.ExecutionState):

    [ "CREATED", "SCHEDULED", "DEPLOYING", "RUNNING", "FINISHED",
    "CANCELING", "CANCELED", "FAILED", "RECONCILING" ]

Naturally, for your code you only want to check for the latter.

The documentation is hence correct. FYI, we directly access the
corresponding enums to generate this list, so it _cannot_ be out-of-sync.
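
To make the distinction concrete, here is a minimal sketch of a completion
percentage that looks only at vertex states. VertexState is a local stand-in
that mirrors the ExecutionState values listed above, so the example runs
without Flink on the classpath; VertexProgress and percentTerminal are
illustrative names, not Flink API.

```java
import java.util.EnumMap;
import java.util.Map;

final class VertexProgress {
    /** Local stand-in for the vertex states listed above; real code would use Flink's ExecutionState. */
    enum VertexState {
        CREATED, SCHEDULED, DEPLOYING, RUNNING, FINISHED, CANCELING, CANCELED, FAILED, RECONCILING;

        /** A vertex in one of these states will not change state again. */
        boolean isTerminal() {
            return this == FINISHED || this == CANCELED || this == FAILED;
        }
    }

    /** Percentage (0..100) of vertices in a terminal state; 0 for an empty map. */
    static int percentTerminal(Map<VertexState, Integer> verticesPerState) {
        int total = 0;
        int terminal = 0;
        for (Map.Entry<VertexState, Integer> entry : verticesPerState.entrySet()) {
            total += entry.getValue();
            if (entry.getKey().isTerminal()) {
                terminal += entry.getValue();
            }
        }
        // Multiply before dividing so integer division does not truncate to 0.
        return total == 0 ? 0 : (terminal * 100) / total;
    }

    public static void main(String[] args) {
        Map<VertexState, Integer> counts = new EnumMap<>(VertexState.class);
        counts.put(VertexState.FINISHED, 3);
        counts.put(VertexState.RUNNING, 1);
        System.out.println(percentTerminal(counts) + "%"); // 3 of 4 vertices are done
    }
}
```

Job-level states such as FAILING, SUSPENDED or RESTARTING never appear as keys
here, because the map is keyed by vertex state only.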


On 11/5/2020 11:16 AM, Flavio Pompermaier wrote:
> What do you think about this very rough heuristic (obviously it only
> makes sense for batch jobs)?
> It's far from perfect, but at least it gives an idea that something is
> going on.
> PS: I found some mismatches between the states documented in [1] and the
> ones I found in the ExecutionState enum.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jobs-jobid
>
>     Map<ExecutionState, Integer> statusCount = jobDetails.getJobVerticesPerState();
>     // Vertices that have not yet reached a terminal state
>     int uncompleted = statusCount.getOrDefault(ExecutionState.CREATED, 0)
>         + statusCount.getOrDefault(ExecutionState.SCHEDULED, 0)
>         + statusCount.getOrDefault(ExecutionState.DEPLOYING, 0)
>         + statusCount.getOrDefault(ExecutionState.RUNNING, 0)
>         + statusCount.getOrDefault(ExecutionState.CANCELING, 0)
>         + statusCount.getOrDefault(ExecutionState.RECONCILING, 0);
>     // FAILING, SUSPENDED and RESTARTING: not found in ExecutionState in Flink 1.11.0
>     // Vertices in a terminal state
>     int completed = statusCount.getOrDefault(ExecutionState.FINISHED, 0)
>         + statusCount.getOrDefault(ExecutionState.FAILED, 0)
>         + statusCount.getOrDefault(ExecutionState.CANCELED, 0);
>     int total = completed + uncompleted;
>     // Scale before dividing: completed / total in integer arithmetic
>     // stays 0 until every vertex is done
>     final int completionPercentage = total == 0 ? 0 : (completed * 100) / total;
>
> Thanks in advance,
> Flavio

Re: Flink job percentage

Posted by Flavio Pompermaier <po...@okkam.it>.

What do you think about this very rough heuristic (obviously it only makes
sense for batch jobs)?
It's far from perfect, but at least it gives an idea that something is going on.
PS: I found some mismatches between the states documented in [1] and the ones I
found in the ExecutionState enum.
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jobs-jobid

    Map<ExecutionState, Integer> statusCount = jobDetails.getJobVerticesPerState();
    // Vertices that have not yet reached a terminal state
    int uncompleted = statusCount.getOrDefault(ExecutionState.CREATED, 0)
        + statusCount.getOrDefault(ExecutionState.SCHEDULED, 0)
        + statusCount.getOrDefault(ExecutionState.DEPLOYING, 0)
        + statusCount.getOrDefault(ExecutionState.RUNNING, 0)
        + statusCount.getOrDefault(ExecutionState.CANCELING, 0)
        + statusCount.getOrDefault(ExecutionState.RECONCILING, 0);
    // FAILING, SUSPENDED and RESTARTING: not found in ExecutionState in Flink 1.11.0
    // Vertices in a terminal state
    int completed = statusCount.getOrDefault(ExecutionState.FINISHED, 0)
        + statusCount.getOrDefault(ExecutionState.FAILED, 0)
        + statusCount.getOrDefault(ExecutionState.CANCELED, 0);
    int total = completed + uncompleted;
    // Scale before dividing: completed / total in integer arithmetic
    // stays 0 until every vertex is done
    final int completionPercentage = total == 0 ? 0 : (completed * 100) / total;

Thanks in advance,
Flavio

On Thu, Aug 13, 2020 at 4:17 PM Arvid Heise <ar...@ververica.com> wrote:

> Hi Flavio,
>
> This is a daunting task to implement properly. There is an easy fix used in
> related workflow systems, though. Assuming that it's a recurring task, you
> simply store the run times of previous runs, apply some kind of low-pass
> filter (= decaying average) and compare the current runtime with the
> expected runtime. Even if Flink had some estimation, it would probably
> not be more accurate than this.
>
> Best,
>
> Arvid

Re: Flink job percentage

Posted by Arvid Heise <ar...@ververica.com>.

Hi Flavio,

This is a daunting task to implement properly. There is an easy fix used in
related workflow systems, though. Assuming that it's a recurring task, you
simply store the run times of previous runs, apply some kind of low-pass
filter (= decaying average) and compare the current runtime with the
expected runtime. Even if Flink had some estimation, it would probably
not be more accurate than this.
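
A minimal sketch of that idea, assuming the run times of past executions are
recorded somewhere outside Flink. RuntimeProgressEstimator, its methods and
the smoothing factor alpha are illustrative choices, not part of any Flink
API; the decaying average here is a plain exponential moving average.

```java
import java.time.Duration;

/** Estimates progress by comparing elapsed time with a decaying average of past run times. */
final class RuntimeProgressEstimator {
    private final double alpha;          // weight of the newest run, 0 < alpha <= 1
    private double expectedMillis = -1;  // negative means "no history yet"

    RuntimeProgressEstimator(double alpha) {
        if (alpha <= 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    /** Folds the duration of a finished run into the decaying average. */
    void recordCompletedRun(Duration runtime) {
        long millis = runtime.toMillis();
        expectedMillis = expectedMillis < 0
                ? millis
                : alpha * millis + (1 - alpha) * expectedMillis;
    }

    /** Returns 0..99 for a running job; 0 when there is no usable history. */
    int estimatePercent(Duration elapsed) {
        if (expectedMillis <= 0) {
            return 0;
        }
        double ratio = elapsed.toMillis() / expectedMillis;
        // Cap at 99 so that 100 is only reported once the job has actually finished.
        return (int) Math.min(99, Math.floor(ratio * 100));
    }
}
```

With alpha = 0.5 and previous runs of 100 s and 200 s, the expected runtime is
150 s, so a job that has been running for 75 s would be reported at 50%.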

Best,

Arvid

On Tue, Aug 11, 2020 at 10:26 AM Robert Metzger <rm...@apache.org> wrote:

> Hi Flavio,
>
> I'm not aware of such a heuristic being implemented anywhere. You need to
> come up with something yourself.

-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng

Re: Flink job percentage

Posted by Robert Metzger <rm...@apache.org>.

Hi Flavio,

I'm not aware of such a heuristic being implemented anywhere. You need to
come up with something yourself.

On Fri, Aug 7, 2020 at 12:55 PM Flavio Pompermaier <po...@okkam.it>
wrote:

> Hi all,
> one of our customers has asked to see the completion percentage of a Flink
> batch job. Is there an existing heuristic I can use to compute it? Will this
> still be possible once the DataSet API has been migrated to DataStream?
>
> Thanks in advance,
> Flavio
>