Posted to user@flink.apache.org by Pierre Zemb <pi...@gmail.com> on 2018/08/21 19:35:53 UTC

Question about QueryableState

Hi!

I’ve started to deploy a small Flink cluster (4 TaskManagers (tm) and
1 JobManager (jm) for now, on 1.6.0) and deployed a small job on it. Because
of the current load, the job is handled entirely by a single tm. I’ve created
a small proxy that uses QueryableStateClient
<https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/queryablestate/client/QueryableStateClient.html>
to access the current state. It works nicely, except under certain
circumstances: it seems that I can only access the state through a node
that is holding a part of the job. Here’s an example:

   - job on tm1. Pointing QueryableStateClient to tm1. State accessible
   - job still on tm1. Pointing QueryableStateClient to tm2 (for example).
   State inaccessible
   - killing tm1, job is now on tm2. State accessible
   - job still on tm2. Pointing QueryableStateClient to tm3. State
   inaccessible
   - adding some parallelism to spread job on tm1 and tm2. Pointing
   QueryableStateClient to either tm1 or tm2 is working
   - job still on tm1 and tm2. Pointing QueryableStateClient to tm3. State
   inaccessible

When the state is inaccessible, I can see this (generated here
<https://github.com/apache/flink/blob/release-1.6/flink-queryable-state/flink-queryable-state-runtime/src/main/java/org/apache/flink/queryablestate/client/proxy/KvStateClientProxyHandler.java#L228>
):

java.lang.RuntimeException: Failed request 0.
 Caused by: org.apache.flink.queryablestate.exceptions.UnknownLocationException:
Could not retrieve location of state=repo-status of
job=3ac3bc00b2d5bc0752917186a288d40a. Potential reasons are: i) the
state is not ready, or ii) the job does not exist.
    at org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.getKvStateLookupInfo(KvStateClientProxyHandler.java:228)
    at org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.getState(KvStateClientProxyHandler.java:162)
    at org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.executeActionAsync(KvStateClientProxyHandler.java:129)
    at org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.handleRequest(KvStateClientProxyHandler.java:119)
    at org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.handleRequest(KvStateClientProxyHandler.java:63)
    at org.apache.flink.queryablestate.network.AbstractServerHandler$AsyncRequestTask.run(AbstractServerHandler.java:236)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
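
A minimal sketch of how a caller might recognise this particular failure.
It assumes the server-side cause may reach the client only as text inside
the RuntimeException message, as the "Failed request 0. Caused by:" layout
above suggests, so it checks both class names and messages along the cause
chain. FailureInspector and mentions are hypothetical names, not Flink API.

```java
/** Hypothetical helper: looks for an exception name in a failure's cause chain. */
final class FailureInspector {

    private FailureInspector() {}

    /**
     * Returns true if exceptionName appears as a class name, or inside a
     * message, anywhere in the cause chain of the given error.
     */
    static boolean mentions(Throwable error, String exceptionName) {
        int depth = 0;
        // The depth limit guards against a cyclic cause chain.
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t.getClass().getName().endsWith(exceptionName)) {
                return true;
            }
            String message = t.getMessage();
            if (message != null && message.contains(exceptionName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Rebuilds the shape of the error shown in the trace above.
        Throwable fromTrace = new RuntimeException(
                "Failed request 0.\n Caused by: org.apache.flink.queryablestate.exceptions."
                        + "UnknownLocationException: Could not retrieve location of state=repo-status");
        System.out.println(mentions(fromTrace, "UnknownLocationException")); // prints true
    }
}
```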

From the documentation, I can see that:

The client connects to a Client Proxy running on a given Task Manager. The
proxy is the entry point of the client to the Flink cluster. It forwards
the requests of the client to the Job Manager and the required Task
Manager, and forwards the final response back the client.

Did I miss something? Does the QueryableStateClientProxy only fetch info
for jobs that are running on its local tm? If so, is there a way to
retrieve the job-graph? Or maybe another solution?
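
Until the cause is understood, one possible client-side stopgap is to try
the proxy on each TaskManager in turn and keep the first answer. The sketch
below is a minimal illustration under assumptions: ProxyFallback and
queryVia are hypothetical names, and queryVia stands in for a call such as
QueryableStateClient.getKvState made through a client built for that host.
No Flink classes are used, so the block compiles on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical helper: tries several proxy hosts in order, keeps the first success. */
final class ProxyFallback {

    private ProxyFallback() {}

    /**
     * queryVia is an assumption: it sends the state request through the
     * client proxy running on the given host and returns the pending result.
     */
    static <T> CompletableFuture<T> firstSuccessful(
            List<String> hosts, Function<String, CompletableFuture<T>> queryVia) {
        CompletableFuture<T> result = new CompletableFuture<>();
        tryNext(hosts, 0, queryVia, result, null);
        return result;
    }

    private static <T> void tryNext(
            List<String> hosts,
            int index,
            Function<String, CompletableFuture<T>> queryVia,
            CompletableFuture<T> result,
            Throwable lastError) {
        if (index >= hosts.size()) {
            // Every host failed (or none was given): report the last error seen.
            result.completeExceptionally(
                    lastError != null ? lastError : new IllegalArgumentException("no hosts given"));
            return;
        }
        CompletableFuture<T> attempt;
        try {
            attempt = queryVia.apply(hosts.get(index));
        } catch (RuntimeException e) {
            // The request could not even be sent; move on to the next host.
            tryNext(hosts, index + 1, queryVia, result, e);
            return;
        }
        attempt.whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else {
                // This proxy could not answer; move on to the next host.
                tryNext(hosts, index + 1, queryVia, result, error);
            }
        });
    }

    public static void main(String[] args) {
        // Simulated cluster: only the proxy on tm1 can answer, as in the first case above.
        Function<String, CompletableFuture<String>> queryVia = host -> {
            CompletableFuture<String> f = new CompletableFuture<>();
            if (host.equals("tm1")) {
                f.complete("state served via " + host);
            } else {
                f.completeExceptionally(new RuntimeException("Failed request via " + host));
            }
            return f;
        };
        System.out.println(firstSuccessful(Arrays.asList("tm3", "tm2", "tm1"), queryVia).join());
    }
}
```

Trying hosts one after another keeps the load on the cluster low; the cost
is extra latency whenever the first hosts in the list cannot answer.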

Thanks!
Pierre Zemb
-- 
Best regards,
Pierre Zemb
pierrezemb.fr
Software Engineer, Metrics Data Platform @OVH

Re: Question about QueryableState

Posted by Kostas Kloudas <k....@data-artisans.com>.
Thanks a lot Pierre!

Kostas

> On Aug 27, 2018, at 2:16 PM, Pierre Zemb <pi...@gmail.com> wrote:
> 
> Hi!
> Just created the JIRA (https://issues.apache.org/jira/browse/FLINK-10225).
> 
> Thanks for your reply,
> Pierre


Re: Question about QueryableState

Posted by Pierre Zemb <pi...@gmail.com>.
Hi!
Just created the JIRA (https://issues.apache.org/jira/browse/FLINK-10225).

Thanks for your reply,
Pierre

On Thu, Aug 23, 2018 at 14:31, Kostas Kloudas <k....@data-artisans.com>
wrote:

> Hi Pierre,
>
> You are right that this should not happen.
> It seems like a bug.
> Could you open a JIRA and post it here?
>
> Thanks,
> Kostas
>
-- 
Best regards,
Pierre Zemb
pierrezemb.fr
Software Engineer, Metrics Data Platform @OVH

Re: Question about QueryableState

Posted by Kostas Kloudas <k....@data-artisans.com>.
Hi Pierre,

You are right that this should not happen.
It seems like a bug.
Could you open a JIRA and post it here?

Thanks,
Kostas
