Posted to user@flink.apache.org by Francisco Gonzalez Barea <Fr...@piksel.com> on 2017/10/19 07:27:51 UTC

Re: Flink REST API async?

Hello,

Coming back to this thread with a quick question: will this be supported in the next Flink version? If not, when is it expected to be included?

Regards


On 8 Aug 2017, at 15:46, Aljoscha Krettek <al...@apache.org> wrote:

I quickly talked to Till about this. The new JobManager, once FLIP-6 is implemented, will have a new REST endpoint that allows submitting a JobGraph directly. With this, we no longer have to execute the user main() method in the WebRuntimeMonitor (which is a component that the current JobManager process loads to serve the web frontend and the REST interface).

This should solve the problem in the long term, but unfortunately it doesn't help with your current setup.

Best,
Aljoscha

On 8. Aug 2017, at 10:26, Francisco Gonzalez Barea <Fr...@piksel.com> wrote:

Aha ok… Thanks for your answer Eron.

Regards


On 7 Aug 2017, at 19:04, Eron Wright <er...@gmail.com> wrote:

When you submit a program via the REST API, the main method executes inside the JobManager process. Unfortunately, a static variable is used to establish the execution environment that the program obtains from `ExecutionEnvironment.getExecutionEnvironment()`. From the stack trace it appears that two main methods are executing simultaneously and one is corrupting the other.
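
To illustrate the failure mode described above, here is a minimal sketch that does not use any Flink API. It assumes a hypothetical static list (`SHARED_SINKS`) standing in for state reachable through a process-wide static variable, and it simulates the interleaving of two main methods inside a single thread so that the outcome is deterministic. In the real case the interleaving comes from two request threads, so it only fails some of the time.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

class SharedContextDemo {
    // Hypothetical stand-in for process-wide static state shared by every main().
    static final List<String> SHARED_SINKS = new ArrayList<>();

    // Returns true if iterating the shared list fails because it was modified
    // mid-iteration, which is what a second main() on the same state can cause.
    static boolean translateWithInterference() {
        SHARED_SINKS.clear();
        SHARED_SINKS.add("job-A-sink-1");
        SHARED_SINKS.add("job-A-sink-2");
        try {
            for (String sink : SHARED_SINKS) {
                // Simulated interleaving: "job B" registers a sink on the same
                // static list while "job A" is still iterating over it.
                SHARED_SINKS.add("job-B-sink");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Same steps, but each program works on its own list, so nothing interferes.
    static boolean translateIsolated() {
        List<String> jobA = new ArrayList<>();
        List<String> jobB = new ArrayList<>();
        jobA.add("job-A-sink-1");
        jobA.add("job-A-sink-2");
        try {
            for (String sink : jobA) {
                jobB.add("job-B-sink");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("shared state corrupted:   " + translateWithInterference());
        System.out.println("isolated state corrupted: " + translateIsolated());
    }
}
```

With the shared list the iteration is interrupted by a `ConcurrentModificationException`, the same exception type that appears as the root cause in the stack trace; with separate lists it is not.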

On Mon, Aug 7, 2017 at 8:21 AM, Francisco Gonzalez Barea <Fr...@piksel.com> wrote:
Hi there!

We are doing some POCs submitting jobs remotely to Flink. We tried the Flink CLI and now we're testing the REST API.

The issue is that when we execute a set of requests asynchronously (using CompletableFutures), only a couple of them run successfully. For the rest, we get the exception copied at the end of this email.
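
For readers hitting the same error, one client-side option (not proposed in this thread, and only a sketch) is to keep the CompletableFuture style but chain the submissions so that they never overlap. The `submit` function below is a hypothetical placeholder for whatever code performs the actual REST call; the class and method names are assumptions as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class SequentialSubmitter {
    // Chains the submissions: request N+1 is only started once the future
    // returned for request N has completed. `submit` is a hypothetical
    // function wrapping the real HTTP call.
    static <T, R> CompletableFuture<List<R>> submitOneAtATime(
            List<T> requests, Function<T, CompletableFuture<R>> submit) {
        CompletableFuture<List<R>> chain =
                CompletableFuture.completedFuture(new ArrayList<>());
        for (T request : requests) {
            chain = chain.thenCompose(results ->
                    submit.apply(request).thenApply(result -> {
                        results.add(result);
                        return results;
                    }));
        }
        return chain;
    }

    public static void main(String[] args) {
        List<String> jars = List.of("job-a.jar", "job-b.jar");
        // Stand-in submit function that completes immediately.
        List<String> outcome = SequentialSubmitter.<String, String>submitOneAtATime(
                jars, jar -> CompletableFuture.completedFuture("submitted " + jar)).join();
        System.out.println(outcome);
    }
}
```

Note that if one submission fails, the chain completes exceptionally and the remaining requests are not sent. This trades throughput for safety and only helps when a single client is the sole source of submissions.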

Do you know the reason for this?

Thanks in advance!!
Regards,

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
  at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
  at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
  at org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:80)
 at org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:318)
 at org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:72)
 at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.handleJsonRequest(JarRunHandler.java:61)
 at org.apache.flink.runtime.webmonitor.handlers.AbstractJsonRequestHandler.handleRequest(AbstractJsonRequestHandler.java:41)
 at org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler.respondAsLeader(RuntimeMonitorHandler.java:109)
 at org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:97)
 at org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:44)
 at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at io.netty.handler.codec.http.router.Handler.routed(Handler.java:62)
 at io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:57)
 at io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:20)
 at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:159)
 at org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:65)
 at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
 at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:147)
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
 at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
 at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
 at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.ConcurrentModificationException
 at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
 at java.util.ArrayList$Itr.next(ArrayList.java:851)
 at org.apache.flink.api.java.operators.OperatorTranslation.translateToPlan(OperatorTranslation.java:49)
 at org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:1065)
 at org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:1032)
 at org.apache.flink.client.program.OptimizerPlanEnvironment.execute(OptimizerPlanEnvironment.java:47)
 at com.piksel.sequoia.media.common.launcher.Launcher.main(Launcher.java:56)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
 ... 39 more



This message is private and confidential. If you have received this message in error, please notify the sender or servicedesk@piksel.com and remove it from your system.

Piksel Inc is a company registered in the United States, 2100 Powers Ferry Road SE, Suite 400, Atlanta, GA 30339





Re: Flink REST API async?

Posted by Francisco Gonzalez <Fr...@piksel.com>.
Sorry guys,

in the previous message, when I talked about task manager performance,
I meant *JobManager* performance.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink REST API async?

Posted by Francisco Gonzalez <Fr...@piksel.com>.
Hi guys,

After investigating this topic a bit more, we found a solution by making
a small change to the Flink 1.3.2 source code.

We found that the issue occurred when different threads tried to build the
Tuple2<JobGraph, ClassLoader> object at the same time (because they all use
the static ExecutionEnvironment variable mentioned earlier in this thread).
So we simply used a semaphore to serialize the threads at that point. We
have run some performance test suites on our side, and this seems to be
working fine. You can take a look at the code in the attached file, and if
you decide to include it in the Flink source code, we can create a pull
request: JarRunHandler.java
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1051/JarRunHandler.java>
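
The attached JarRunHandler.java is not reproduced in this archive, so the following is only a minimal sketch of the general technique described above (a single-permit semaphore around the step that builds the job graph), not the actual patch. The class name `GuardedGraphBuilder` and the `buildStep` parameter are hypothetical; the only real API used is `java.util.concurrent.Semaphore`.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class GuardedGraphBuilder {
    // One permit: at most one thread may be inside the guarded section.
    private final Semaphore permit = new Semaphore(1);

    // Runs the (hypothetical) graph-building step while holding the permit.
    // The permit is released in `finally`, so a failing build step cannot
    // block later callers forever.
    <T> T buildExclusively(Supplier<T> buildStep) throws InterruptedException {
        permit.acquire();
        try {
            return buildStep.get();
        } finally {
            permit.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedGraphBuilder builder = new GuardedGraphBuilder();
        // Stand-in for the real work of building the job graph.
        String graph = builder.buildExclusively(() -> "job-graph-placeholder");
        System.out.println(graph);
    }
}
```

Because every request waits for the same permit, the build step runs strictly one at a time, which is consistent with the added latency mentioned below.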

However, even with this change, we don't see much of a performance
improvement in Flink. We know that the semaphore we've added introduces
some latency, but we have also noticed that some task managers are not
using their full capacity. We increased the CPU and memory available to
the Flink instance, and the result is the same: we don't see much
improvement.

Do you have any hints on how to make Flink use its resources better?

Thanks in advance




Re: Flink REST API async?

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,

Unfortunately, the FLIP-6 efforts are taking longer than expected and we won't have those changes to the REST API in the 1.4 release (which should happen in about a month). We plan to release 1.5 very quickly after that, with the changes to the REST API.

The only work-around I can think of so far is to use one cluster, i.e. one JobManager, per job that you submit. This is actually what we started recommending a while ago, and it is something that we will recommend more aggressively in the future, because having only one job per JobManager makes managing and debugging easier.

Best,
Aljoscha
