Posted to user@oozie.apache.org by Jialong Wu <ji...@gmail.com> on 2014/05/13 22:46:31 UTC

Oozie workflows suspended in YARN (JA009: Unknown rpc kind RPC_WRITABLE error)

Hi all,

We are observing some strange behavior in Oozie when running workflows under
YARN. Jobs are launched properly from Oozie, but after about 20 minutes the
workflow goes into the SUSPENDED state with the running action in the
START_MANUAL state. The only error message I can find is in the Action Info
dialog box of the Oozie UI:

Status: START_MANUAL
Error Code: JA009
Error Message: JA009: Unknown rpc kind RPC_WRITABLE

We ran into this same error earlier, when we were first configuring Oozie to
work with YARN. The cause then was that Oozie was using the old clients to
talk to the YARN RM, and it was fixed by setting the correct CATALINA_BASE in
oozie-env.sh. We suspect that Oozie is somehow still using the old client to
check the status of a running job, but we couldn't figure out which
configuration is causing this.
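
One way to test the "old client" suspicion is to look for leftover MR1 client jars under the directory the Oozie server actually loads from. The script below is only a rough sketch of that check. It assumes MR1 jars can be recognized by "-mr1-" in the file name (as in hadoop-core-2.0.0-mr1-cdh4.6.0.jar); the function name find_mr1_jars and the fallback to the CATALINA_BASE environment variable are made up for illustration. A clean result would not rule out other causes.

```python
import os
import re
import sys

# Assumption: MR1 client jars carry "-mr1-" in their file name, as in
# CDH4 packaging (e.g. hadoop-core-2.0.0-mr1-cdh4.6.0.jar).
MR1_PATTERN = re.compile(r"^hadoop-.*-mr1-.*\.jar$")


def find_mr1_jars(root):
    """Return sorted paths of jars under root whose names look like MR1 jars."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if MR1_PATTERN.match(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical usage: pass the directory to scan as the first argument,
    # or fall back to CATALINA_BASE from the environment (else the cwd).
    if len(sys.argv) > 1:
        root = sys.argv[1]
    else:
        root = os.environ.get("CATALINA_BASE", ".")
    for path in find_mr1_jars(root):
        print(path)
```

Any path it prints is a jar worth checking against the classpath of the running Oozie server.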

Some additional information on this issue: the workflow only gets suspended
when it runs longer than a certain time limit, which we observe to be about
20 minutes. Any workflow that completes within that limit doesn't have this
issue.

Has anyone run into this issue before? Any pointers on how to debug it would
be very much appreciated!

Cheers,
Jialong

Re: Oozie workflows suspended in YARN (JA009: Unknown rpc kind RPC_WRITABLE error)

Posted by Jialong Wu <ji...@gmail.com>.
Forgot to mention the software versions we are testing with:

Oozie: 3.3.2-cdh4.6.0
Hadoop: 2.0.0-cdh4.6.0
