Posted to dev@oozie.apache.org by Dongying Jiao <pi...@gmail.com> on 2017/06/06 07:04:03 UTC

Oozie job in PREP state for a long time

Hi,
I have an Oozie coordinator job that runs at 02:00 every day. Sometimes the
job runs smoothly, but sometimes it is stuck in PREP state for a long time.

This is part of my coordinator.xml:
<coordinator-app name="CoordinatorForETL"
  frequency="${coordinatorFrequency}"
  start="${startTime}" end="${endTime}" timezone="America/New_York"
  xmlns="uri:oozie:coordinator:0.2">
  <controls>
    <timeout>10</timeout>
    <concurrency>1</concurrency>
  </controls>
  <action>
    <workflow>
.............
This is part of the workflow.xml:
......
<start to="flowDecision"/>
<decision name="flowDecision">
  <switch>
    <case to="q1">${workflowType eq "etl" || workflowType eq "all"}</case>
    <case to="prediction">${workflowType eq "prediction"}</case>
    <case to="errorOnDecision">${workflowType eq "cleaning"}</case>
    <default to="errorOnDecision"/>
  </switch>
</decision>
.......

In my latest run, the job stayed in PREP state for about 30 minutes.
According to the Oozie log, the "start" node finished at 02:00, but the
"flowDecision" node did not start executing until 02:32. During that period
the log shows other Oozie jobs running, but I did not find any errors or
exceptions.

As I understand it, an Oozie job in PREP state has not been submitted to
YARN yet, so there is no application ID to find on YARN.
I wonder whether this is related to Oozie's queue mechanism or concurrency
control. If so, do you have any experience tuning them?

Thanks a lot.

Best Regards,
Dong Ying

Re: Oozie job in PREP state for a long time

Posted by Dongying Jiao <pi...@gmail.com>.

Hi Attila,
The CallableQueueService values in my oozie-site.xml are the defaults. I
will try increasing them and see what happens.
Thank you very much.

Best Regards,
Dong Ying


Re: Oozie job in PREP state for a long time

Posted by Attila Sasvari <as...@cloudera.com>.

 In the Oozie book (Apache Oozie: The Workflow Scheduler for Hadoop by
Mohammad Kamrul Islam & Aravind Srinivasan) there are some hints on server
tuning (see Chapter 11 Oozie operations / Service settings / The
CallableQueueService).

The default settings for the CallableQueueService are quite conservative. If
you increase oozie.service.CallableQueueService.threads, change
oozie.service.CallableQueueService.callable.concurrency accordingly, and
consider increasing the Oozie server’s JVM heap size. For example, if you
bump oozie.service.CallableQueueService.threads to 100, set
oozie.service.CallableQueueService.callable.concurrency to 30. You can also
adjust oozie.service.CallableQueueService.queue.size.
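
As a rough sketch, the example values mentioned here could be placed in
oozie-site.xml as shown below. The numbers are only the illustrative ones from
this message, not a recommendation for any particular cluster, and the
surrounding <configuration> element is assumed to be the one already present
in your file. The Oozie server typically needs a restart to pick up changes.

```xml
<!-- oozie-site.xml (fragment): illustrative values only, tune for your environment -->
<configuration>
  <!-- Number of worker threads that execute queued callables -->
  <property>
    <name>oozie.service.CallableQueueService.threads</name>
    <value>100</value>
  </property>
  <!-- Maximum number of callables of the same type running concurrently -->
  <property>
    <name>oozie.service.CallableQueueService.callable.concurrency</name>
    <value>30</value>
  </property>
  <!-- oozie.service.CallableQueueService.queue.size can be raised as well;
       no value is suggested here, so it is left at its default -->
</configuration>
```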

However, finding optimal settings for your Oozie server really depends on
your environment (e.g. hardware size, resources, server capacity) and
workflow characteristics.

Hope this helps,
Attila


Re: Oozie job in PREP state for a long time

Posted by Dongying Jiao <pi...@gmail.com>.

Hi Andras and Attila,
Thanks for your advice.
I will check the cluster utilization the next time this job runs, but I
found some warnings in oozie.log:

2017-06-05 02:18:18,952  WARN CallableQueueService:523 - SERVER[363748lpp2mn006.geicoddc.net] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] max concurrency for callable [switch] exceeded, requeueing with [500]ms delay

2017-06-05 02:18:38,433  WARN CallableQueueService:523 - SERVER[363748lpp2mn006.geicoddc.net] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] max concurrency for callable [#composite#job.notification] exceeded, requeueing with [500]ms delay

Does this mean I should increase
oozie.service.CallableQueueService.callable.concurrency?

BTW, I am using Oozie 4.2.0.

Thanks



Re: Oozie job in PREP state for a long time

Posted by Attila Sasvari <as...@cloudera.com>.

Hi Dong Ying,

Many thanks Andras, these are good ideas.

In addition, can you confirm that you have enough vcores / memory in your
cluster for containers?

You can check and try to adjust the following YARN settings:
- yarn.nodemanager.resource.cpu-vcores
- yarn.nodemanager.resource.memory-mb
 (look at your yarn-site.xml / yarn-default.xml)
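
For orientation, here is a minimal sketch of how these two settings look in
yarn-site.xml. The values shown are what I believe to be the yarn-default.xml
defaults for Hadoop 2.6 (8 vcores and 8192 MB per NodeManager); treat them as
an assumption and check them against the yarn-default.xml of your own Hadoop
version before changing anything.

```xml
<!-- yarn-site.xml (fragment): values are assumed defaults, not recommendations -->
<configuration>
  <!-- vcores each NodeManager offers to containers -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
  <!-- Physical memory, in MB, each NodeManager offers to containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>
```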

I would also recommend checking overall cluster utilization when Oozie
jobs get stuck in PREP state. Are there a lot of running jobs using a lot of
resources (vcores, memory) at the time your coordinator tries to submit the
job? You can look at the ResourceManager and the history server. Hope
this helps.

Best,
- Attila

* yarn settings -
https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml





Re: Oozie job in PREP state for a long time

Posted by Andras Piros <an...@cloudera.com>.

Hi Dong Ying,

do you see any log entries containing the snippet "queue is full" in the
Oozie webapp logs?

What are the values of these parameters:

- oozie.service.CallableQueueService.queue.size
- oozie.service.CallableQueueService.threads
- oozie.service.CallableQueueService.callable.concurrency
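
If none of these are set in oozie-site.xml, the server falls back to
oozie-default.xml. As a point of comparison, the fragment below shows what I
believe those defaults to be in Oozie 4.x; this is an assumption from memory,
so please verify it against the oozie-default.xml shipped with your version.

```xml
<!-- Assumed oozie-default.xml values for the CallableQueueService (verify locally) -->
<configuration>
  <!-- Maximum number of callables that can wait in the queue -->
  <property>
    <name>oozie.service.CallableQueueService.queue.size</name>
    <value>10000</value>
  </property>
  <!-- Number of worker threads that execute queued callables -->
  <property>
    <name>oozie.service.CallableQueueService.threads</name>
    <value>10</value>
  </property>
  <!-- Maximum number of callables of the same type running concurrently -->
  <property>
    <name>oozie.service.CallableQueueService.callable.concurrency</name>
    <value>3</value>
  </property>
</configuration>
```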


Regards,

Andras
