Posted to user@oozie.apache.org by Shwetha GS <sh...@inmobi.com> on 2013/11/08 13:38:30 UTC

Issue with CallableQueueService

Hi,

We have seen weird issues with CallableQueueService in Oozie 3.3.2. We
couldn't root-cause the exact code causing the issue, so we are not sure
whether it's already fixed in 4.0. Any pointers will be helpful:

Materialisation for a coord just stops. CoordMaterializeTriggerService
picks up that coord at every materialisation interval,
but CoordMaterializeTransitionXCommand doesn't get called. It looks as if
the CoordMaterializeTransitionXCommand is lost somewhere in the queue.
Whenever this happens, the number of coords picked up for
materialisation is 40-50, and we also see this log line:

oozie.log-2013-11-08-01:2013-11-08 01:00:12,225  WARN
CallableQueueService:542 - USER[-] GROUP[-] max concurrency for callable
[#composite#coord_mater] exceeded, requeueing with [500]ms delay

Restarting Oozie resumes materialization.

It looks like the materialisation batch size is 50, while in the callable
queue service the composite callable batch size is set to 10 and max
concurrency is 3. So, when more than 30 coords are picked for
materialisation, the 4th/5th batch of coords is somehow lost. The code
looks fine, and we don't know where the leak is.
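
The arithmetic behind that observation can be sketched as follows. This is only a back-of-the-envelope model of the numbers quoted above (50 coords picked, composite batches of 10, max concurrency 3); the class and method names are hypothetical and this is not Oozie's actual CallableQueueService code.

```java
/** Rough model of the batching described in this post; not Oozie code. */
final class CompositeBatchSketch {

    /** Number of composite callables needed for the picked coords (ceiling division). */
    static int compositeBatches(int coordsPicked, int compositeBatchSize) {
        return (coordsPicked + compositeBatchSize - 1) / compositeBatchSize;
    }

    /** Batches that cannot start right away because the concurrency cap is already reached. */
    static int requeuedBatches(int batches, int maxConcurrency) {
        return Math.max(0, batches - maxConcurrency);
    }

    public static void main(String[] args) {
        int batches = compositeBatches(50, 10);
        System.out.println("composite batches: " + batches);
        System.out.println("requeued batches: " + requeuedBatches(batches, 3));
    }
}
```

With these inputs, five composite batches are created and two of them exceed the concurrency cap, which matches the "4th/5th batch" that appears to go missing after the requeue warning.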

We tried reproducing this on a local machine by tuning these configs, but
couldn't reproduce it.

Thanks,
Shwetha

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Re: Issue with CallableQueueService

Posted by Shwetha GS <sh...@inmobi.com>.
For a coord, materialisation happens for some time and then just stops. But
materialisation resumes after an Oozie restart. The issue affects just a few
coords, not all. There are no actions for that coord in WAITING state.


On Tue, Nov 12, 2013 at 2:55 AM, Mohammad Islam <mi...@yahoo.com> wrote:

> Hi Shwetha,
> I don't think I understood the problem correctly.
>
> 1. Is coordinator materialization taking very long?
> In this case, tuning throttle and concurrency is the way to go, and it is
> not a bug.
>
> 2. Or is it not materializing anything?
> If it doesn't materialize any action, that is an issue and a potential bug.
> If it materializes a few and then stops forever, then it could be an issue
> too. In the second case, how many actions are in WAITING state?
>
> Regards,
> Mohammad

Re: Issue with CallableQueueService

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Shwetha,
I don't think I understood the problem correctly.

1. Is coordinator materialization taking very long?
In this case, tuning throttle and concurrency is the way to go, and it is not a bug.

2. Or is it not materializing anything?
If it doesn't materialize any action, that is an issue and a potential bug. If it materializes a few and then stops forever, then it could be an issue too. In the second case, how many actions are in WAITING state?

Regards,
Mohammad





On Sunday, November 10, 2013 11:16 PM, Shwetha GS <sh...@inmobi.com> wrote:
 
Yes, we can increase the number of threads. We already change the throttle
for coords depending on coord frequency.

I agree that we can tune it better. But it will probably just make the
issue less frequent. I was wondering where the actual issue is.



Re: Issue with CallableQueueService

Posted by Shwetha GS <sh...@inmobi.com>.
Yes, we can increase the number of threads. We already change the throttle
for coords depending on coord frequency.

I agree that we can tune it better. But it will probably just make the
issue less frequent. I was wondering where the actual issue is.


On Mon, Nov 11, 2013 at 12:40 PM, Mohammad Islam <mi...@yahoo.com> wrote:

> I can see that these two props could play some role.
> >>1. oozie.service.CallableQueueService.threads - 30 (We can probably
> increase this)
>
> Increasing this will definitely help. Try 100, for example.
> What is the Oozie server JVM -Xmx value?
> Depending on that, you may be able to increase callable concurrency further.
>
>
> >>2. oozie.service.coord.default.throttle - default, 12
> It means at most 12 coordinator actions can be in WAITING state for ONE
> job. You can override this for a specific job through the "throttle"
> element in the controls section of its coordinator XML.
>
> Regards,
> Mohammad

Re: Issue with CallableQueueService

Posted by Mohammad Islam <mi...@yahoo.com>.
I can see that these two props could play some role.
>>1. oozie.service.CallableQueueService.threads - 30 (We can probably increase this)

Increasing this will definitely help. Try 100, for example. What is the Oozie server JVM -Xmx value?
Depending on that, you may be able to increase callable concurrency further.


>>2. oozie.service.coord.default.throttle - default, 12
It means at most 12 coordinator actions can be in WAITING state for ONE job. You can override this for a specific job through the "throttle" element in the controls section of its coordinator XML.
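
A minimal sketch of that per-job override, assuming a coordinator schema version that supports the throttle element (0.2 is assumed here). The app name, frequency, dates, and the value 24 are placeholders and do not come from this thread.

```xml
<coordinator-app name="example-coord" frequency="${coord:hours(1)}"
                 start="2013-11-01T00:00Z" end="2013-12-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
  <controls>
    <!-- Overrides oozie.service.coord.default.throttle for this job only -->
    <throttle>24</throttle>
  </controls>
  <!-- datasets, input-events, and the action element are omitted in this sketch -->
</coordinator-app>
```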

Regards,
Mohammad 



On Sunday, November 10, 2013 9:48 PM, Shwetha GS <sh...@inmobi.com> wrote:
 
Hi Mohammad,

Thanks for checking this.

There was no other related warning about queueing. The default value of
CoordMaterializeTriggerService.materialization.system.limit is 50:

    private static final int CONF_MATERIALIZATION_SYSTEM_LIMIT_DEFAULT = 50;

    int materializationLimit = Services.get().getConf()
            .getInt(CONF_MATERIALIZATION_SYSTEM_LIMIT, CONF_MATERIALIZATION_SYSTEM_LIMIT_DEFAULT);


Since increasing the concurrency also affects other commands, we decreased
the coord materialization batch size to 30.

oozie.service.CallableQueueService.queue.size - 10000
oozie.service.CallableQueueService.threads - 30 (We can probably increase
this)
oozie.service.coord.default.throttle - default, 12

Looks like some edge case. Let me know if you need more info

-Shwetha


Re: Issue with CallableQueueService

Posted by Shwetha GS <sh...@inmobi.com>.
Hi Mohammad,

Thanks for checking this.

There was no other related warning about queueing. The default value of
CoordMaterializeTriggerService.materialization.system.limit is 50:

    private static final int CONF_MATERIALIZATION_SYSTEM_LIMIT_DEFAULT = 50;

    int materializationLimit = Services.get().getConf()
            .getInt(CONF_MATERIALIZATION_SYSTEM_LIMIT, CONF_MATERIALIZATION_SYSTEM_LIMIT_DEFAULT);


Since increasing the concurrency also affects other commands, we decreased
the coord materialization batch size to 30.

oozie.service.CallableQueueService.queue.size - 10000
oozie.service.CallableQueueService.threads - 30 (We can probably increase
this)
oozie.service.coord.default.throttle - default, 12
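
For reference, the three values listed above would look roughly like this if they are set explicitly in oozie-site.xml, inside the `<configuration>` element. This is a sketch; the thread does not say whether the file sets them or relies on defaults.

```xml
<property>
  <name>oozie.service.CallableQueueService.queue.size</name>
  <value>10000</value>
</property>
<property>
  <name>oozie.service.CallableQueueService.threads</name>
  <value>30</value>
</property>
<property>
  <name>oozie.service.coord.default.throttle</name>
  <value>12</value>
</property>
```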

Looks like some edge case. Let me know if you need more info.

-Shwetha

On Mon, Nov 11, 2013 at 9:50 AM, Mohammad Islam <mi...@yahoo.com> wrote:

> Hi Shwetha,
> Were there any other warning or requeueing messages (please check
> oozie.log)?
> Why is the materialization batch size 50? The default is 10, right?
> Did you try increasing the concurrency value to 10 (for example)?
>
>
> What is the value of "queue.size"? It should show whether the system was
> heavily loaded.
>
> What are the values for these props:
> oozie.service.CallableQueueService.queue.size
>
> oozie.service.CallableQueueService.threads
>
> oozie.service.coord.default.throttle
>
>
> Regards,
> Mohammad


Re: Issue with CallableQueueService

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Shwetha,
Were there any other warning or requeueing messages (please check oozie.log)?
Why is the materialization batch size 50? The default is 10, right?
Did you try increasing the concurrency value to 10 (for example)?


What is the value of "queue.size"? It should show whether the system was heavily loaded.

What are the values for these props:
oozie.service.CallableQueueService.queue.size

oozie.service.CallableQueueService.threads

oozie.service.coord.default.throttle


Regards,
Mohammad






Re: Issue with CallableQueueService

Posted by Shwetha GS <sh...@inmobi.com>.
Oozie devs, did anyone get a chance to take a look at this?



