Posted to user@oozie.apache.org by Denis Yuen <De...@oicr.on.ca> on 2013/12/18 20:45:49 UTC

Oozie workflow size

Hi,

Are there any good resources or does anyone have experience regarding running workflows with a very large number of actions?

We're currently using an Oozie install allocated with 4GB of memory connected to a postgres database and we're successfully running workflows with hundreds of actions. However, we're having trouble scaling up to workflows that contain tens of thousands of actions. For example, errors like "E0603: SQL error in operation, Ran out of memory retrieving query results" or "E0603: SQL error in operation, An I/O error occurred while sending to the backend" occur in the Oozie logs, but we also see other symptoms like the Oozie console becoming very slow and unresponsive.

What are the typical and maximum workflow sizes that people have seen? Both the total number of actions in a workflow and the maximum number of actions after a fork would be useful.

I want to get an idea of whether we're even in the ballpark, so we know whether it's worthwhile tuning the various Oozie configuration settings or whether we're simply too far out to be reasonable.
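For concreteness, the shape we are generating is a single fork fanning out to many parallel actions, roughly like the sketch below (action names and the do_work.sh script are placeholders):

```xml
<!-- Sketch only: one fork fanning out to N parallel actions, then a
     join. Names and do_work.sh are placeholders. -->
<workflow-app name="big-fanout-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="fanout"/>
    <fork name="fanout">
        <path start="task-0001"/>
        <path start="task-0002"/>
        <!-- ...one path per parallel action, into the thousands... -->
    </fork>
    <action name="task-0001">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>do_work.sh</exec>
            <file>do_work.sh</file>
        </shell>
        <ok to="joinall"/>
        <error to="fail"/>
    </action>
    <!-- ...one action element per fork path... -->
    <join name="joinall" to="end"/>
    <kill name="fail">
        <message>A task failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```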

Thanks!

-- Denis

RE: Oozie workflow size

Posted by Denis Yuen <De...@oicr.on.ca>.
Hi,

Thanks for the heads-up!

For the total number of actions in a single workflow, what is the largest number that has been run reliably?
Also, could you elaborate on how the internal queue (default 10,000) limits the total number of actions in a workflow? (i.e., is it a hard limit? Should workflows with more actions than the queue size fail outright, behave unreliably, etc.?)

To elaborate on our side: when running single workflows in the range of 10,000 to 30,000 actions, the out-of-memory errors originate from a number of different places, but are usually related to what looks like the de-serialization of the wf_instance (workflow instance) column. It looks like each action needs to de-serialize the (in this case, very large) workflow. Would it be correct to say that this is the dominating factor in memory consumption for larger workflows, or is there a workaround and/or do other factors dominate?

I've attached example stack traces below in case anyone is curious. 

Stack traces:

Example 1:

2013-12-18 21:20:45,544 ERROR JobXCommand:536 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] XException,
org.apache.oozie.command.CommandException: E0603: SQL error in operation, Ran out of memory retrieving query results. {prepstmnt 287634419 SELECT t0.id, t0.bean_type, t0.app_name, t0.app_path, t0.conf, t0.group_name, t0.parent_id, t0.run, t0.user_name, t0.auth_token, t0.created_time, t0.end_time, t0.external_id, t0.last_modified_time, t0.log_token, t0.proto_action_conf, t0.sla_xml, t0.start_time, t0.status, t0.wf_instance FROM WF_JOBS t0 WHERE (t0.id = ?) AND t0.bean_type = ?} [code=0, state=53200]
        at org.apache.oozie.command.wf.JobXCommand.execute(JobXCommand.java:71)
        at org.apache.oozie.command.wf.JobXCommand.execute(JobXCommand.java:33)
        at org.apache.oozie.command.XCommand.call(XCommand.java:277)
        at org.apache.oozie.DagEngine.getJob(DagEngine.java:331)
        at org.apache.oozie.servlet.V1JobServlet.getWorkflowJob(V1JobServlet.java:679)
        at org.apache.oozie.servlet.V1JobServlet.getJob(V1JobServlet.java:212)
        at org.apache.oozie.servlet.BaseJobServlet.doGet(BaseJobServlet.java:228)
...
Caused by: org.apache.oozie.executor.jpa.JPAExecutorException: E0603: SQL error in operation, Ran out of memory retrieving query results. {prepstmnt 287634419 SELECT t0.id, t0.bean_type, t0.app_name, t0.app_path, t0.conf, t0.group_name, t0.parent_id, t0.run, t0.user_name, t0.auth_token, t0.created_time, t0.end_time, t0.external_id, t0.last_modified_time, t0.log_token, t0.proto_action_conf, t0.sla_xml, t0.start_time, t0.status, t0.wf_instance FROM WF_JOBS t0 WHERE (t0.id = ?) AND t0.bean_type = ?} [code=0, state=53200]
        at org.apache.oozie.executor.jpa.WorkflowJobGetJPAExecutor.execute(WorkflowJobGetJPAExecutor.java:62)
        at org.apache.oozie.executor.jpa.WorkflowJobGetJPAExecutor.execute(WorkflowJobGetJPAExecutor.java:32)
        at org.apache.oozie.service.JPAService.execute(JPAService.java:212)
        at org.apache.oozie.executor.jpa.WorkflowInfoWithActionsSubsetGetJPAExecutor.execute(WorkflowInfoWithActionsSubsetGetJPAExecutor.java:65)
        at org.apache.oozie.executor.jpa.WorkflowInfoWithActionsSubsetGetJPAExecutor.execute(WorkflowInfoWithActionsSubsetGetJPAExecutor.java:35)
        at org.apache.oozie.service.JPAService.execute(JPAService.java:212)
        at org.apache.oozie.command.wf.JobXCommand.execute(JobXCommand.java:62)
        ... 29 more
Caused by: <openjpa-2.2.2-r422266:1468616 fatal general error> org.apache.openjpa.persistence.PersistenceException: Ran out of memory retrieving query results. {prepstmnt 287634419 SELECT t0.id, t0.bean_type, t0.app_name, t0.app_path, t0.conf, t0.group_name, t0.parent_id, t0.run, t0.user_name, t0.auth_token, t0.created_time, t0.end_time, t0.external_id, t0.last_modified_time, t0.log_token, t0.proto_action_conf, t0.sla_xml, t0.start_time, t0.status, t0.wf_instance FROM WF_JOBS t0 WHERE (t0.id = ?) AND t0.bean_type = ?} [code=0, state=53200]
        at org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4962)
        at org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4922)
        at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
        at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:110)
        at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:62)

Example 2:

2013-12-18 21:25:39,797 ERROR ActionCheckXCommand:536 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] XException,
org.apache.oozie.command.CommandException: E0603: SQL error in operation, Ran out of memory retrieving query results. {prepstmnt 906542123 SELECT t0.id, t0.bean_type, t0.app_name, t0.app_path, t0.conf, t0.group_name, t0.parent_id, t0.run, t0.user_name, t0.auth_token, t0.created_time, t0.end_time, t0.external_id, t0.last_modified_time, t0.log_token, t0.proto_action_conf, t0.sla_xml, t0.start_time, t0.status, t0.wf_instance FROM WF_JOBS t0 WHERE (t0.id = ?) AND t0.bean_type = ?} [code=0, state=53200]
        at org.apache.oozie.command.wf.ActionCheckXCommand.eagerLoadState(ActionCheckXCommand.java:97)
        at org.apache.oozie.command.XCommand.call(XCommand.java:246)
        at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
        at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
        at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.oozie.executor.jpa.JPAExecutorException: E0603: SQL error in operation, Ran out of memory retrieving query results. {prepstmnt 906542123 SELECT t0.id, t0.bean_type, t0.app_name, t0.app_path, t0.conf, t0.group_name, t0.parent_id, t0.run, t0.user_name, t0.auth_token, t0.created_time, t0.end_time, t0.external_id, t0.last_modified_time, t0.log_token, t0.proto_action_conf, t0.sla_xml, t0.start_time, t0.status, t0.wf_instance FROM WF_JOBS t0 WHERE (t0.id = ?) AND t0.bean_type = ?} [code=0, state=53200]
        at org.apache.oozie.executor.jpa.WorkflowJobGetJPAExecutor.execute(WorkflowJobGetJPAExecutor.java:62)
        at org.apache.oozie.executor.jpa.WorkflowJobGetJPAExecutor.execute(WorkflowJobGetJPAExecutor.java:32)
        at org.apache.oozie.service.JPAService.execute(JPAService.java:212)
        at org.apache.oozie.command.wf.ActionCheckXCommand.eagerLoadState(ActionCheckXCommand.java:87)
        ... 7 more
Caused by: <openjpa-2.2.2-r422266:1468616 fatal general error> org.apache.openjpa.persistence.PersistenceException: Ran out of memory retrieving query results. {prepstmnt 906542123 SELECT t0.id, t0.bean_type, t0.app_name, t0.app_path, t0.conf, t0.group_name, t0.parent_id, t0.run, t0.user_name, t0.auth_token, t0.created_time, t0.end_time, t0.external_id, t0.last_modified_time, t0.log_token, t0.proto_action_conf, t0.sla_xml, t0.start_time, t0.status, t0.wf_instance FROM WF_JOBS t0 WHERE (t0.id = ?) AND t0.bean_type = ?} [code=0, state=53200]
        at org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4962)
        at org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4922)
        at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)

-- Denis 

________________________________________
From: Mona Chitnis [chitnis@yahoo-inc.com]
Sent: December 23, 2013 4:17 PM
To: user@oozie.apache.org
Subject: Re: Oozie workflow size

Hi Denis,

Oozie can scale to tens of thousands of "overall" workflow actions,
i.e. actions that are executed across multiple workflows and staggered
in time. For parallel actions in a single workflow, we have seen a
maximum of around 100 forks, though with some slowness streaming the
logs. The total number of actions in a single workflow can be well
over 100, limited by the size of the internal queue (default 10,000)
that the Oozie server maintains to insert and then process the various
commands on those actions. For such a high-scale application, you can
consider running Oozie with 8GB of memory.
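The queue size (and the worker threads that drain it) are configurable
in oozie-site.xml. As a sketch, the relevant CallableQueueService
properties look like this; the values are illustrative starting points,
not tested recommendations:

```xml
<!-- oozie-site.xml: CallableQueueService tuning (illustrative values;
     the shipped defaults are queue.size=10000 and threads=10). -->
<property>
    <name>oozie.service.CallableQueueService.queue.size</name>
    <value>50000</value>
</property>
<property>
    <name>oozie.service.CallableQueueService.threads</name>
    <value>10</value>
</property>
```

The heap itself is set through CATALINA_OPTS in oozie-env.sh (Oozie
runs inside Tomcat), e.g. export CATALINA_OPTS="$CATALINA_OPTS -Xmx8g".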

Improvements around reducing the memory footprint have been recently added
to trunk and will be available in the next release.

--
Mona

On 12/18/13 11:45 AM, "Denis Yuen" <De...@oicr.on.ca> wrote:

>Hi,
>
>Are there any good resources or does anyone have experience regarding
>running workflows with a very large number of actions?
>
>We're currently using an Oozie install allocated with 4GB of memory
>connected to a postgres database and we're successfully running workflows
>with hundreds of actions. However, we're having trouble scaling up to
>workflows that contain tens of thousands of actions. For example, errors
>like "E0603: SQL error in operation, Ran out of memory retrieving query
>results" or "E0603: SQL error in operation, An I/O error occurred while
>sending to the backend" occur in the Oozie logs, but we also see other
>symptoms like the Oozie console becoming very slow and unresponsive.
>
>What are the typical and maximum workflow sizes that people have seen?
>Both in terms of total number of actions in a workflow or the maximum
>number of actions after a fork in a workflow would be useful.
>
>I want to get an idea of whether we're even in the ballpark so that it's
>worthwhile looking at tuning the various configuration settings for Oozie
>or whether we're simply too far out to be reasonable.
>
>Thanks!
>
>-- Denis



Re: Oozie workflow size

Posted by Mona Chitnis <ch...@yahoo-inc.com>.
Hi Denis,

Oozie can scale to tens of thousands of "overall" workflow actions,
i.e. actions that are executed across multiple workflows and staggered
in time. For parallel actions in a single workflow, we have seen a
maximum of around 100 forks, though with some slowness streaming the
logs. The total number of actions in a single workflow can be well
over 100, limited by the size of the internal queue (default 10,000)
that the Oozie server maintains to insert and then process the various
commands on those actions. For such a high-scale application, you can
consider running Oozie with 8GB of memory.

Improvements around reducing the memory footprint have been recently added
to trunk and will be available in the next release.

--
Mona

On 12/18/13 11:45 AM, "Denis Yuen" <De...@oicr.on.ca> wrote:

>Hi,
>
>Are there any good resources or does anyone have experience regarding
>running workflows with a very large number of actions?
>
>We're currently using an Oozie install allocated with 4GB of memory
>connected to a postgres database and we're successfully running workflows
>with hundreds of actions. However, we're having trouble scaling up to
>workflows that contain tens of thousands of actions. For example, errors
>like "E0603: SQL error in operation, Ran out of memory retrieving query
>results" or "E0603: SQL error in operation, An I/O error occurred while
>sending to the backend" occur in the Oozie logs, but we also see other
>symptoms like the Oozie console becoming very slow and unresponsive.
>
>What are the typical and maximum workflow sizes that people have seen?
>Both in terms of total number of actions in a workflow or the maximum
>number of actions after a fork in a workflow would be useful.
>
>I want to get an idea of whether we're even in the ballpark so that it's
>worthwhile looking at tuning the various configuration settings for Oozie
>or whether we're simply too far out to be reasonable.
>
>Thanks!
>
>-- Denis