Posted to user@oozie.apache.org by Deepika Khera <dk...@lyris.com> on 2012/01/31 19:22:50 UTC

Workflow xml

Hi,

I want to move my existing MapReduce jobs over to Oozie. Two of my
major use cases for the M/R jobs are:

1. Using TableMapper/TableReducer for jobs reading from and writing to
HBase.
2. Using MultipleOutputs for writing out multiple files.

I am thinking the way to achieve both of these is to configure the
M/R jobs through a Java action. Is there a better or cleaner way to do
this? The alternative is to spell out by hand the job configuration that
TableMapper/TableReducer or MultipleOutputs would normally set up, but
that seems a little dirty and complicated.

Thanks,
Deepika


Re: Killing coordinator jobs

Posted by Mohammad Islam <mi...@yahoo.com>.
Wondering why you didn't get an error message when you asked to kill.

Looks like it is a bug. If you think so, please create a JIRA at:
http://issues.apache.org/jira/browse/OOZIE


Regards,
Mohammad


----- Original Message -----
From: Deepika Khera <dk...@lyris.com>
To: oozie-users@incubator.apache.org
Cc: 
Sent: Friday, February 10, 2012 12:56 PM
Subject: Re: Killing coordinator jobs

Thanks Mohammad. I reinstalled Oozie and got rid of the coordinator
jobs, so the locking exceptions went away.

Still, I was not able to kill jobs.
It turned out that the jobs were running as user "oozie" while I was
trying to kill them as my own user. Once I fixed that, I was able to
kill the jobs.

Regards,
Deepika



On Fri, 2012-02-10 at 02:40 -0800, Mohammad Islam wrote:
> Hi Deepika,
> It is hard to debug this remotely and find the root cause.
> 
> Do you see this exception when you send the kill command?
> Or is this exception coming and going?
> You can verify this by running the "oozie kill" command and tailing the log.
> 
> One easy and hacky way is to stop the oozie service, restart it, and then kill.
> Another question: what Oozie version are you using, and what is the DB?
> Regards,
> Mohammad
> 
> ________________________________
> From: Deepika Khera <dk...@lyris.com>
> To: "oozie-users@incubator.apache.org" <oo...@incubator.apache.org> 
> Sent: Thursday, February 9, 2012 5:54 PM
> Subject: Killing coordinator jobs
> 
> Hi,
> 
> While testing my coordinator.xml I ended up creating a bunch of
> coordinator jobs on Oozie.
> 
> I am trying to kill them now but am unable to do so.
> 
> I get the error:
> Error: E0607 : E0607: Other error in operation
> [updateCoordinatorAction], An optimistic lock violation was detected
> when flushing object instance "A lock could not be obtained within the
> time requested {prepstmnt 1385108855 UPDATE COORD_ACTIONS SET
> action_number = ?, action_xml = ?, console_url = NULL, created_conf = ?,
> error_code = NULL, error_message = NULL, external_status = NULL,
> missing_dependencies = ?, run_conf = ?, time_out = ?, tracker_uri =
> NULL, job_type = NULL, created_time = ?, external_id = NULL, job_id = ?,
> last_modified_time = ?, nominal_time = ?, sla_xml = NULL, status = ?
> WHERE id IN (SELECT DISTINCT t0.id FROM COORD_ACTIONS t0 WHERE (t0.id
> = ?) AND t0.bean_type = ?) [params=(int) 4, (String) <coordinator-app
> xmlns="uri:oozie:coordinator:0.1" name="clickstream..., (String)
> <configuration>    <property>      <name>oozie.coord.application.pat...,
> (String)
> hdfs://localhost:54310/usr/org/data/clickstream/input/2012/02/10/{...,
> (String) <configuration>    <property>
> <name>oozie.coord.application.pat..., (int) 120, (Timestamp) 2012-02-09
> 17:38:59.035, (String) 0000009-120209131354394-oozie-oozi-C, (Timestamp)
> 2012-02-09 17:49:42.324, (Timestamp) 2012-02-09 17:35:00.0, (String)
> KILLED, (String) 0000009-120209131354394-oozie-oozi-C@4, (String)
> CoordinatorActionBean]} [code=30000, state=40XL1] [java.lang.String]" to
> the data store.  This indicates that the object was concurrently
> modified in another transaction.
> 
> 
> 
> It seems that since all the coordinators are waiting on the same input
> directory, it's not letting me kill them.
> Or maybe it is something else.
> This is my local cluster, so if there is a way to kill all jobs, I can
> even do that.
> 
> Would appreciate any pointers on how to resolve this.
> 
> Thanks,
> Deepika

Re: Killing coordinator jobs

Posted by Deepika Khera <dk...@lyris.com>.
Thanks Mohammad. I reinstalled Oozie and got rid of the coordinator
jobs, so the locking exceptions went away.

Still, I was not able to kill jobs.
It turned out that the jobs were running as user "oozie" while I was
trying to kill them as my own user. Once I fixed that, I was able to
kill the jobs.

Regards,
Deepika



On Fri, 2012-02-10 at 02:40 -0800, Mohammad Islam wrote:
> Hi Deepika,
> It is hard to debug this remotely and find the root cause.
> 
> Do you see this exception when you send the kill command?
> Or is this exception coming and going?
> You can verify this by running the "oozie kill" command and tailing the log.
> 
> One easy and hacky way is to stop the oozie service, restart it, and then kill.
> Another question: what Oozie version are you using, and what is the DB?
> Regards,
> Mohammad
> 
> ________________________________
> From: Deepika Khera <dk...@lyris.com>
> To: "oozie-users@incubator.apache.org" <oo...@incubator.apache.org> 
> Sent: Thursday, February 9, 2012 5:54 PM
> Subject: Killing coordinator jobs
> 
> Hi,
> 
> While testing my coordinator.xml I ended up creating a bunch of
> coordinator jobs on Oozie.
> 
> I am trying to kill them now but am unable to do so.
> 
> I get the error:
> Error: E0607 : E0607: Other error in operation
> [updateCoordinatorAction], An optimistic lock violation was detected
> when flushing object instance "A lock could not be obtained within the
> time requested {prepstmnt 1385108855 UPDATE COORD_ACTIONS SET
> action_number = ?, action_xml = ?, console_url = NULL, created_conf = ?,
> error_code = NULL, error_message = NULL, external_status = NULL,
> missing_dependencies = ?, run_conf = ?, time_out = ?, tracker_uri =
> NULL, job_type = NULL, created_time = ?, external_id = NULL, job_id = ?,
> last_modified_time = ?, nominal_time = ?, sla_xml = NULL, status = ?
> WHERE id IN (SELECT DISTINCT t0.id FROM COORD_ACTIONS t0 WHERE (t0.id
> = ?) AND t0.bean_type = ?) [params=(int) 4, (String) <coordinator-app
> xmlns="uri:oozie:coordinator:0.1" name="clickstream..., (String)
> <configuration>    <property>      <name>oozie.coord.application.pat...,
> (String)
> hdfs://localhost:54310/usr/org/data/clickstream/input/2012/02/10/{...,
> (String) <configuration>    <property>
> <name>oozie.coord.application.pat..., (int) 120, (Timestamp) 2012-02-09
> 17:38:59.035, (String) 0000009-120209131354394-oozie-oozi-C, (Timestamp)
> 2012-02-09 17:49:42.324, (Timestamp) 2012-02-09 17:35:00.0, (String)
> KILLED, (String) 0000009-120209131354394-oozie-oozi-C@4, (String)
> CoordinatorActionBean]} [code=30000, state=40XL1] [java.lang.String]" to
> the data store.  This indicates that the object was concurrently
> modified in another transaction.
> 
> 
> 
> It seems that since all the coordinators are waiting on the same input
> directory, it's not letting me kill them.
> Or maybe it is something else.
> This is my local cluster, so if there is a way to kill all jobs, I can
> even do that.
> 
> Would appreciate any pointers on how to resolve this.
> 
> Thanks,
> Deepika



Re: Killing coordinator jobs

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Deepika,
It is hard to debug this remotely and find the root cause.

Do you see this exception when you send the kill command?
Or is this exception coming and going?
You can verify this by running the "oozie kill" command and tailing the log.

One easy and hacky way is to stop the oozie service, restart it, and then kill.
Another question: what Oozie version are you using, and what is the DB?
Regards,
Mohammad

________________________________
From: Deepika Khera <dk...@lyris.com>
To: "oozie-users@incubator.apache.org" <oo...@incubator.apache.org> 
Sent: Thursday, February 9, 2012 5:54 PM
Subject: Killing coordinator jobs

Hi,

While testing my coordinator.xml I ended up creating a bunch of
coordinator jobs on Oozie.

I am trying to kill them now but am unable to do so.

I get the error:
Error: E0607 : E0607: Other error in operation
[updateCoordinatorAction], An optimistic lock violation was detected
when flushing object instance "A lock could not be obtained within the
time requested {prepstmnt 1385108855 UPDATE COORD_ACTIONS SET
action_number = ?, action_xml = ?, console_url = NULL, created_conf = ?,
error_code = NULL, error_message = NULL, external_status = NULL,
missing_dependencies = ?, run_conf = ?, time_out = ?, tracker_uri =
NULL, job_type = NULL, created_time = ?, external_id = NULL, job_id = ?,
last_modified_time = ?, nominal_time = ?, sla_xml = NULL, status = ?
WHERE id IN (SELECT DISTINCT t0.id FROM COORD_ACTIONS t0 WHERE (t0.id
= ?) AND t0.bean_type = ?) [params=(int) 4, (String) <coordinator-app
xmlns="uri:oozie:coordinator:0.1" name="clickstream..., (String)
<configuration>    <property>      <name>oozie.coord.application.pat...,
(String)
hdfs://localhost:54310/usr/org/data/clickstream/input/2012/02/10/{...,
(String) <configuration>    <property>
<name>oozie.coord.application.pat..., (int) 120, (Timestamp) 2012-02-09
17:38:59.035, (String) 0000009-120209131354394-oozie-oozi-C, (Timestamp)
2012-02-09 17:49:42.324, (Timestamp) 2012-02-09 17:35:00.0, (String)
KILLED, (String) 0000009-120209131354394-oozie-oozi-C@4, (String)
CoordinatorActionBean]} [code=30000, state=40XL1] [java.lang.String]" to
the data store.  This indicates that the object was concurrently
modified in another transaction.



It seems that since all the coordinators are waiting on the same input
directory, it's not letting me kill them.
Or maybe it is something else.
This is my local cluster, so if there is a way to kill all jobs, I can
even do that.

Would appreciate any pointers on how to resolve this.

Thanks,
Deepika

Killing coordinator jobs

Posted by Deepika Khera <dk...@lyris.com>.
Hi,

While testing my coordinator.xml I ended up creating a bunch of
coordinator jobs on Oozie.

I am trying to kill them now but am unable to do so.

I get the error:
Error: E0607 : E0607: Other error in operation
[updateCoordinatorAction], An optimistic lock violation was detected
when flushing object instance "A lock could not be obtained within the
time requested {prepstmnt 1385108855 UPDATE COORD_ACTIONS SET
action_number = ?, action_xml = ?, console_url = NULL, created_conf = ?,
error_code = NULL, error_message = NULL, external_status = NULL,
missing_dependencies = ?, run_conf = ?, time_out = ?, tracker_uri =
NULL, job_type = NULL, created_time = ?, external_id = NULL, job_id = ?,
last_modified_time = ?, nominal_time = ?, sla_xml = NULL, status = ?
WHERE id IN (SELECT DISTINCT t0.id FROM COORD_ACTIONS t0 WHERE (t0.id
= ?) AND t0.bean_type = ?) [params=(int) 4, (String) <coordinator-app
xmlns="uri:oozie:coordinator:0.1" name="clickstream..., (String)
<configuration>    <property>      <name>oozie.coord.application.pat...,
(String)
hdfs://localhost:54310/usr/org/data/clickstream/input/2012/02/10/{...,
(String) <configuration>    <property>
<name>oozie.coord.application.pat..., (int) 120, (Timestamp) 2012-02-09
17:38:59.035, (String) 0000009-120209131354394-oozie-oozi-C, (Timestamp)
2012-02-09 17:49:42.324, (Timestamp) 2012-02-09 17:35:00.0, (String)
KILLED, (String) 0000009-120209131354394-oozie-oozi-C@4, (String)
CoordinatorActionBean]} [code=30000, state=40XL1] [java.lang.String]" to
the data store.  This indicates that the object was concurrently
modified in another transaction.



It seems that since all the coordinators are waiting on the same input
directory, it's not letting me kill them.
Or maybe it is something else.
This is my local cluster, so if there is a way to kill all jobs, I can
even do that.

Would appreciate any pointers on how to resolve this.

Thanks,
Deepika




Re: Workflow xml

Posted by Mohammad Islam <mi...@yahoo.com>.
Sounds good.
Regards,
Mohammad


----- Original Message -----
From: Deepika Khera <dk...@lyris.com>
To: oozie-users@incubator.apache.org
Cc: 
Sent: Tuesday, January 31, 2012 12:47 PM
Subject: Re: Workflow xml

Thanks for your response Mohammad. 

I agree that a job submitted by a Java action would mean limited support
from Oozie for an M/R job. I was only considering falling back on the
Java API because configuring, for instance, the table mapper requires
stringifying the scan etc. (in addition to dumping the configuration
properties), which I thought would probably not be the cleanest option.
Similarly, for the MultipleOutputs setup, the file name would need to be
validated. I guess I can have a Java handler class do that for me, which
I can call from my M/R configuration. So some work will be needed, but
it would be a more appropriate approach.

Thanks,
Deepika






On Tue, 2012-01-31 at 12:31 -0800, Mohammad Islam wrote:
> Hi Deepika,
> The Java action based option has some issues: for example, the job submitted by the Java action will not be monitored/controlled/managed by Oozie. There are other operational issues too.
> 
> I would prefer the MR action instead.
> 
> 1. Define your Mapper and Reducer in wf.xml.
> 2. Define all your configs as key-value pairs in the action configuration block.
> 
> Two potential issues:
> 1. Are your Mapper and Reducer based on the new Hadoop API (mapreduce instead of mapred)? In that case, you might need to define two extra configuration properties.
> 
> 2. You might not know all the configuration key names because you are using the Hadoop Java API to set them. Most of the common configuration names are obvious. If a name is not known, you can dump your Java config object, which will print the key=value pairs. From there, you can get the exact names.
> 
> Even if you need to do these two extra steps, I still suggest considering this before falling back to a Java action.
> 
> 
> Regards,
> Mohammad 
>  
> 
> 
> 
> 
> ----- Original Message -----
> From: Deepika Khera <dk...@lyris.com>
> To: oozie-users@incubator.apache.org
> Cc: 
> Sent: Tuesday, January 31, 2012 10:22 AM
> Subject: Workflow xml
> 
> Hi,
> 
> I want to move my existing MapReduce jobs over to Oozie. Two of my
> major use cases for the M/R jobs are:
> 
> 1. Using TableMapper/TableReducer for jobs reading from and writing to
> HBase.
> 2. Using MultipleOutputs for writing out multiple files.
> 
> I am thinking the way to achieve both of these is to configure the
> M/R jobs through a Java action. Is there a better or cleaner way to do
> this? The alternative is to spell out by hand the job configuration that
> TableMapper/TableReducer or MultipleOutputs would normally set up, but
> that seems a little dirty and complicated.
> 
> Thanks,
> Deepika

Re: Workflow xml

Posted by Deepika Khera <dk...@lyris.com>.
Thanks for your response Mohammad. 

I agree that a job submitted by a Java action would mean limited support
from Oozie for an M/R job. I was only considering falling back on the
Java API because configuring, for instance, the table mapper requires
stringifying the scan etc. (in addition to dumping the configuration
properties), which I thought would probably not be the cleanest option.
Similarly, for the MultipleOutputs setup, the file name would need to be
validated. I guess I can have a Java handler class do that for me, which
I can call from my M/R configuration. So some work will be needed, but
it would be a more appropriate approach.
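
For comparison, the java-action fallback being discussed would look roughly like the sketch below (all names are hypothetical; the driver class is assumed to do the TableMapper/MultipleOutputs setup through the Java API and then submit the job itself):

```xml
<action name="hbase-mr">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- hypothetical driver: configures TableMapper/TableReducer or
             MultipleOutputs via the Java API, then submits the M/R job -->
        <main-class>com.example.HBaseJobDriver</main-class>
        <arg>${inputDir}</arg>
        <arg>${outputDir}</arg>
    </java>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

The caveat from earlier in the thread applies: Oozie tracks only this launcher, not the M/R job it submits.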

Thanks,
Deepika






On Tue, 2012-01-31 at 12:31 -0800, Mohammad Islam wrote:
> Hi Deepika,
> The Java action based option has some issues: for example, the job submitted by the Java action will not be monitored/controlled/managed by Oozie. There are other operational issues too.
> 
> I would prefer the MR action instead.
> 
> 1. Define your Mapper and Reducer in wf.xml.
> 2. Define all your configs as key-value pairs in the action configuration block.
> 
> Two potential issues:
> 1. Are your Mapper and Reducer based on the new Hadoop API (mapreduce instead of mapred)? In that case, you might need to define two extra configuration properties.
> 
> 2. You might not know all the configuration key names because you are using the Hadoop Java API to set them. Most of the common configuration names are obvious. If a name is not known, you can dump your Java config object, which will print the key=value pairs. From there, you can get the exact names.
> 
> Even if you need to do these two extra steps, I still suggest considering this before falling back to a Java action.
> 
> 
> Regards,
> Mohammad 
>  
> 
> 
> 
> 
> ----- Original Message -----
> From: Deepika Khera <dk...@lyris.com>
> To: oozie-users@incubator.apache.org
> Cc: 
> Sent: Tuesday, January 31, 2012 10:22 AM
> Subject: Workflow xml
> 
> Hi,
> 
> I want to move my existing MapReduce jobs over to Oozie. Two of my
> major use cases for the M/R jobs are:
> 
> 1. Using TableMapper/TableReducer for jobs reading from and writing to
> HBase.
> 2. Using MultipleOutputs for writing out multiple files.
> 
> I am thinking the way to achieve both of these is to configure the
> M/R jobs through a Java action. Is there a better or cleaner way to do
> this? The alternative is to spell out by hand the job configuration that
> TableMapper/TableReducer or MultipleOutputs would normally set up, but
> that seems a little dirty and complicated.
> 
> Thanks,
> Deepika



Re: Workflow xml

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Deepika,
The Java action based option has some issues: for example, the job submitted by the Java action will not be monitored/controlled/managed by Oozie. There are other operational issues too.

I would prefer the MR action instead.

1. Define your Mapper and Reducer in wf.xml.
2. Define all your configs as key-value pairs in the action configuration block.

Two potential issues:
1. Are your Mapper and Reducer based on the new Hadoop API (mapreduce instead of mapred)? In that case, you might need to define two extra configuration properties.

2. You might not know all the configuration key names because you are using the Hadoop Java API to set them. Most of the common configuration names are obvious. If a name is not known, you can dump your Java config object, which will print the key=value pairs. From there, you can get the exact names.

Even if you need to do these two extra steps, I still suggest considering this before falling back to a Java action.
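
As a concrete sketch of the MR-action route (illustrative only: the class names and ${...} parameters are placeholders, not from this thread), the two extra new-API properties sit alongside the rest of the key-value configuration:

```xml
<action name="my-mr">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- the two extra properties needed when the Mapper/Reducer
                 use the new org.apache.hadoop.mapreduce API -->
            <property>
                <name>mapred.mapper.new-api</name>
                <value>true</value>
            </property>
            <property>
                <name>mapred.reducer.new-api</name>
                <value>true</value>
            </property>
            <!-- new-API mapper/reducer classes (hypothetical names) -->
            <property>
                <name>mapreduce.map.class</name>
                <value>com.example.MyMapper</value>
            </property>
            <property>
                <name>mapreduce.reduce.class</name>
                <value>com.example.MyReducer</value>
            </property>
            <!-- plus the remaining key=value pairs dumped from the job config -->
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
</action>
```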


Regards,
Mohammad 
 




----- Original Message -----
From: Deepika Khera <dk...@lyris.com>
To: oozie-users@incubator.apache.org
Cc: 
Sent: Tuesday, January 31, 2012 10:22 AM
Subject: Workflow xml

Hi,

I want to move my existing MapReduce jobs over to Oozie. Two of my
major use cases for the M/R jobs are:

1. Using TableMapper/TableReducer for jobs reading from and writing to
HBase.
2. Using MultipleOutputs for writing out multiple files.

I am thinking the way to achieve both of these is to configure the
M/R jobs through a Java action. Is there a better or cleaner way to do
this? The alternative is to spell out by hand the job configuration that
TableMapper/TableReducer or MultipleOutputs would normally set up, but
that seems a little dirty and complicated.

Thanks,
Deepika