Posted to dev@falcon.apache.org by Mark Greene <ma...@markgreene.com> on 2016/01/22 17:16:56 UTC

Falcon Process fails with Pig HCatalog Usage

Hi Dev List,

I'm trying to understand how best to troubleshoot this error (is it Oozie-
or Falcon-induced?).

I have a Pig script that I am using as the workflow for my Falcon process.
The Pig script uses HCatStorer to write to an HCatalog URI that is the
output feed defined in my Falcon process entity. The Pig action in the
resulting Oozie workflow generated by Falcon fails with the attached stack
trace. The root cause is a *missing class definition for
org/apache/hadoop/hive/shims/ShimLoader.*

Running the script manually using pig -x tez -useHCatalog <all the -params
passed by Oozie> <path to pig script> results in a successful execution.
It is only when the script runs as a Pig action in the Falcon-generated
Oozie workflow that the missing class definition manifests.

I am running the following stack:

HDP-2.3.2.0-2950
Pig 0.15.0.2.3
Hive 1.2.1.2.3
Oozie 4.2.0.2.3
Falcon 0.6.1.2.3

-- 

Mark Greene
*E:* Mark@MarkGreene.com

Re: Falcon Process fails with Pig HCatalog Usage

Posted by Mark Greene <ma...@markgreene.com>.
Sowmya, Venkat,

I've created FALCON-1787 and attached the requested entities and artifacts.

Thanks,
Mark

-- 

Mark Greene
*E:* Mark@MarkGreene.com
*T: *+1 512 663 0445

Re: Falcon Process fails with Pig HCatalog Usage

Posted by Sowmya Ramesh <sr...@hortonworks.com>.
Mark,

Falcon generates the Pig action at run time when it builds the workflow,
using the pig-action.xml bundled with Falcon. That pig-action.xml does not
have hive in its sharelib config.
As Venkat mentioned, the workflow action configuration overrides the one
defined in oozie-site.xml.

A couple of workarounds:
* Update pig-action.xml to have hive in the sharelib config, repackage
falcon-oozie-adaptor-<version>.jar, replace the jar at
"/usr/hdp/current/falcon-server/webapp/falcon/WEB-INF/lib", and restart
Falcon
* If you have the Falcon code downloaded, update pig-action.xml at
oozie/src/main/resources/action/process/pig-action.xml, then build
Falcon and reinstall it
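
For either workaround, the changed property in pig-action.xml might look roughly like the sketch below. The value hive,pig,hcatalog mirrors the oozie-site.xml setting quoted elsewhere in this thread; the enclosing <configuration> element is an assumption about how the template is laid out, so check the actual file before editing.

```xml
<!-- Sketch only: assumes the property sits in the Pig action's <configuration> block -->
<configuration>
    <property>
        <name>oozie.action.sharelib.for.pig</name>
        <!-- previously: pig,hcatalog -->
        <value>hive,pig,hcatalog</value>
    </property>
</configuration>
```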

I have to understand why hive is required now and how it worked before.

Can you file a bug and attach all the entities to debug further?

Thanks!

Re: Falcon Process fails with Pig HCatalog Usage

Posted by Venkat Ranganathan <vr...@hortonworks.com>.
The value defined in oozie-site.xml can be overridden by the action configuration, which has higher precedence.

Venkat
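
To illustrate that precedence, a Pig action in a generated workflow carries its own sharelib setting inside the action, roughly along these lines. The action name, script path, and transition targets are hypothetical placeholders; only the property name and value come from this thread.

```xml
<!-- Sketch only: action name, script path, and transitions are hypothetical -->
<action name="example-pig-action">
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Action-level setting: takes precedence over oozie-site.xml -->
            <property>
                <name>oozie.action.sharelib.for.pig</name>
                <value>pig,hcatalog</value>
            </property>
        </configuration>
        <script>example-script.pig</script>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Because the action-level list omits hive, the hive entry in oozie-site.xml never takes effect for this action.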






Re: Falcon Process fails with Pig HCatalog Usage

Posted by Mark Greene <ma...@markgreene.com>.
Sowmya,

I modified the *workflow.xml* generated by Falcon to include hive in
the oozie.action.sharelib.for.pig, and the Pig action succeeded!

What I'm struggling now to understand is why Falcon is not using the
property as defined in my oozie-site.xml, which is correct.

Mark

-- 

Mark Greene
*E:* Mark@MarkGreene.com
*T: *+1 512 663 0445

Re: Falcon Process fails with Pig HCatalog Usage

Posted by Sowmya Ramesh <sr...@hortonworks.com>.
Mark - I looked at pig-action.xml and it has

<property>
    <name>oozie.action.sharelib.for.pig</name>
    <value>pig,hcatalog</value>
</property>

It looks like hive is also required in the sharelib. Can you attach the Pig
script and process XML too? Also, can you try modifying the workflow
generated by Falcon to have hive in "oozie.action.sharelib.for.pig",
rerun the workflow, and see if it succeeds?

Thanks!


Re: Falcon Process fails with Pig HCatalog Usage

Posted by Mark Greene <ma...@markgreene.com>.
Hi Dev List,

I have tracked the issue to the *oozie.action.sharelib.for.pig* property of
the Falcon-generated workflow not reflecting the oozie.action.sharelib.for.pig
of my oozie-site.xml.

From the workflow.xml generated by Falcon:
    <property>
      <name>oozie.action.sharelib.for.pig</name>
      <value>pig,hcatalog</value>
    </property>

From the oozie-site.xml of the cluster:
    <property>
      <name>oozie.action.sharelib.for.pig</name>
      <value>hive,pig,hcatalog</value>
    </property>

JIRA has an issue logged for this, but it is marked as resolved for Oozie
versions 4+. My stack is running Oozie 4.2.0.
https://issues.apache.org/jira/browse/FALCON-243

Any advice is appreciated.

Mark


-- 

Mark Greene
*E:* Mark@MarkGreene.com
*T: *+1 512 663 0445