Posted to user@pig.apache.org by hc busy <hc...@gmail.com> on 2010/03/09 02:25:28 UTC

ERROR 6017: Execution failed, while processing

Guys, I just ran into a weird exception 500 lines into writing a pig
script... Attached below is the error. Does anybody have any idea how
to debug this? I don't even know which step of my 500-line pig script caused
this error.

Any suggestions on how to track down the offending operation?

Thanks in advance!

Pig Stack Trace
---------------
ERROR 6017: Execution failed, while processing
hdfs://tasktracker:44445/tmp/temp1581022765/tmp939224290,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp-1028111033,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp-198156265,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp-72050900,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp-141993299,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp2135611534,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp-2093411384,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp250626628,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp2100381358,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp167762091

org.apache.pig.backend.executionengine.ExecException: ERROR 6017: Execution
failed, while processing
hdfs://tasktracker:44445/tmp/temp1581022765/tmp939224290,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp-1028111033,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp-198156265,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp-72050900,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp-141993299,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp2135611534,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp-2093411384,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp250626628,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp2100381358,
hdfs://tasktracker:44445/tmp/temp1581022765/tmp167762091
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:181)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:777)
        at org.apache.pig.PigServer.execute(PigServer.java:770)
        at org.apache.pig.PigServer.access$100(PigServer.java:89)
        at org.apache.pig.PigServer$Graph.execute(PigServer.java:947)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:249)
        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:115)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
        at org.apache.pig.Main.main(Main.java:320)
================================================================================

Re: ERROR 6017: Execution failed, while processing

Posted by hc busy <hc...@gmail.com>.
Okay, Alan, I went back and read PIG-928, which is great!

The example

%declare a(x, y, k1, k2) join x by k1, y by k2

is nice, but I need something a little more extensive, something like

>declare template(table1, table2, fieldName1, fieldName2, const1, const2){
   temp = filter table1 by fieldName1 == const1;
   temp2 = filter table2 by fieldName2 == const2;
   temp3 = join temp by common, temp2 by common;
   result = foreach temp3 generate flatten(table1), flatten(table2);
   return result;
}
> D = template(input1, input2, key, name, 'pig', 'user');
> store D into .....;


Where the inside of what's declared is more complicated than one alias
expression. Granted, the syntax of passing in aliases, field names, and
constants is awkward, but that problem can be solved. Mainly, the problem is
that I retype the same piece of code several times, or I use cat/sed to
substitute and have a "generate pig script from pig script" step, which adds
a lot of room for error. For iterative tasks, I can regenerate different
copies of the 0.5k-line pig file by substituting in yet more input names...
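
Today the closest I can get is purely textual parameter substitution plus
run from grunt. A sketch (every name here is invented, and I'm assuming a
Pig version whose run/exec take -param; run, unlike exec, shares the
caller's aliases):

-- function.pig: $T1/$T2/$F1/$F2/$C1/$C2 are filled in by -param before
-- parsing; temp, temp2, temp3, result still land in one global namespace
temp   = filter $T1 by $F1 == '$C1';
temp2  = filter $T2 by $F2 == '$C2';
temp3  = join temp by common, temp2 by common;  -- assumes both sides have a field 'common'
result = foreach temp3 generate *;

grunt> run -param T1=input1 -param T2=input2 -param F1=key -param F2=name -param C1=pig -param C2=user function.pig
grunt> store result into 'out';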

If we include recursive defines, then it would take care of iteration as
well.

Ohh, also, from my other email to pig-dev: reentrant/recursive FOREACH would
help a lot too. Is there any plan to enable that as well?


On Tue, Mar 16, 2010 at 5:31 PM, hc busy <hc...@gmail.com> wrote:

> ahhh, that %declare a(...) is exactly what would help with the variable
> name problem. Because otherwise, it's like a register language where all
> function.pig files take parameters named a1,a2,a3,a4,a5,... and before
> #include'ing a pig file, the caller sets a1, a2, a3, a4,...; and
> function.pig will return values in a1, a2, a3 after it's done.
>
> Personally I look at pig as a nice high-level language to type into (as
> opposed to some assembly language that we'll eventually compile SQL into).
> In terms of optimization, I think if we stay functional, it is essentially a
> data flow language (except with the possibility of infinitely long pipes).
>
> If there will be no flow control in pig, then the other way to go about
> this is to introduce JIT-type technology. Basically the external command
> streams in commands (like my typing into grunt), but the system continuously
> optimizes the execution and caching of intermediate results.
>
> Or else, hey, or else we could just introduce "GOTO" statements. ;-)
>
>
>
> On Tue, Mar 16, 2010 at 3:08 PM, Alan Gates <ga...@yahoo-inc.com> wrote:
>
>> IMO Pig would do best to stay a data flow language and not take on control
>> flow.  (I'm not sure all committers agree with me on this.)  There's no lack
>> of scripting languages out there that can be used for that (as seen on
>> PIG-928) or frameworks like Piglet or Oozie.  But we could still do C
>> preprocessor style stuff.  We've taken the first step of parameter
>> substitution.  If we took two more steps, %include and arguments for
>> parameter substitution (that is the ability to say %declare a(x, y, k1, k2)
>> join x by k1, y by k2), we would avoid control flow while still adding a
>> lot of benefit.  Full-bore data pipelines will always need some kind of
>> workflow system to manage their various Pig components.  But it would be nice if
>> for medium sized jobs (say 500 lines of Pig Latin) Pig was still usable
>> without the added complexity of workflow.  If we do this in steps, include
>> now, arguments for %declare later, I think that's fine.  I'd just like to
>> see a plan for where we're going with it.
>>
>> Alan.
>>
>>
>> On Mar 15, 2010, at 5:28 PM, Dmitriy Ryaboy wrote:
>>
>>  Alan -- yeah, right now we use the rather brittle approach of naming
>>> conventions to do this. Something more template/macro-like would be
>>> better.
>>> Of course something like Piglet, or equivalents in other languages, can
>>> obviate the need for these constructs, and I am not entirely sure
>>> functions,
>>> loops, etc are something we want to get into reinventing. I guess the
>>> question becomes whether we want Pig Latin to be a first-class language
>>> that
>>> programmers write code in directly, or if we shift focus to building out
>>> the
>>> tooling for generating Pig scripts, and Pig Latin becomes something you
>>> drop
>>> into for one-offs.
>>>
>>> -D
>>>
>>> On Mon, Mar 15, 2010 at 4:02 PM, Alan Gates <ga...@yahoo-inc.com> wrote:
>>>
>>>  In your example below how would the results of these load functions be
>>>> accessed in your main script?
>>>>
>>>> I certainly see the value of #include plus functions (or #define if you
>>>> prefer).  Without functions though you'll have namespace clashes (any
>>>> relation names used in the imported files will be visible to other
>>>> imported
>>>> files and to the main script) and the user will have to know the name of
>>>> input and output relations for the imported files so he can use it
>>>> subsequently in his script.  For example if you had a pig script that
>>>> implemented a certain type of join:
>>>>
>>>> RETURN = join INPUT1 by $0, INPUT2 by $0
>>>>
>>>> Now the user has to know that INPUT1 and INPUT2 must be the names of his
>>>> input relations and that the output relation will be named RETURN.  This
>>>> is
>>>> also limited because we can't define which key(s) to do the join on.  To
>>>> make this useful we're going to want a macro or function ability so we
>>>> can
>>>> pass in names of inputs and other parameters (like which keys to join
>>>> on),
>>>> control the names of results, and have variable scoping.
>>>>
>>>> That said, I'm all for it.  I think it would make Pig much more usable.
>>>>
>>>> Alan.
>>>>
>>>>
>>>>
>>>>
>>>> On Mar 15, 2010, at 2:58 PM, Dmitriy Ryaboy wrote:
>>>>
>>>> Alan, this would be quite useful, as essentially this would allow
>>>>
>>>>> developers
>>>>> to create functions by writing them into separate pig scripts and
>>>>> combining
>>>>> them as necessary.
>>>>>
>>>>> For example we have code that auto-generates load statements with
>>>>> fairly
>>>>> complex schemas based on protocol buffers (see
>>>>>
>>>>>
>>>>> http://www.slideshare.net/hadoopusergroup/twitter-protobufs-and-hadoop-hug-021709
>>>>> ).
>>>>> It would be very handy to be able to say something like
>>>>>
>>>>> #include common_jars.pig
>>>>> #include load_tweets.pig
>>>>> #include load_users.pig
>>>>>
>>>>> #include filter_nonenglish_tweets.pig
>>>>> #include geomap_users.pig
>>>>>
>>>>> .. etc ..
>>>>>
>>>>> -D
>>>>>
>>>>> On Mon, Mar 15, 2010 at 2:23 PM, Alan Gates <ga...@yahoo-inc.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>  On Mar 12, 2010, at 10:36 AM, hc busy wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>  Is there any work towards something like the C language's '#include' in
>>>>>>> Pig?
>>>>>>> My
>>>>>>> large pig script is actually developed separately in several smaller
>>>>>>> pig
>>>>>>> files. Individually the pig files do not run because they depend on
>>>>>>> previous
>>>>>>> scripts, but logically they are separate because each step does
>>>>>>> something
>>>>>>> different.
>>>>>>>
>>>>>>> Currently the only thing existing along these lines is the exec
>>>>>>> command
>>>>>>>
>>>>>>>  in grunt.  I don't think we're opposed to a #include functionality,
>>>>>> we
>>>>>> just
>>>>>> haven't done it.  However, given that Pig doesn't have function calls,
>>>>>> and
>>>>>> presumably each Pig Latin script is self contained, it isn't clear to
>>>>>> me
>>>>>> how
>>>>>> useful it will be.
>>>>>>
>>>>>> Alan.
>>>>>>
>>>>>>
>>>>>>
>>>>
>>
>

Re: ERROR 6017: Execution failed, while processing

Posted by hc busy <hc...@gmail.com>.
ahhh, that %declare a(...) is exactly what would help with the variable name
problem. Because otherwise, it's like a register language where all
function.pig files take parameters named a1,a2,a3,a4,a5,... and before
#include'ing a pig file, the caller sets a1, a2, a3, a4,...; and
function.pig will return values in a1, a2, a3 after it's done.
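
In grunt that would look something like this (a sketch with made-up names,
assuming run, which unlike exec executes the script in the caller's context
so aliases are shared):

grunt> a1 = load 'users';
grunt> a2 = load 'orders';
grunt> run function.pig
grunt> store a1 into 'out';

where function.pig reads a1 and a2 and leaves its output back in a1.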

Personally I look at pig as a nice high-level language to type into (as
opposed to some assembly language that we'll eventually compile SQL into).
In terms of optimization, I think if we stay functional, it is essentially a
data flow language (except with the possibility of infinitely long pipes).

If there will be no flow control in pig, then the other way to go about this
is to introduce JIT-type technology. Basically the external command streams
in commands (like my typing into grunt), but the system continuously
optimizes the execution and caching of intermediate results.

Or else, hey, or else we could just introduce "GOTO" statements. ;-)



On Tue, Mar 16, 2010 at 3:08 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

> IMO Pig would do best to stay a data flow language and not take on control
> flow.  (I'm not sure all committers agree with me on this.)  There's no lack
> of scripting languages out there that can be used for that (as seen on
> PIG-928) or frameworks like Piglet or Oozie.  But we could still do C
> preprocessor style stuff.  We've taken the first step of parameter
> substitution.  If we took two more steps, %include and arguments for
> parameter substitution (that is the ability to say %declare a(x, y, k1, k2)
> join x by k1, y by k2), we would avoid control flow while still adding a
> lot of benefit.  Full-bore data pipelines will always need some kind of
> workflow system to manage their various Pig components.  But it would be nice if
> for medium sized jobs (say 500 lines of Pig Latin) Pig was still usable
> without the added complexity of workflow.  If we do this in steps, include
> now, arguments for %declare later, I think that's fine.  I'd just like to
> see a plan for where we're going with it.
>
> Alan.
>
>
> On Mar 15, 2010, at 5:28 PM, Dmitriy Ryaboy wrote:
>
>  Alan -- yeah, right now we use the rather brittle approach of naming
>> conventions to do this. Something more template/macro-like would be
>> better.
>> Of course something like Piglet, or equivalents in other languages, can
>> obviate the need for these constructs, and I am not entirely sure
>> functions,
>> loops, etc are something we want to get into reinventing. I guess the
>> question becomes whether we want Pig Latin to be a first-class language
>> that
>> programmers write code in directly, or if we shift focus to building out
>> the
>> tooling for generating Pig scripts, and Pig Latin becomes something you
>> drop
>> into for one-offs.
>>
>> -D
>>
>> On Mon, Mar 15, 2010 at 4:02 PM, Alan Gates <ga...@yahoo-inc.com> wrote:
>>
>>  In your example below how would the results of these load functions be
>>> accessed in your main script?
>>>
>>> I certainly see the value of #include plus functions (or #define if you
>>> prefer).  Without functions though you'll have namespace clashes (any
>>> relation names used in the imported files will be visible to other
>>> imported
>>> files and to the main script) and the user will have to know the name of
>>> input and output relations for the imported files so he can use it
>>> subsequently in his script.  For example if you had a pig script that
>>> implemented a certain type of join:
>>>
>>> RETURN = join INPUT1 by $0, INPUT2 by $0
>>>
>>> Now the user has to know that INPUT1 and INPUT2 must be the names of his
>>> input relations and that the output relation will be named RETURN.  This
>>> is
>>> also limited because we can't define which key(s) to do the join on.  To
>>> make this useful we're going to want a macro or function ability so we
>>> can
>>> pass in names of inputs and other parameters (like which keys to join
>>> on),
>>> control the names of results, and have variable scoping.
>>>
>>> That said, I'm all for it.  I think it would make Pig much more usable.
>>>
>>> Alan.
>>>
>>>
>>>
>>>
>>> On Mar 15, 2010, at 2:58 PM, Dmitriy Ryaboy wrote:
>>>
>>> Alan, this would be quite useful, as essentially this would allow
>>>
>>>> developers
>>>> to create functions by writing them into separate pig scripts and
>>>> combining
>>>> them as necessary.
>>>>
>>>> For example we have code that auto-generates load statements with fairly
>>>> complex schemas based on protocol buffers (see
>>>>
>>>>
>>>> http://www.slideshare.net/hadoopusergroup/twitter-protobufs-and-hadoop-hug-021709
>>>> ).
>>>> It would be very handy to be able to say something like
>>>>
>>>> #include common_jars.pig
>>>> #include load_tweets.pig
>>>> #include load_users.pig
>>>>
>>>> #include filter_nonenglish_tweets.pig
>>>> #include geomap_users.pig
>>>>
>>>> .. etc ..
>>>>
>>>> -D
>>>>
>>>> On Mon, Mar 15, 2010 at 2:23 PM, Alan Gates <ga...@yahoo-inc.com>
>>>> wrote:
>>>>
>>>>
>>>>  On Mar 12, 2010, at 10:36 AM, hc busy wrote:
>>>>>
>>>>>
>>>>>
>>>>>>  Is there any work towards something like the C language's '#include' in
>>>>>> Pig?
>>>>>> My
>>>>>> large pig script is actually developed separately in several smaller
>>>>>> pig
>>>>>> files. Individually the pig files do not run because they depend on
>>>>>> previous
>>>>>> scripts, but logically they are separate because each step does
>>>>>> something
>>>>>> different.
>>>>>>
>>>>>> Currently the only thing existing along these lines is the exec
>>>>>> command
>>>>>>
>>>>>>  in grunt.  I don't think we're opposed to a #include functionality,
>>>>> we
>>>>> just
>>>>> haven't done it.  However, given that Pig doesn't have function calls,
>>>>> and
>>>>> presumably each Pig Latin script is self contained, it isn't clear to
>>>>> me
>>>>> how
>>>>> useful it will be.
>>>>>
>>>>> Alan.
>>>>>
>>>>>
>>>>>
>>>
>

Re: ERROR 6017: Execution failed, while processing

Posted by Alan Gates <ga...@yahoo-inc.com>.
IMO Pig would do best to stay a data flow language and not take on  
control flow.  (I'm not sure all committers agree with me on this.)   
There's no lack of scripting languages out there that can be used for  
that (as seen on PIG-928) or frameworks like Piglet or Oozie.  But we  
could still do C preprocessor style stuff.  We've taken the first step  
of parameter substitution.  If we took two more steps, %include and  
arguments for parameter substitution (that is the ability to say  
%declare a(x, y, k1, k2) join x by k1, y by k2), we would avoid
control flow while still adding a lot of benefit.  Full-bore data
pipelines will always need some kind of workflow system to manage
their various Pig components.  But it would be nice if for medium  
sized jobs (say 500 lines of Pig Latin) Pig was still usable without  
the added complexity of workflow.  If we do this in steps, include  
now, arguments for %declare later, I think that's fine.  I'd just like  
to see a plan for where we're going with it.
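
To make the two steps concrete, the parameter substitution we
already have is purely textual, e.g. (a sketch; relation and field
names invented):

%declare KEY 'uid'
joined = join users by $KEY, orders by $KEY;

Arguments for %declare would let the same template be instantiated
several times in one script with different relations and keys.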

Alan.

On Mar 15, 2010, at 5:28 PM, Dmitriy Ryaboy wrote:

> Alan -- yeah, right now we use the rather brittle approach of naming
> conventions to do this. Something more template/macro-like would be  
> better.
> Of course something like Piglet, or equivalents in other languages,  
> can
> obviate the need for these constructs, and I am not entirely sure  
> functions,
> loops, etc are something we want to get into reinventing. I guess the
> question becomes whether we want Pig Latin to be a first-class  
> language that
> programmers write code in directly, or if we shift focus to building
> out the
> tooling for generating Pig scripts, and Pig Latin becomes something  
> you drop
> into for one-offs.
>
> -D
>
> On Mon, Mar 15, 2010 at 4:02 PM, Alan Gates <ga...@yahoo-inc.com>  
> wrote:
>
>> In your example below how would the results of these load functions  
>> be
>> accessed in your main script?
>>
>> I certainly see the value of #include plus functions (or #define if  
>> you
>> prefer).  Without functions though you'll have namespace clashes (any
>> relation names used in the imported files will be visible to other  
>> imported
>> files and to the main script) and the user will have to know the  
>> name of
>> input and output relations for the imported files so he can use it
>> subsequently in his script.  For example if you had a pig script that
>> implemented a certain type of join:
>>
>> RETURN = join INPUT1 by $0, INPUT2 by $0
>>
>> Now the user has to know that INPUT1 and INPUT2 must be the names  
>> of his
>> input relations and that the output relation will be named RETURN.   
>> This is
>> also limited because we can't define which key(s) to do the join  
>> on.  To
>> make this useful we're going to want a macro or function ability so  
>> we can
>> pass in names of inputs and other parameters (like which keys to  
>> join on),
>> control the names of results, and have variable scoping.
>>
>> That said, I'm all for it.  I think it would make Pig much more
>> usable.
>>
>> Alan.
>>
>>
>>
>>
>> On Mar 15, 2010, at 2:58 PM, Dmitriy Ryaboy wrote:
>>
>> Alan, this would be quite useful, as essentially this would allow
>>> developers
>>> to create functions by writing them into separate pig scripts and
>>> combining
>>> them as necessary.
>>>
>>> For example we have code that auto-generates load statements with  
>>> fairly
>>> complex schemas based on protocol buffers (see
>>>
>>> http://www.slideshare.net/hadoopusergroup/twitter-protobufs-and-hadoop-hug-021709
>>> ).
>>> It would be very handy to be able to say something like
>>>
>>> #include common_jars.pig
>>> #include load_tweets.pig
>>> #include load_users.pig
>>>
>>> #include filter_nonenglish_tweets.pig
>>> #include geomap_users.pig
>>>
>>> .. etc ..
>>>
>>> -D
>>>
>>> On Mon, Mar 15, 2010 at 2:23 PM, Alan Gates <ga...@yahoo-inc.com>  
>>> wrote:
>>>
>>>
>>>> On Mar 12, 2010, at 10:36 AM, hc busy wrote:
>>>>
>>>>
>>>>
>>>>> Is there any work towards something like the C language's '#include'
>>>>> in Pig?
>>>>> My
>>>>> large pig script is actually developed separately in several  
>>>>> smaller pig
>>>>> files. Individually the pig files do not run because they depend  
>>>>> on
>>>>> previous
>>>>> scripts, but logically they are separate because each step does
>>>>> something
>>>>> different.
>>>>>
>>>>> Currently the only thing existing along these lines is the exec  
>>>>> command
>>>>>
>>>> in grunt.  I don't think we're opposed to a #include  
>>>> functionality, we
>>>> just
>>>> haven't done it.  However, given that Pig doesn't have function  
>>>> calls,
>>>> and
>>>> presumably each Pig Latin script is self contained, it isn't  
>>>> clear to me
>>>> how
>>>> useful it will be.
>>>>
>>>> Alan.
>>>>
>>>>
>>


Re: ERROR 6017: Execution failed, while processing

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Alan -- yeah, right now we use the rather brittle approach of naming
conventions to do this. Something more template/macro-like would be better.
Of course something like Piglet, or equivalents in other languages, can
obviate the need for these constructs, and I am not entirely sure functions,
loops, etc are something we want to get into reinventing. I guess the
question becomes whether we want Pig Latin to be a first-class language that
programmers write code in directly, or if we shift focus to building out the
tooling for generating Pig scripts, and Pig Latin becomes something you drop
into for one-offs.

-D

On Mon, Mar 15, 2010 at 4:02 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

> In your example below how would the results of these load functions be
> accessed in your main script?
>
> I certainly see the value of #include plus functions (or #define if you
> prefer).  Without functions though you'll have namespace clashes (any
> relation names used in the imported files will be visible to other imported
> files and to the main script) and the user will have to know the name of
> input and output relations for the imported files so he can use it
> subsequently in his script.  For example if you had a pig script that
> implemented a certain type of join:
>
> RETURN = join INPUT1 by $0, INPUT2 by $0
>
> Now the user has to know that INPUT1 and INPUT2 must be the names of his
> input relations and that the output relation will be named RETURN.  This is
> also limited because we can't define which key(s) to do the join on.  To
> make this useful we're going to want a macro or function ability so we can
> pass in names of inputs and other parameters (like which keys to join on),
> control the names of results, and have variable scoping.
>
> That said, I'm all for it.  I think it would make Pig much more usable.
>
> Alan.
>
>
>
>
> On Mar 15, 2010, at 2:58 PM, Dmitriy Ryaboy wrote:
>
>  Alan, this would be quite useful, as essentially this would allow
>> developers
>> to create functions by writing them into separate pig scripts and
>> combining
>> them as necessary.
>>
>> For example we have code that auto-generates load statements with fairly
>> complex schemas based on protocol buffers (see
>>
>> http://www.slideshare.net/hadoopusergroup/twitter-protobufs-and-hadoop-hug-021709
>> ).
>> It would be very handy to be able to say something like
>>
>> #include common_jars.pig
>> #include load_tweets.pig
>> #include load_users.pig
>>
>> #include filter_nonenglish_tweets.pig
>> #include geomap_users.pig
>>
>> .. etc ..
>>
>> -D
>>
>> On Mon, Mar 15, 2010 at 2:23 PM, Alan Gates <ga...@yahoo-inc.com> wrote:
>>
>>
>>> On Mar 12, 2010, at 10:36 AM, hc busy wrote:
>>>
>>>
>>>
>>>> Is there any work towards something like the C language's '#include' in Pig?
>>>> My
>>>> large pig script is actually developed separately in several smaller pig
>>>> files. Individually the pig files do not run because they depend on
>>>> previous
>>>> scripts, but logically they are separate because each step does
>>>> something
>>>> different.
>>>>
>>>> Currently the only thing existing along these lines is the exec command
>>>>
>>> in grunt.  I don't think we're opposed to a #include functionality, we
>>> just
>>> haven't done it.  However, given that Pig doesn't have function calls,
>>> and
>>> presumably each Pig Latin script is self contained, it isn't clear to me
>>> how
>>> useful it will be.
>>>
>>> Alan.
>>>
>>>
>

Re: ERROR 6017: Execution failed, while processing

Posted by Alan Gates <ga...@yahoo-inc.com>.
In your example below, how would the results of these load functions be
accessed in your main script?

I certainly see the value of #include plus functions (or #define if  
you prefer).  Without functions though you'll have namespace clashes  
(any relation names used in the imported files will be visible to  
other imported files and to the main script) and the user will have to  
know the name of input and output relations for the imported files so  
he can use it subsequently in his script.  For example if you had a  
pig script that implemented a certain type of join:

RETURN = join INPUT1 by $0, INPUT2 by $0

Now the user has to know that INPUT1 and INPUT2 must be the names of  
his input relations and that the output relation will be named  
RETURN.  This is also limited because we can't define which key(s) to  
do the join on.  To make this useful we're going to want a macro or  
function ability so we can pass in names of inputs and other  
parameters (like which keys to join on), control the names of results,  
and have variable scoping.
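
Concretely, with only #include and naming conventions, the call
site would have to read something like this (hypothetical, since
#include doesn't exist yet; the file name is invented):

INPUT1 = load 'users' as (id:int, name:chararray);
INPUT2 = load 'orders' as (id:int, amount:double);
#include first_field_join.pig
store RETURN into 'joined';

where first_field_join.pig contains nothing but the join above.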

That said, I'm all for it.  I think it would make Pig much more usable.

Alan.



On Mar 15, 2010, at 2:58 PM, Dmitriy Ryaboy wrote:

> Alan, this would be quite useful, as essentially this would allow  
> developers
> to create functions by writing them into separate pig scripts and  
> combining
> them as necessary.
>
> For example we have code that auto-generates load statements with  
> fairly
> complex schemas based on protocol buffers (see
> http://www.slideshare.net/hadoopusergroup/twitter-protobufs-and-hadoop-hug-021709) 
> .
> It would be very handy to be able to say something like
>
> #include common_jars.pig
> #include load_tweets.pig
> #include load_users.pig
>
> #include filter_nonenglish_tweets.pig
> #include geomap_users.pig
>
> .. etc ..
>
> -D
>
> On Mon, Mar 15, 2010 at 2:23 PM, Alan Gates <ga...@yahoo-inc.com>  
> wrote:
>
>>
>> On Mar 12, 2010, at 10:36 AM, hc busy wrote:
>>
>>
>>>
>>> Is there any work towards something like the C language's '#include' in
>>> Pig? My
>>> large pig script is actually developed separately in several  
>>> smaller pig
>>> files. Individually the pig files do not run because they depend on
>>> previous
>>> scripts, but logically they are separate because each step does  
>>> something
>>> different.
>>>
>>> Currently the only thing existing along these lines is the exec  
>>> command
>> in grunt.  I don't think we're opposed to a #include functionality,  
>> we just
>> haven't done it.  However, given that Pig doesn't have function  
>> calls, and
>> presumably each Pig Latin script is self contained, it isn't clear  
>> to me how
>> useful it will be.
>>
>> Alan.
>>


Re: ERROR 6017: Execution failed, while processing

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Alan, this would be quite useful, as essentially this would allow developers
to create functions by writing them into separate pig scripts and combining
them as necessary.

For example we have code that auto-generates load statements with fairly
complex schemas based on protocol buffers (see
http://www.slideshare.net/hadoopusergroup/twitter-protobufs-and-hadoop-hug-021709).
It would be very handy to be able to say something like

#include common_jars.pig
#include load_tweets.pig
#include load_users.pig

#include filter_nonenglish_tweets.pig
#include geomap_users.pig

.. etc ..
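
Each included file would stay tiny; load_tweets.pig, say, might contain
nothing but the generated load statement (the loader class, path, and
schema below are all made up):

tweets = load '/data/tweets' using SomeProtobufLoader()
         as (id: long, user_id: long, text: chararray, lang: chararray);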

-D

On Mon, Mar 15, 2010 at 2:23 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

>
> On Mar 12, 2010, at 10:36 AM, hc busy wrote:
>
>
>>
>> Is there any work towards something like the C language's '#include' in Pig? My
>> large pig script is actually developed separately in several smaller pig
>> files. Individually the pig files do not run because they depend on
>> previous
>> scripts, but logically they are separate because each step does something
>> different.
>>
>>  Currently the only thing existing along these lines is the exec command
> in grunt.  I don't think we're opposed to a #include functionality, we just
> haven't done it.  However, given that Pig doesn't have function calls, and
> presumably each Pig Latin script is self contained, it isn't clear to me how
> useful it will be.
>
> Alan.
>

Re: ERROR 6017: Execution failed, while processing

Posted by Alan Gates <ga...@yahoo-inc.com>.
On Mar 12, 2010, at 10:36 AM, hc busy wrote:

>
>
> Is there any work towards something like the C language's '#include' in
> Pig? My
> large pig script is actually developed separately in several smaller  
> pig
> files. Individually the pig files do not run because they depend on  
> previous
> scripts, but logically they are separate because each step does  
> something
> different.
>
Currently the only thing existing along these lines is the exec  
command in grunt.  I don't think we're opposed to a #include  
functionality, we just haven't done it.  However, given that Pig  
doesn't have function calls, and presumably each Pig Latin script is  
self contained, it isn't clear to me how useful it will be.
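
For reference, exec runs another script from inside grunt, in its
own context, e.g. (a sketch; script names invented, and -param
support depends on your version):

grunt> exec stage1.pig
grunt> exec -param INPUT=users stage2.pig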

Alan.

Re: ERROR 6017: Execution failed, while processing

Posted by hc busy <hc...@gmail.com>.
Oh, I see what my problem is... my PigPen
<http://wiki.apache.org/pig/PigPen> isn't configured correctly, so
it's not highlighting any errors...

Is there any work towards something like the C language's '#include' in Pig? My
large pig script is actually developed separately in several smaller pig
files. Individually the pig files do not run because they depend on previous
scripts, but logically they are separate because each step does something
different.

If '#include' is allowed, then I can actually edit the original source and
debug in PigPen, as opposed to manually concatenating the files and editing
outside of the repository.




On Wed, Mar 10, 2010 at 9:34 AM, hc busy <hc...@gmail.com> wrote:

>
> Okay, just a quick update: I eventually found the actual Java error in the
> Hadoop logs, but it was equally confusing. It complains of accessing the 4th
> element of a tuple that has only one item. But still, it doesn't say which
> line of Pig Latin introduced that error.
>
> I commented out portions of my large pig script until I found the offending
> line... I wish there was an easier way to debug this...
>
>
> On Mon, Mar 8, 2010 at 5:25 PM, hc busy <hc...@gmail.com> wrote:
>
>>
>> Guys, I just ran into a weird exception 500 lines into writing a pig
>> script... Attached below is the error. Does anybody have any idea how
>> to debug this? I don't even know which step of my 500-line pig script caused
>> this error.
>>
>> Any suggestions on how to track down the offending operation?
>>
>> Thanks in advance!
>>
>> Pig Stack Trace
>> ---------------
>> ERROR 6017: Execution failed, while processing
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp939224290,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-1028111033,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-198156265,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-72050900,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-141993299,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp2135611534,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-2093411384,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp250626628,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp2100381358,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp167762091
>>
>> org.apache.pig.backend.executionengine.ExecException: ERROR 6017:
>> Execution failed, while processing
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp939224290,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-1028111033,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-198156265,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-72050900,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-141993299,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp2135611534,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-2093411384,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp250626628,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp2100381358,
>> hdfs://tasktracker:44445/tmp/temp1581022765/tmp167762091
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:181)
>>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
>>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:777)
>>         at org.apache.pig.PigServer.execute(PigServer.java:770)
>>         at org.apache.pig.PigServer.access$100(PigServer.java:89)
>>         at org.apache.pig.PigServer$Graph.execute(PigServer.java:947)
>>         at org.apache.pig.PigServer.executeBatch(PigServer.java:249)
>>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:115)
>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
>>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>>         at org.apache.pig.Main.main(Main.java:320)
>> ================================================================================
>>
>
>

Re: ERROR 6017: Execution failed, while processing

Posted by hc busy <hc...@gmail.com>.
Okay, just a quick update: I eventually found the actual Java error in the
Hadoop logs, but it was equally confusing. It complains of accessing the 4th
element of a tuple that has only one item. But still, it doesn't say which
line of Pig Latin introduced that error.
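
For the record, the kind of statement that can trigger this (a sketch, not
my actual code): a positional reference past the end of a schema-less
relation parses fine and only blows up when a map task hits a short tuple:

a = load 'data';            -- no schema, so the front end can't check field counts
b = foreach a generate $3;  -- dies at runtime on any tuple with fewer than 4 fields
dump b;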

I commented out portions of my large pig script until I found the offending
line... I wish there was an easier way to debug this...


On Mon, Mar 8, 2010 at 5:25 PM, hc busy <hc...@gmail.com> wrote:

>
> Guys, I just ran into a weird exception 500 lines into writing a pig
> script... Attached below is the error. Does anybody have any idea how
> to debug this? I don't even know which step of my 500-line pig script caused
> this error.
>
> Any suggestions on how to track down the offending operation?
>
> Thanks in advance!
>
> Pig Stack Trace
> ---------------
> ERROR 6017: Execution failed, while processing
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp939224290,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-1028111033,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-198156265,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-72050900,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-141993299,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp2135611534,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-2093411384,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp250626628,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp2100381358,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp167762091
>
> org.apache.pig.backend.executionengine.ExecException: ERROR 6017:
> Execution failed, while processing
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp939224290,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-1028111033,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-198156265,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-72050900,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-141993299,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp2135611534,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp-2093411384,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp250626628,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp2100381358,
> hdfs://tasktracker:44445/tmp/temp1581022765/tmp167762091
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:181)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:777)
>         at org.apache.pig.PigServer.execute(PigServer.java:770)
>         at org.apache.pig.PigServer.access$100(PigServer.java:89)
>         at org.apache.pig.PigServer$Graph.execute(PigServer.java:947)
>         at org.apache.pig.PigServer.executeBatch(PigServer.java:249)
>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:115)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>         at org.apache.pig.Main.main(Main.java:320)
> ================================================================================
>
