
Posted to dev@pig.apache.org by paradisehit <pa...@163.com> on 2008/09/08 12:27:23 UTC

How can get my A ?

 
A = (a, b, c)
I just want to add a column containing 0 to A, so that A looks like this:

A = (a, b, c, 0)

How can I do that?

I used CROSS, but when I set PARALLEL to a large value (>= 300), I get this error:

ERROR org.apache.pig.tools.grunt.GruntParser  - java.io.IOException: Unable to store alias null
	at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
	at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
	at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.backend.executionengine.ExecException: java.io.IOException: Job failed
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:296)
	at org.apache.pig.PigServer.optimizeAndRunQuery(PigServer.java:413)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:293)
	... 5 more
Caused by: java.io.IOException: Job failed
	at org.apache.pig.backend.hadoop.executionengine.POMapreduce.open(POMapreduce.java:188)
	at org.apache.pig.backend.hadoop.executionengine.POMapreduce.open(POMapreduce.java:178)
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:282)
	... 7 more

99316 [main] ERROR org.apache.pig.tools.grunt.GruntParser  - java.io.IOException: Unable to store alias null


I am running the Hadoop 0.18 release, with the patch from https://issues.apache.org/jira/browse/PIG-253 applied.

When I set PARALLEL to 100, the job runs OK. Why? I suspected it was opening too many file descriptors, but I have already raised that limit to 10240.

So 1: How can I get my A?
   2: Why does the error occur?

Re:Re: How can get my A ?

Posted by paradisehit <pa...@163.com>.

 
 Indeed, C cannot be seen from D, so how can I use it in GetScalar(C)?

 
 


On 2008-09-20, "Alan Gates" <ga...@yahoo-inc.com> wrote:
>
>
>paradisehit wrote:
>>  Thanks, I hadn't seen the wiki FAQ. But the example on the wiki page PigStreamingFunctionalSpec has an error; could you fix it?
>>
>>
>>   
>>>> B = group A all;
>>>> C = foreach B generate COUNT(A);
>>>> store C into 'count';
>>>> D = load 'data2';
>>>> E = foreach D generate $1/GetScalar(C);
>>>>       
>>
>>
>> Also, I want to use the value of C as a number when processing D. How do I do that? Write a UDF?
>> Are aliases such as A, B, C... always tuples or bags?
>>   
>Yes, you will need to use a UDF for that. Aliases on the left side are 
>always bags of tuples.
>
>Alan.
>>  
>>  
>>
>>
>> On 2008-09-12, "Olga Natkovich"

Re: How can get my A ?

Posted by Alan Gates <ga...@yahoo-inc.com>.


paradisehit wrote:
>  Thanks, I hadn't seen the wiki FAQ. But the example on the wiki page PigStreamingFunctionalSpec has an error; could you fix it?
>
>
>   
>>> B = group A all;
>>> C = foreach B generate COUNT(A);
>>> store C into 'count';
>>> D = load 'data2';
>>> E = foreach D generate $1/GetScalar(C);
>>>       
>
>
> Also, I want to use the value of C as a number when processing D. How do I do that? Write a UDF?
> Are aliases such as A, B, C... always tuples or bags?
>   
Yes, you will need to use a UDF for that. Aliases on the left side are 
always bags of tuples.

Alan.
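
A possible alternative to a UDF, sketched under the assumption that a later Pig release is available: the scalar-projection syntax below was added in Pig 0.8, well after this thread, and will not parse on the version discussed here. The file names come from the thread; the field names name, value, and total are made up for illustration.

```
A = LOAD 'in' AS (query, numbers);
B = GROUP A ALL;
-- C holds exactly one tuple with one field, named total here (hypothetical name)
C = FOREACH B GENERATE COUNT(A) AS total;

D = LOAD 'data2' AS (name:chararray, value:double);
-- C.total reads the single value of C as a scalar.
-- Pig reports an error at run time if C contains more than one tuple.
E = FOREACH D GENERATE name, value / (double) C.total;
```

Here value is the second field of D, so it plays the role of $1 in the original script. On the Pig version in this thread, a UDF remains the way to do this, as stated above.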

Re:RE: Re: How can get my A ?

Posted by paradisehit <pa...@163.com>.

 Thanks, I hadn't seen the wiki FAQ. But the example on the wiki page PigStreamingFunctionalSpec has an error; could you fix it?


>> B = group A all;
>> C = foreach B generate COUNT(A);
>> store C into 'count';
>> D = load 'data2';
>> E = foreach D generate $1/GetScalar(C);


Also, I want to use the value of C as a number when processing D. How do I do that? Write a UDF?
Are aliases such as A, B, C... always tuples or bags?

 
 


On 2008-09-12, "Olga Natkovich" <ol...@yahoo-inc.com> wrote:
>You need to say COUNT(A), not COUNT(B).
>
>B is the result of a GROUP and is a set of tuples with 2 fields (group,
>A), where group is your key and A is a bag of the tuples that match the key.
>
>Olga 
>

RE: Re: How can get my A ?

Posted by Olga Natkovich <ol...@yahoo-inc.com>.

You need to say COUNT(A), not COUNT(B).

B is the result of a GROUP and is a set of tuples with 2 fields (group,
A), where group is your key and A is a bag of the tuples that match the key.

Olga 
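
Applied to the script quoted in this thread, where the grouped relation is CLEAR_LOG rather than A, the fix might look like the sketch below. It reuses the load statement and output path from the question. The bag name CLEAR_LOG is taken from the schema printed in the error message, (group, CLEAR_LOG), and is not a new identifier.

```
CLEAR_LOG = LOAD 'in' AS (query, numbers);

-- GROUP ... ALL yields a single tuple with two fields:
-- group (the key) and a bag named after the grouped relation, CLEAR_LOG.
B = GROUP CLEAR_LOG ALL;

-- Optional: print the schema of B to confirm the field names.
DESCRIBE B;

-- COUNT takes the bag, not the alias of the grouped result.
C = FOREACH B GENERATE COUNT(CLEAR_LOG);

STORE C INTO 'count';
```

Referring to the bag by position, COUNT($1), is equivalent here, which is why the $1 workaround mentioned in the question also works.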


Re:Re: How can get my A ?

Posted by paradisehit <pa...@163.com>.

 And what if the value is not 0, but is generated by another alias? For example:

CLEAR_LOG = LOAD 'in' AS (query, numbers);

I want to get the total number of queries, so I used the Pig Latin that the wiki describes:

B = group CLEAR_LOG all;
C = foreach B generate COUNT(B);
store C into 'count';
D = load 'data2';
E = foreach D generate $1/GetScalar(C);

But it reports:
Invalid alias: B in B: (group: ( ), CLEAR_LOG: (queryString, url, type ) )
So I used $1 instead.

I also found that GetScalar is not in Pig. How can I use it?

Re: How can get my A ?

Posted by Alan Gates <ga...@yahoo-inc.com>.

You have a relation A and you want to tack a zero on the end of each 
tuple? The following line will do that:

B = foreach A generate a, b, c, 0;

The GC overhead limit means you ran out of memory on one of your 
reducers. If I remember correctly CROSS always uses one reducer because 
it has to cross every tuple with every other tuple. So your parallel 
clause is ignored there. With the foreach above you will not have that 
issue.

Alan.
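
A slightly fuller sketch of the foreach approach described above, for context. The input and output paths are placeholders, and the field names a, b, c are taken from the question; none of them come from a real script.

```
-- Hypothetical input path; adjust to the real data.
A = LOAD 'input' AS (a, b, c);

-- Append the constant 0 as a fourth field of every tuple.
-- This is a per-tuple projection, so no CROSS is needed.
B = FOREACH A GENERATE a, b, c, 0;

-- Hypothetical output path.
STORE B INTO 'output';
```

Because the constant is added tuple by tuple, nothing has to be collected into a bag for this step, which is the part of the CROSS plan that appears in the stack trace quoted below.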

paradisehit wrote:
>  
>  I also got this error when PARALLEL is set to 100:
>
>
> java.io.IOException: GC overhead limit exceeded
> 	at java.nio.ByteBuffer.wrap(ByteBuffer.java:350)
> 	at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:137)
> 	at java.lang.StringCoding.decode(StringCoding.java:173)
> 	at java.lang.String.<init>(String.java:444)
> 	at java.lang.String.<init>(String.java:516)
> 	at org.apache.pig.data.DataAtom.read(DataAtom.java:182)
> 	at org.apache.pig.data.Tuple.readDatum(Tuple.java:362)
> 	at org.apache.pig.data.Tuple.read(Tuple.java:344)
> 	at org.apache.pig.data.Tuple.readFields(Tuple.java:331)
> 	at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:220)
> 	at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.next(DefaultDataBag.java:207)
> 	at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.hasNext(DefaultDataBag.java:134)
> 	at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:93)
> 	at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:38)
> 	at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.<init>(GenerateSpec.java:159)
> 	at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:79)
> 	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.reduce(PigMapReduce.java:165)
> 	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.reduce(PigMapReduce.java:80)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
>
>
>
>  I know it is the GC limit: I have 286 GB of map output but only 100 reducers (set by PARALLEL), so the error occurs.

Re:How can get my A ?

Posted by paradisehit <pa...@163.com>.

 
 I also got this error when PARALLEL is set to 100:


java.io.IOException: GC overhead limit exceeded
	at java.nio.ByteBuffer.wrap(ByteBuffer.java:350)
	at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:137)
	at java.lang.StringCoding.decode(StringCoding.java:173)
	at java.lang.String.<init>(String.java:444)
	at java.lang.String.<init>(String.java:516)
	at org.apache.pig.data.DataAtom.read(DataAtom.java:182)
	at org.apache.pig.data.Tuple.readDatum(Tuple.java:362)
	at org.apache.pig.data.Tuple.read(Tuple.java:344)
	at org.apache.pig.data.Tuple.readFields(Tuple.java:331)
	at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:220)
	at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.next(DefaultDataBag.java:207)
	at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.hasNext(DefaultDataBag.java:134)
	at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:93)
	at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:38)
	at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.<init>(GenerateSpec.java:159)
	at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:79)
	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.reduce(PigMapReduce.java:165)
	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.reduce(PigMapReduce.java:80)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)



 I know it is the GC limit: I have 286 GB of map output but only 100 reducers (set by PARALLEL), so the error occurs.

 



Re:How can get my A ?

Posted by paradisehit <pa...@163.com>.

 
 Can nobody help me?

 
 

