Posted to user@pig.apache.org by Cheolsoo Park <pi...@gmail.com> on 2013/12/02 06:04:57 UTC

Re: strange problem with count and distinct subscribers

Which version are you using? I am wondering whether PIG-3466 fixes your
error:
https://issues.apache.org/jira/browse/PIG-3466

You can reproduce the error only when loading more data, and you also see
an intermittent type cast error. My guess is that you ran into the race
condition that PIG-3466 fixed: your bag gets corrupted, which results in the
type cast error.
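
To make the suggested link between the two errors concrete, here is a toy Python analogy. It is not Pig's code and does not model what PIG-3466 actually changed. An operator that walks a bag and treats every element as a tuple works on clean data, but fails with a cast-style error as soon as one corrupted element of another type (here a stray Boolean) shows up, no matter how well-formed the rest of the input is. The function name `count_distinct_first_field` is hypothetical.

```python
def count_distinct_first_field(bag):
    """Toy analogue of COUNT(DISTINCT ...) over a bag of tuples.

    Every element is expected to be a tuple; anything else is rejected the
    way Java rejects a failed cast to org.apache.pig.data.Tuple.
    """
    seen = set()
    for element in bag:
        if not isinstance(element, tuple):
            # Mirrors "java.lang.Boolean cannot be cast to org.apache.pig.data.Tuple"
            raise TypeError(f"{type(element).__name__} cannot be cast to tuple")
        seen.add(element[0])
    return len(seen)


healthy_bag = [("sub1",), ("sub2",), ("sub1",)]
print(count_distinct_first_field(healthy_bag))  # 2

corrupted_bag = [("sub1",), True, ("sub2",)]  # one stray Boolean element
try:
    count_distinct_first_field(corrupted_bag)
except TypeError as err:
    print(err)  # bool cannot be cast to tuple
```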





On Mon, Nov 18, 2013 at 6:30 AM, Noam Lavie <No...@pontis.com> wrote:

> Hi,
> I'm trying to run the following Pig script. Its main purpose is to read
> inputs that contain information about phone calls; the script is supposed
> to count the calls of each type and the distinct subscribers that made them:
>
> SET default_parallel 40;
> allFiles = LOAD
>     'maprfs:///analytics/data/consumers/mapred/facts/done/FACT_VOICE_GE_Analytics9_1/20131114/'
>     USING PigStorage(',');
> allFilesFiltered = FILTER allFiles BY $11 MATCHES '.*On.*' AND $4 > 0;
> datesList = FOREACH allFilesFiltered GENERATE SUBSTRING($0, 0, 10) AS day,
>     $11 AS callType, $4 AS amount, $1 AS subscriberKey;
> datesGroups = GROUP datesList BY (day, callType);
> datesGroupsAmount = FOREACH datesGroups {
>     unique_seubscriber = DISTINCT datesList.subscriberKey;
>     GENERATE group.day, group.callType, COUNT(datesList),
>         SUM(datesList.amount), COUNT(unique_seubscriber);
> };
> dump datesGroupsAmount;
>
> The problem is with unique_seubscriber: the DISTINCT and COUNT don't work.
> The strange thing is that if I run the script separately on each subfolder's
> input, every run succeeds, but if I give it all the input folders together,
> it fails with the following error:
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias datesGroupsAmount
>
> Another error I get from time to time (when I make small changes to the
> script) is:
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias datesGroupsAmount. Backend error : java.lang.Boolean cannot be cast to org.apache.pig.data.Tuple
> Maybe there is a connection between the two errors?
>
> Here is the log file:
>
> Pig Stack Trace
> ---------------
> ERROR 1066: Unable to open iterator for alias datesGroupsAmount
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias datesGroupsAmount
>         at org.apache.pig.PigServer.openIterator(PigServer.java:836)
>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>         at org.apache.pig.Main.run(Main.java:604)
>         at org.apache.pig.Main.main(Main.java:157)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:601)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
>         at org.apache.pig.PigServer.openIterator(PigServer.java:828)
>         ... 12 more
>
>
> Any help will be appreciated.
> Thanks,
> Noam
>
>
> ________________________________
>
> This email contains proprietary and/or confidential information of Pontis.
> If you have received this email in error, please delete all copies without
> delay and do not copy, distribute, or rely on any information contained in
> this email.
>
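
For each (day, callType) group, the nested FOREACH in the script above computes the number of calls, the summed amount, and the number of distinct subscribers. The following is a rough plain-Python sketch of those semantics, not a description of how Pig executes them. Assumptions: every record has at least 12 comma-separated fields, the amount in $4 is numeric text, and Pig's implicit bytearray casts and null handling are simplified to float() and plain string slicing. The function name `aggregate_calls` is hypothetical.

```python
from collections import defaultdict


def aggregate_calls(records):
    """Approximate the script's FILTER / GROUP / nested FOREACH in plain Python.

    Each record is a list of string fields, as PigStorage(',') would split a line.
    Returns (day, callType, call count, summed amount, distinct subscribers)
    per (day, callType) group, sorted by group key for deterministic output.
    """
    groups = defaultdict(list)
    for fields in records:
        call_type = fields[11]
        # Assumption: Pig's implicit bytearray casts are simplified to float().
        amount = float(fields[4])
        # FILTER allFiles BY $11 MATCHES '.*On.*' AND $4 > 0
        # ('.*On.*' must match the whole field, i.e. the field contains "On").
        if "On" not in call_type or amount <= 0:
            continue
        day = fields[0][:10]  # SUBSTRING($0, 0, 10)
        groups[(day, call_type)].append((amount, fields[1]))  # $1 = subscriberKey

    result = []
    for (day, call_type), rows in sorted(groups.items()):
        subscribers = {key for _, key in rows}  # DISTINCT datesList.subscriberKey
        result.append((
            day,
            call_type,
            len(rows),                          # COUNT(datesList)
            sum(amount for amount, _ in rows),  # SUM(datesList.amount)
            len(subscribers),                   # COUNT(unique_seubscriber)
        ))
    return result
```

Because COUNT(unique_seubscriber) counts subscribers that are distinct within each (day, callType) group, per-subfolder results cannot simply be added up to reproduce the combined run: a subscriber who appears in several subfolders for the same group is counted only once when all inputs are loaded together.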

RE: strange problem with count and distinct subscribers

Posted by Noam Lavie <No...@pontis.com>.
Thanks, it does look like this was the problem.

-----Original Message-----
From: Cheolsoo Park [mailto:piaozhexiu@gmail.com]
Sent: Monday, December 02, 2013 7:05 AM
To: user@pig.apache.org
Subject: Re: strange problem with count and distinct subscribers
