Posted to user@pig.apache.org by abhishek dodda <ab...@gmail.com> on 2014/05/07 00:35:16 UTC

Pig Job Failure With More Number Of Input Files

Hi all,

There is a pig job which is failing.

*Pig Script*

Register
/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/pig/piggybank.jar;

/* READ LAST 30 DAYS DATA */
/* xyz table is partitioned by dt */

au = LOAD 'xyz' USING org.apache.hcatalog.pig.HCatLoader();

ad = FILTER au by ($a) and $1 == '0100';

dd = limit ad 10;

dump dd;

*Property file*

a="dt == '20140501' or dt == '20140430' or dt == '20140429' or dt ==
'20140428' or dt == '20140427' or dt == '20140426' or dt == '20140425' or
dt == '20140424' or dt == '20140423' or dt == '20140422' or dt ==
'20140421' or dt == '20140420' or dt == '20140419' or dt == '20140418' or
dt == '20140417' or dt == '20140416' or dt == '20140415' or dt ==
'20140414' or dt == '20140413' or dt == '20140412' or dt == '20140411' or
dt == '20140410' or dt == '20140409' or dt == '20140408' or dt ==
'20140407' or dt == '20140406' or dt == '20140405' or dt == '20140404' or
dt == '20140403' or dt == '20140402'"
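(Not part of the original script, but since the 30 dates form a contiguous range, the same filter could in principle be written as a range comparison instead of 30 or-ed equality checks. A sketch, assuming dt is a chararray partition key in yyyymmdd form, which sorts lexicographically the same as chronologically:)

au = LOAD 'xyz' USING org.apache.hcatalog.pig.HCatLoader();
ad = FILTER au BY dt >= '20140402' AND dt <= '20140501' AND $1 == '0100';
dd = LIMIT ad 10;
DUMP dd;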

*The job succeeds when run over 3 days of data, but fails when run over more
than 3 days. Each date has more than 30 files. We thought we were hitting
https://issues.apache.org/jira/browse/MAPREDUCE-2779, so I changed
pig.maxCombinedSplitSize to 256 MB to reduce the number of mappers/splits,
but even that did not help.*
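(For reference, the property can be set inside the script itself before the LOAD; pig.maxCombinedSplitSize takes a value in bytes, so 256 MB is 268435456. A sketch of how it was applied, not a claimed fix:)

SET pig.maxCombinedSplitSize 268435456; -- 256 MB, in bytes
au = LOAD 'xyz' USING org.apache.hcatalog.pig.HCatLoader();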

*Error*
java.io.IOException: Split class
hcfpgchfhdfpgjgefphcgfghgofpgdgefpgehchggeheaabkgdgngmhdfpgjhdhdhcfpgcgjgofphcgfghgofpgdgefpgehchggeheaabngdgng
not found

	at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:348)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:641)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: Class
hcfpgchfhdfpgjgefphcgfghgofpgdgefpgehchggeheaabkgdgngmhdfpgjhdhdhcfpgcgjgofphcgfghgofpgdgefpgehchggeheaabngdgng
not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
	at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:346)
	... 7 more


Any input on resolving this issue would be appreciated. Thanks for your help.

-- 
Thanks,
Abhishek
2018509769

Re: Pig Job Failure With More Number Of Input Files

Posted by Abhishek Agarwal <ab...@gmail.com>.
Can you attach your job configuration here? That could help in narrowing
down the problem.


On Wed, May 7, 2014 at 4:05 AM, abhishek dodda <ab...@gmail.com> wrote:

> Hi all,
>
> There is a pig job which is failing.
>
> *Pig Script*
>
> Register
> /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/pig/piggybank.jar;
>
> /* READ LAST 30 DAYS DATA */
> /* xyz table is partitioned with dt*/
>
> au = LOAD 'xyz' USING org.apache.hcatalog.pig.HCatLoader();
>
> ad = FILTER au by ($a) and $1 == '0100';
>
> dd = limit ad 10;
>
> dump dd;
>
> *Property file*
>
> a="dt == '20140501' or dt == '20140430' or dt == '20140429' or dt ==
> '20140428' or dt == '20140427' or dt == '20140426' or dt == '20140425' or
> dt == '20140424' or dt == '20140423' or dt == '20140422' or dt ==
> '20140421' or dt == '20140420' or dt == '20140419' or dt == '20140418' or
> dt == '20140417' or dt == '20140416' or dt == '20140415' or dt ==
> '20140414' or dt == '20140413' or dt == '20140412' or dt == '20140411' or
> dt == '20140410' or dt == '20140409' or dt == '20140408' or dt ==
> '20140407' or dt == '20140406' or dt == '20140405' or dt == '20140404' or
> dt == '20140403' or dt == '20140402'"
>
> *Job is successful when running for 3 days, but failing when using for more
> than 3 days. For each date there are more than 30 files. We thought that it
> is hitting https://issues.apache.org/jira/browse/MAPREDUCE-2779
> <https://issues.apache.org/jira/browse/MAPREDUCE-2779>, for which i have
> changed **pig.maxCombinedSplitSize to 256mb to reduce the number of
> mappers/splits even that did not help*
>
> *Error*
> *java.io.IOException: Split class
>
> hcfpgchfhdfpgjgefphcgfghgofpgdgefpgehchggeheaabkgdgngmhdfpgjhdhdhcfpgcgjgofphcgfghgofpgdgefpgehchggeheaabngdgng
> not found*
>
>         at
> org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:348)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:641)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.lang.ClassNotFoundException: *Class
>
> hcfpgchfhdfpgjgefphcgfghgofpgdgefpgehchggeheaabkgdgngmhdfpgjhdhdhcfpgcgjgofphcgfghgofpgdgefpgehchggeheaabngdgng
> not found*
>         at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
>         at
> org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:346)
>         ... 7 more
>
>
> Any inputs to resolve this issue are appreciated. Thanks for your help
>
> --
> Thanks,
> Abhishek
> 2018509769
>



-- 
Regards,
Abhishek Agarwal