Posted to user@pig.apache.org by ๏̯͡๏ (ÐΞ€ρ@Ҝ) <de...@gmail.com> on 2013/09/17 18:50:36 UTC

Pig Error with CombinedFileInputFormat.

I have a Pig 0.11 and Hadoop 1.1.2 environment.
I have written a custom Pig loader that uses CombineFileInputFormat (my own
implementation), because I have a large number of small files (about 50 MB each).
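
The loader follows the usual combine-small-files pattern: getInputFormat()
returns an input format that packs many files into each split and hands every
file inside a combined split to a per-file line reader. Roughly like the
sketch below, assuming Hadoop's new-API CombineFileInputFormat base class
(org.apache.hadoop.mapreduce.lib.input) is available; CombinedTextInputFormat
and FileLineReader are illustrative names, not my actual classes:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

/**
 * Packs many small files into each split. The Pig loader's getInputFormat()
 * simply returns new CombinedTextInputFormat().
 */
public class CombinedTextInputFormat
        extends CombineFileInputFormat<LongWritable, Text> {

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException {
        // CombineFileRecordReader walks the files inside the CombineFileSplit
        // and instantiates one FileLineReader per file (via reflection, using
        // the (CombineFileSplit, TaskAttemptContext, Integer) constructor).
        return new CombineFileRecordReader<LongWritable, Text>(
                (CombineFileSplit) split, context, FileLineReader.class);
    }

    /** Reads one file of a combined split by delegating to LineRecordReader. */
    public static class FileLineReader extends RecordReader<LongWritable, Text> {

        private final LineRecordReader delegate = new LineRecordReader();
        private final FileSplit fileSplit;

        public FileLineReader(CombineFileSplit split, TaskAttemptContext context,
                              Integer index) throws IOException, InterruptedException {
            // Carve the single file at 'index' out of the combined split.
            fileSplit = new FileSplit(split.getPath(index), split.getOffset(index),
                                      split.getLength(index), split.getLocations());
        }

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context)
                throws IOException, InterruptedException {
            // Ignore the combined split passed in; read only this reader's file.
            delegate.initialize(fileSplit, context);
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            return delegate.nextKeyValue();
        }

        @Override
        public LongWritable getCurrentKey() {
            return delegate.getCurrentKey();
        }

        @Override
        public Text getCurrentValue() {
            return delegate.getCurrentValue();
        }

        @Override
        public float getProgress() throws IOException {
            return delegate.getProgress();
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
    }
}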

I wrote a Pig script to load and dump the data. It works fine in local mode
and in pseudo-distributed mode (single-node cluster), but on the cluster it
fails with the following error in the map task:


java.lang.NullPointerException
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.toString(PigSplit.java:383)
	at java.lang.String.valueOf(String.java:2826)
	at java.lang.StringBuilder.append(StringBuilder.java:115)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:728)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:363)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)


Any idea what the problem could be?

Please clarify.

Without CombineFileInputFormat, 300+ map tasks were started for the 1215
files; now only 1 map task is started. For how many combined splits is a map
task started, or is it always 1 regardless of the number of files?
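
My guess is that one map task is started per combined split, and that with no
maximum split size configured all 1215 files get packed into a single split,
hence the single map task. Please correct me if that is wrong. Capping the
split size should bring the number of map tasks back up; for example, building
on the illustrative CombinedTextInputFormat above:

/**
 * Illustrative variant of CombinedTextInputFormat that caps how much data
 * goes into one combined split. Since one map task runs per combined split,
 * a ~512 MB cap over 1215 files of ~50 MB (~60 GB in total) should give
 * roughly 120 map tasks instead of 1.
 */
public class CappedCombinedTextInputFormat extends CombinedTextInputFormat {

    public CappedCombinedTextInputFormat() {
        // Protected setter inherited from CombineFileInputFormat. If neither
        // this nor (I believe) the Hadoop 1.x "mapred.max.split.size" property
        // is set, the format keeps packing blocks together and can end up
        // putting every input file into a single split.
        setMaxSplitSize(512L * 1024 * 1024); // ~512 MB per combined split
    }
}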

Regards,
Deepak

Re: Pig Error with CombinedFileInputFormat.

Posted by gurmeet <gu...@gmail.com>.
Deepak,

I am also seeing the same issue. Please share the solution if you have
found one.

Thanks,
Gurmeet