Posted to user@pig.apache.org by felix gao <gr...@gmail.com> on 2013/05/23 06:32:41 UTC
pig 12 snapshot avrostorage throws weird exceptions
My job fails at the last stage with the following exception:
java.io.IOException: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum {(8787,1,32796,1368788157),(11860,1,2090,1368788157),(13962,1,207766,1368788157),(11860,1,5752,1368788157),(8787,1,38848,1368788157),(11860,1,22599,1368788157),(11860,1,62,1368788157),(11860,1,25383,1368788157),(8787,1,32790,1368788157),(13962,1745,13962,1368788157),(8787,1,1,1368788157),(11860,1,34062,1368788157),(11860,1,5814,1368788157),(8787,1,32939,1368788157),(13962,1,195069,1368788157)} is not in union ["null","int"]
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:470)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:433)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum {(8787,1,32796,1368788157),(11860,1,2090,1368788157),(13962,1,207766,1368788157),(11860,1,5752,1368788157),(8787,1,38848,1368788157),(11860,1,22599,1368788157),(11860,1,62,1368788157),(11860,1,25383,1368788157),(8787,1,32790,1368788157),(13962,1745,13962,1368788157),(8787,1,1,1368788157),(11860,1,34062,1368788157),(11860,1,5814,1368788157),(8787,1,32939,1368788157),(13962,1,195069,1368788157)} is not in union ["null","int"]
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:728)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:514)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:468)
    ... 11 more
Caused by: java.lang.RuntimeException: Datum {(8787,1,32796,1368788157),(11860,1,2090,1368788157),(13962,1,207766,1368788157),(11860,1,5752,1368788157),(8787,1,38848,1368788157),(11860,1,22599,1368788157),(11860,1,62,1368788157),(11860,1,25383,1368788157),(8787,1,32790,1368788157),(13962,1745,13962,1368788157),(8787,1,1,1368788157),(11860,1,34062,1368788157),(11860,1,5814,1368788157),(8787,1,32939,1368788157),(13962,1,195069,1368788157)} is not in union ["null","int"]
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.resolveUnionSchema(PigAvroDatumWriter.java:128)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeUnion(PigAvroDatumWriter.java:111)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:82)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeRecord(PigAvroDatumWriter.java:365)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
    ... 18 more
Here is the relevant part of my script:
REGISTER /var/pig-0.12.0-SNAPSHOT/contrib/piggybank/java/piggybank.jar;
REGISTER /var/pig/libs/avro-1.7.4.jar;
REGISTER /var/pig/libs/avro-tools-1.7.4.jar;
REGISTER /var/pig/libs/json_simple-1.1.jar;

%default READ_AVRO_NO_SCHEMA 'org.apache.pig.piggybank.storage.avro.AvroStorage(\'no_schema_check\', \'ignore_bad_files\')';
%default UP_SCHEMA '/user/schemas/upv.schema';
%default UP_AVRO_MECHANICS 'org.apache.pig.piggybank.storage.avro.AvroStorage(\'no_schema_check\', \'schema_file\', \'$UP_SCHEMA\')';
%default DUP_PROFILE_SCHEMA '/user/schemas/upv-grouped.schema';
%default DUP_PROFILE_MECHANICS 'org.apache.pig.piggybank.storage.avro.AvroStorage(\'no_schema_check\', \'schema_file\', \'$DUP_PROFILE_SCHEMA\')';

/* some code to produce usc_unsampled, sac_unsampled, audience_unsampled ignored */

STORE usc_unsampled INTO '$OUT_DIR/usc' USING $UP_AVRO_MECHANICS;
STORE sac_unsampled INTO '$OUT_DIR/sac' USING $UP_AVRO_MECHANICS;
STORE audience_unsampled INTO '$OUT_DIR/audience' USING $UP_AVRO_MECHANICS;

unsampled_upv = UNION usc_unsampled, sac_unsampled, audience_unsampled;
qq_main = FOREACH unsampled_upv GENERATE uuid, sid, ns, cat, ts;
qq_grp = GROUP qq_main BY uuid;
qq_sel = FOREACH qq_grp GENERATE group AS uuid, qq_main.(sid,ns,cat,ts) AS tup;

STORE qq_sel INTO '$OUT_DIR_COMPACT/unsampled-tuples' USING $DUP_PROFILE_MECHANICS;
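To make the data shape concrete, the GROUP/FOREACH above is roughly equivalent to this Python sketch (the sample rows are made up for illustration; the real job reads Avro). It shows that each qq_sel row carries a bag of (sid, ns, cat, ts) tuples as its second field, which is exactly the shape of the Datum printed in the stack trace:

```python
from collections import defaultdict

# Made-up sample rows with the same shape as qq_main: (uuid, sid, ns, cat, ts)
rows = [
    ("u1", 8787,  1, 32796,  1368788157),
    ("u1", 11860, 1, 2090,   1368788157),
    ("u2", 13962, 1, 207766, 1368788157),
]

# GROUP qq_main BY uuid: collect the projected (sid, ns, cat, ts)
# tuples into one bag per uuid
grouped = defaultdict(list)
for uuid, sid, ns, cat, ts in rows:
    grouped[uuid].append((sid, ns, cat, ts))

# Each output row: (uuid, bag-of-4-tuples), mirroring qq_sel
qq_sel = [(uuid, tup) for uuid, tup in sorted(grouped.items())]
for row in qq_sel:
    print(row)
```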
The schema file /user/schemas/upv-grouped.schema looks like this:
{
  "type": "record",
  "name": "group",
  "fields": [
    { "name": "uuid", "type": "string" },
    { "name": "bag",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "tuple",
          "fields": [
            { "name": "sid", "type": "int" },
            { "name": "ns",  "type": "int" },
            { "name": "cat", "type": "int" },
            { "name": "ts",  "type": "int" }
          ]
        }
      }
    }
  ]
}
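As a sanity check (a standard-library-only sketch, not part of my job), I parsed the schema file to confirm the top-level field layout Avro will try to match against each output tuple:

```python
import json

# Paste of the upv-grouped.schema contents from above
SCHEMA_JSON = """
{ "type": "record", "name": "group", "fields": [
  { "name": "uuid", "type": "string" },
  { "name": "bag", "type": { "type": "array", "items": {
      "type": "record", "name": "tuple", "fields": [
        { "name": "sid", "type": "int" },
        { "name": "ns",  "type": "int" },
        { "name": "cat", "type": "int" },
        { "name": "ts",  "type": "int" } ] } } } ] }
"""

schema = json.loads(SCHEMA_JSON)
for field in schema["fields"]:
    t = field["type"]
    # Nested schemas are JSON objects; primitive types are plain strings
    kind = t if isinstance(t, str) else t["type"]
    print(f'{field["name"]}: {kind}')
```

So the second field is "bag" (an array of int records), while my script aliases that column as "tup"; I don't know whether AvroStorage matches Pig fields to Avro fields by name or by position here, so I'm not sure if that matters.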
I am not sure why AvroStorage throws this exception.
Thanks in advance.
Felix