Posted to user@pig.apache.org by felix gao <gr...@gmail.com> on 2013/05/23 06:32:41 UTC

Pig 0.12 snapshot: AvroStorage throws weird exceptions

My job fails at the last stage with the following exception:

java.io.IOException: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum {(8787,1,32796,1368788157),(11860,1,2090,1368788157),(13962,1,207766,1368788157),(11860,1,5752,1368788157),(8787,1,38848,1368788157),(11860,1,22599,1368788157),(11860,1,62,1368788157),(11860,1,25383,1368788157),(8787,1,32790,1368788157),(13962,1745,13962,1368788157),(8787,1,1,1368788157),(11860,1,34062,1368788157),(11860,1,5814,1368788157),(8787,1,32939,1368788157),(13962,1,195069,1368788157)} is not in union ["null","int"]
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:470)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:433)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum {(8787,1,32796,1368788157),(11860,1,2090,1368788157),(13962,1,207766,1368788157),(11860,1,5752,1368788157),(8787,1,38848,1368788157),(11860,1,22599,1368788157),(11860,1,62,1368788157),(11860,1,25383,1368788157),(8787,1,32790,1368788157),(13962,1745,13962,1368788157),(8787,1,1,1368788157),(11860,1,34062,1368788157),(11860,1,5814,1368788157),(8787,1,32939,1368788157),(13962,1,195069,1368788157)} is not in union ["null","int"]
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:728)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:514)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:468)
    ... 11 more
Caused by: java.lang.RuntimeException: Datum {(8787,1,32796,1368788157),(11860,1,2090,1368788157),(13962,1,207766,1368788157),(11860,1,5752,1368788157),(8787,1,38848,1368788157),(11860,1,22599,1368788157),(11860,1,62,1368788157),(11860,1,25383,1368788157),(8787,1,32790,1368788157),(13962,1745,13962,1368788157),(8787,1,1,1368788157),(11860,1,34062,1368788157),(11860,1,5814,1368788157),(8787,1,32939,1368788157),(13962,1,195069,1368788157)} is not in union ["null","int"]
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.resolveUnionSchema(PigAvroDatumWriter.java:128)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeUnion(PigAvroDatumWriter.java:111)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:82)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeRecord(PigAvroDatumWriter.java:365)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
    ... 18 more
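As I read the innermost message, the datum being written is a Pig bag of 4-tuples, and the schema it is being checked against is the union ["null","int"]; neither branch of that union can hold a bag. A simplified Python model of that branch check (the names matches and resolve_union are hypothetical; this is not Avro's or piggybank's actual code, and it only models the null, int and string types):

```python
# Simplified, hypothetical model of how a writer picks a branch of an Avro
# union. Not the real org.apache.avro or piggybank implementation.


def matches(schema, datum):
    """Return True if datum fits the given primitive schema name."""
    if schema == "null":
        return datum is None
    if schema == "int":
        # Avro "int" is a signed 32-bit integer; bool is excluded on purpose.
        return (isinstance(datum, int) and not isinstance(datum, bool)
                and -2**31 <= datum < 2**31)
    if schema == "string":
        return isinstance(datum, str)
    # Complex types (records, arrays, ...) are not modelled in this sketch.
    return False


def resolve_union(union, datum):
    """Return the index of the first union branch that accepts datum."""
    for index, branch in enumerate(union):
        if matches(branch, datum):
            return index
    raise RuntimeError(f"Datum {datum!r} is not in union {union}")


if __name__ == "__main__":
    # A timestamp such as 1368788157 still fits in a 32-bit int.
    print(resolve_union(["null", "int"], 1368788157))
    # A bag (modelled here as a list of tuples) matches neither branch.
    try:
        resolve_union(["null", "int"], [(8787, 1, 32796, 1368788157)])
    except RuntimeError as err:
        print(err)
```

Here resolve_union(["null", "int"], 1368788157) returns 1, while passing a list of tuples raises the same kind of "is not in union" error seen in the trace.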



Here is the relevant part of my script:

REGISTER /var/pig-0.12.0-SNAPSHOT/contrib/piggybank/java/piggybank.jar;
REGISTER /var/pig/libs/avro-1.7.4.jar;
REGISTER /var/pig/libs/avro-tools-1.7.4.jar;
REGISTER /var/pig/libs/json_simple-1.1.jar;

%default READ_AVRO_NO_SCHEMA 'org.apache.pig.piggybank.storage.avro.AvroStorage(\'no_schema_check\', \'ignore_bad_files\')';
%default UP_SCHEMA '/user/schemas/upv.schema';
%default UP_AVRO_MECHANICS 'org.apache.pig.piggybank.storage.avro.AvroStorage(\'no_schema_check\',\'schema_file\', \'$UP_SCHEMA\')';
%default DUP_PROFILE_SCHEMA '/user/schemas/upv-grouped.schema';
%default DUP_PROFILE_MECHANICS 'org.apache.pig.piggybank.storage.avro.AvroStorage(\'no_schema_check\',\'schema_file\', \'$DUP_PROFILE_SCHEMA\')';

/* some code to produce usc_unsampled, sac_unsampled, audience_unsampled ignored */

STORE usc_unsampled INTO '$OUT_DIR/usc' USING $UP_AVRO_MECHANICS;
STORE sac_unsampled INTO '$OUT_DIR/sac' USING $UP_AVRO_MECHANICS;
STORE audience_unsampled INTO '$OUT_DIR/audience' USING $UP_AVRO_MECHANICS;

unsampled_upv = UNION usc_unsampled, sac_unsampled, audience_unsampled;
qq_main = FOREACH unsampled_upv GENERATE uuid, sid, ns, cat, ts;
qq_grp = GROUP qq_main BY uuid;
qq_sel = FOREACH qq_grp GENERATE group AS uuid, qq_main.(sid,ns,cat,ts) AS tup;

STORE qq_sel INTO '$OUT_DIR_COMPACT/unsampled-tuples' USING $DUP_PROFILE_MECHANICS;


The schema file /user/schemas/upv-grouped.schema looks like this:
{ "type": "record", "name": "group", "fields": [
    { "name": "uuid", "type": "string" },
    { "name": "bag", "type":
      { "type": "array", "items":
        { "name": "tuple", "type": "record", "fields": [
          { "name": "sid", "type": "int" },
          { "name": "ns", "type": "int" },
          { "name": "cat", "type": "int" },
          { "name": "ts", "type": "int" } ]
        }
      }
    } ]
}
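To line this schema up against what qq_sel produces, a small sketch using only Python's json module prints each top-level schema field next to the Pig alias in the same position. The names field_layout and pig_aliases are hypothetical, the aliases are copied by hand from the GENERATE statement in the script, and matching fields by position is an assumption of the sketch, not a statement about how AvroStorage maps fields:

```python
import json

# Schema text copied from /user/schemas/upv-grouped.schema.
SCHEMA_TEXT = """
{ "type": "record", "name": "group", "fields": [
    { "name": "uuid", "type": "string" },
    { "name": "bag", "type":
      { "type": "array", "items":
        { "name": "tuple", "type": "record", "fields": [
          { "name": "sid", "type": "int" },
          { "name": "ns", "type": "int" },
          { "name": "cat", "type": "int" },
          { "name": "ts", "type": "int" } ]
        }
      }
    } ]
}
"""


def field_layout(schema):
    """Map each top-level field name to a short description of its type."""
    layout = {}
    for field in schema["fields"]:
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "array":
            items = ftype["items"]
            inner = [f["name"] for f in items["fields"]]
            layout[field["name"]] = f"array<{items['name']}({', '.join(inner)})>"
        else:
            layout[field["name"]] = ftype
    return layout


if __name__ == "__main__":
    schema = json.loads(SCHEMA_TEXT)
    # Hypothetical: aliases of qq_sel, copied by hand (group AS uuid, ... AS tup).
    pig_aliases = ["uuid", "tup"]
    # Pair fields by position (an assumption of this sketch).
    for alias, (name, desc) in zip(pig_aliases, field_layout(schema).items()):
        print(f"{alias:>5} -> {name}: {desc}")
```

For this schema the second line pairs the alias tup with bag: array<tuple(sid, ns, cat, ts)>.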

I am not sure why AvroStorage throws this exception.

Thanks in advance.

Felix