Posted to dev@pig.apache.org by "Jagdish Kewat (JIRA)" <ji...@apache.org> on 2016/02/18 15:05:18 UTC

[jira] [Commented] (PIG-4813) AvroStorage doesn't work for schema from external file for EMR

    [ https://issues.apache.org/jira/browse/PIG-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152358#comment-15152358 ] 

Jagdish Kewat commented on PIG-4813:
------------------------------------

My store command in the script is shown below, followed by the error I am getting.
{code}
store records into 's3://my-bucket/my-output'
    using org.apache.pig.piggybank.storage.avro.AvroStorage('schema_file', 's3n://my-bucket/my-schema/records.avsc');
{code}

{code}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: Output schema is null!
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:473)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:453)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1542)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:453)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:371)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1500)
	at java.security.AccessController.doPrivileged(Native Method)
{code}

> AvroStorage doesn't work for schema from external file for EMR
> --------------------------------------------------------------
>
>                 Key: PIG-4813
>                 URL: https://issues.apache.org/jira/browse/PIG-4813
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jagdish Kewat
>
> Hi Team,
> I couldn't get AvroStorage to load the schema from an external file, as described in http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-etl-avro.html.
> It works fine if I provide the raw schema string with the 'schema' option, as described in https://cwiki.apache.org/confluence/display/PIG/AvroStorage.
> On HDFS I don't even need to specify the schema in the store command.
> Some quick details about the versions:
> * Hadoop:
> {code}
> Hadoop 2.6.0-amzn-2
> Subversion git@aws157git.com:/pkg/Aws157BigTop -r 41f4e6be3ac5d6676a3464f77de79a33e8fdd9f3
> Compiled by ec2-user on 2015-11-16T20:56Z
> Compiled with protoc 2.5.0
> {code}
> * Pig:
> {code}
> Apache Pig version 0.14.0-amzn-0 (r: unknown)
> {code}
> * piggybank jar version:
> ** piggybank-0.14.0.jar
> * avro jar version:
> ** avro-1.7.7.jar
> * avro-ipc jar version:
> ** avro-ipc-1.7.7.jar
> * json-simple jar version:
> ** json-simple-1.1.jar
> I looked for an EMR-specific build of the piggybank jar but had no luck. I suspect I am not using the correct jar versions, since the feature is documented and should work.
> Please advise if I am missing anything.
> Thanks,
> Jagdish
>  
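The description notes that passing the raw schema string through the 'schema' option works. Until 'schema_file' works on EMR, one stopgap is to generate that inline form from a local copy of the .avsc file. The code below is a minimal sketch under stated assumptions. The class InlineAvroSchema and its methods are hypothetical and are not part of Pig or piggybank. It also assumes that Pig accepts the backslash escapes \\ and \' inside single-quoted string literals. The sketch strips JSON whitespace outside string literals so the schema fits on one line, then escapes the result for use as a Pig literal.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not part of Pig or piggybank): builds a Pig STORE
 * statement that passes an Avro schema inline via AvroStorage's 'schema'
 * option instead of 'schema_file'.
 */
public class InlineAvroSchema {

    /** Removes whitespace outside JSON string literals; text inside strings is kept as-is. */
    public static String minifyJson(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (escaped) {
                    escaped = false;        // character after a backslash is taken literally
                } else if (c == '\\') {
                    escaped = true;         // next character is escaped
                } else if (c == '"') {
                    inString = false;       // unescaped quote closes the string
                }
            } else if (c == '"') {
                inString = true;            // opening quote of a string literal
                out.append(c);
            } else if (!Character.isWhitespace(c)) {
                out.append(c);              // structural characters, numbers, literals
            }
        }
        return out.toString();
    }

    /** Wraps text in single quotes, escaping backslashes first and then single quotes (assumed Pig escaping). */
    public static String toPigLiteral(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Builds the STORE statement; alias and outputPath come from the caller's script. */
    public static String storeStatement(String alias, String outputPath, String schemaJson) {
        return "store " + alias + " into " + toPigLiteral(outputPath)
                + " using org.apache.pig.piggybank.storage.avro.AvroStorage('schema', "
                + toPigLiteral(minifyJson(schemaJson)) + ");";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: InlineAvroSchema <alias> <output-path> <local-schema.avsc>");
            System.exit(1);
        }
        // Reads a local copy of the schema file (not the s3n:// path).
        String schema = new String(Files.readAllBytes(Paths.get(args[2])), StandardCharsets.UTF_8);
        System.out.println(storeStatement(args[0], args[1], schema));
    }
}
```

For example, `java InlineAvroSchema records s3://my-bucket/my-output records.avsc` prints a single store statement. That statement can replace the 'schema_file' version in the script. This only sidesteps the problem; it does not explain why 'schema_file' yields "Output schema is null!" on EMR.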



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)