Posted to user@pig.apache.org by Johannes Schwenk <jo...@adition.com> on 2012/09/03 13:16:39 UTC

Re: AvroStorage load and store, schema with maps

Thank you very much!

I was confused because passing parameters to DEFINEd functions seems to be
accepted. If it does not work, attempting to pass them should be a syntax
error. Maybe the parser could throw an exception?
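
For the record, a minimal sketch of the original script rewritten with one
DEFINE per parameter set, as the manual excerpt quoted below suggests. The
alias names AvroLoad and AvroStore are made up for illustration, and this
exact version has not been run:

```pig
-- Hypothetical aliases: one DEFINE per constructor parameter set.
DEFINE AvroLoad org.apache.pig.piggybank.storage.avro.AvroStorage();
DEFINE AvroStore org.apache.pig.piggybank.storage.avro.AvroStorage('same', '$input');

-- The aliases are used without arguments; the parameters are fixed by the DEFINEs.
loaded_data = LOAD '$input' USING AvroLoad;
STORE loaded_data INTO '$output' USING AvroStore;
```

The -param input=... and -param output=... arguments from the original
invocation would still supply $input and $output.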

Thanks again!
Johannes


On 23.08.2012 21:02, Cheolsoo Park wrote:
> Actually, I found it in the Pig manual:
> 
>> If you need to use different constructor parameters for different calls to
>> the function you will need to create multiple defines – one for each
>> parameter set.
> 
> 
> For example, this works:
> 
> DEFINE AvroStorageNoParam
>     org.apache.pig.piggybank.storage.avro.AvroStorage();
> DEFINE AvroStorageWithParam
>     org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{"type" : "map", "values" : "string"}');
> loaded_data = LOAD 'map.avro' USING AvroStorageNoParam;
> describe loaded_data;
> STORE loaded_data INTO 'output' USING AvroStorageWithParam;
> 
> 
> Please see the usage section:
> http://pig.apache.org/docs/r0.10.0/basic.html#define-udfs
> 
> Thanks,
> Cheolsoo
> 
> On Thu, Aug 23, 2012 at 11:11 AM, Cheolsoo Park <ch...@cloudera.com> wrote:
> 
>> Hi Johannes,
>>
>> I was able to reproduce your error with the following Avro schema:
>>
>>> {
>>>   "type" : "map",
>>>   "values" : "string"
>>> }
>>
>>
>> The issue is not in AvroStorage but in the DEFINE statement.
>>
>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>>
>>
>> AvroStorage has two constructors: one with no parameters and one that takes
>> parameters. To set the output Avro schema, the second one must be used. But
>> your DEFINE statement causes the first constructor to always be used, so the
>> output Avro schema is never set. If you remove the DEFINE statement and
>> use the fully qualified name of AvroStorage, everything works. For example,
>>
>>> loaded_data = LOAD 'map.avro' USING
>>>     org.apache.pig.piggybank.storage.avro.AvroStorage();
>>> describe loaded_data;
>>> STORE loaded_data INTO 'output' USING
>>>     org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{"type" : "map", "values" : "string"}');
>>
>>
>> Now the question is why DEFINE does not work here.
>>
>> Thanks,
>> Cheolsoo
>>
>>
>> On Thu, Aug 23, 2012 at 8:49 AM, Johannes Schwenk <
>> johannes.schwenk@adition.com> wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to execute the following pig script with pig-0.10.0 and yarn
>>> (cdh4.0.0):
>>>
>>> --
>>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>>> loaded_data = LOAD '$input' USING AvroStorage();
>>> STORE loaded_data INTO '$output' USING AvroStorage('same', '$input');
>>> --
>>>
>>> I invoke pig this way:
>>>
>>> pig -Dpig.additional.jars=lib/piggybank.jar:lib/json-simple-1.1.jar:lib/avro-1.5.3.jar \
>>>     -file script.pig -param input=input.avro -param output=output.avro
>>>
>>> The input.avro has the following schema:
>>>
>>> http://pastebin.com/ZWU6qLWx
>>>
>>> I always get
>>>
>>> <file script.pig, line 3, column 0> Output Location Validation Failed
>>> for: 'xxx/output.avro' More info to follow:
>>> Please provide schema for Map field!
>>> Details at logfile: xxx/pig_1345735999390.log
>>>
>>> Log excerpt:
>>>
>>> Please provide schema for Map field!
>>>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:75)
>>>         at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
>>>         at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
>>>         at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
>>>         at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
>>>         at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>>>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
>>>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:293)
>>>         at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
>>>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
>>>         at org.apache.pig.PigServer.execute(PigServer.java:1245)
>>>         at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
>>>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
>>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
>>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>>>         at org.apache.pig.Main.run(Main.java:430)
>>>         at org.apache.pig.Main.main(Main.java:111)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>>> Caused by: java.io.IOException: Please provide schema for Map field!
>>>         at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convert(PigSchema2Avro.java:110)
>>>         at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convertRecord(PigSchema2Avro.java:151)
>>>         at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convert(PigSchema2Avro.java:62)
>>>         at org.apache.pig.piggybank.storage.avro.AvroStorage.checkSchema(AvroStorage.java:534)
>>>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:65)
>>>         ... 22 more
>>>
>>>
>>> I also tried to specify
>>>
>>> AvroStorage('{"debug": 5, "schema_file": "schema.avsc", "field22",
>>> "def:pd", "field23", "def:epd"}')
>>>
>>> - same result.
>>>
>>>
>>> Do you have any hints?
>>>
>>> Greetings,
>>> Johannes Schwenk
>>>
>>> --
>>> Software Developer (Reporting)
>>> ________________________________________________________
>>>
>>> ADITION technologies AG
>>> Schwarzwaldstraße 78b
>>> 79117 Freiburg
>>>
>>> http://www.adition.com
>>>
>>> T +49 / (0)761 / 88147 - 30
>>> F +49 / (0)761 / 88147 - 77
>>> SUPPORT +49  / (0)1805 - ADITION
>>>
>>> (Landline rate 14 ct/min; mobile rates at most 42 ct/min)
>>>
>>> Registered at the Amtsgericht Düsseldorf under HRB 54076
>>> Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus
>>> Schlüter
>>> Chairman of the Supervisory Board: Daniel Raimer, attorney-at-law
>>> VAT ID No.: DE 218 858 434
>>>
>>>
>>
> 


