Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2012/09/11 08:54:08 UTC

[jira] [Commented] (PIG-2445) AvroStorage can't store two relations in one script

    [ https://issues.apache.org/jira/browse/PIG-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452778#comment-13452778 ] 

Cheolsoo Park commented on PIG-2445:
------------------------------------

AvroStorage can store two relations in one script. In fact, the same question was asked on the user list a while ago. I am copying my answer here:
{quote}
AvroStorage has a rather unusual syntax for multiple stores. To apply different Avro schemas to multiple stores, you have to give each store its own "index", as follows:

set1 = load 'input1.txt' using PigStorage('|') as ( ... );
store set1 into 'set1' using org.apache.pig.piggybank.storage.avro.AvroStorage('index', '1');

set2 = load 'input2.txt' using PigStorage('|') as ( .. );
store set2 into 'set2' using org.apache.pig.piggybank.storage.avro.AvroStorage('index', '2');

Note the 'index' parameter that I added to each store.

In the frontend, AvroStorage constructs the following string:

"1#<1st avro schema>,2#<2nd avro schema>"

and passes it to the backend via UDFContext. In the backend, tasks parse this string to get the output schema for each store.
{quote}
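
To make the string format above concrete, here is a minimal Python sketch of the encode and parse steps. It is not the actual AvroStorage code (which is Java); the function names encode_schemas and decode_schemas, and the rule used to find where each entry begins, are assumptions made for illustration only.

```python
import re

# Hypothetical helpers illustrating the "1#<schema>,2#<schema>" format.
# They are not part of Pig or AvroStorage.


def encode_schemas(schemas):
    """Join {index: schema_json} into one string, ordered by index."""
    return ",".join(
        f"{index}#{schema}" for index, schema in sorted(schemas.items())
    )


# Avro schemas are JSON and contain commas themselves, so a plain split(",")
# would cut them apart. This sketch assumes an entry begins at the start of
# the string, or right after a comma, with digits followed by '#'.
_ENTRY_START = re.compile(r"(?:^|,)(\d+)#")


def decode_schemas(encoded):
    """Recover {index: schema_json} from the encoded string."""
    matches = list(_ENTRY_START.finditer(encoded))
    result = {}
    for i, match in enumerate(matches):
        # A schema runs up to the comma that introduces the next entry,
        # or to the end of the string for the last entry.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(encoded)
        result[int(match.group(1))] = encoded[match.end():end]
    return result


if __name__ == "__main__":
    schemas = {
        1: '{"type":"record","name":"set1","fields":[{"name":"a","type":"int"},{"name":"b","type":"string"}]}',
        2: '{"type":"record","name":"set2","fields":[{"name":"c","type":"long"}]}',
    }
    encoded = encode_schemas(schemas)
    # Each store looks up its own schema by the index it was given.
    assert decode_schemas(encoded) == schemas
```

The point of the index is visible here: without it, every store in the script would read the same schema from the shared string, which is the behavior reported in this issue.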

This is also documented on the [AvroStorage wiki|https://cwiki.apache.org/PIG/avrostorage.html#AvroStorage-GlobalParameters] (please see "index"). Admittedly, this is not very intuitive, so I have been thinking of writing a new AvroStorage with more intuitive options, although I haven't started yet.

I think that we should close this jira. Please let me know if anyone has objections.

Thanks!
                
> AvroStorage can't store two relations in one script
> ---------------------------------------------------
>
>                 Key: PIG-2445
>                 URL: https://issues.apache.org/jira/browse/PIG-2445
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.1, 0.9.2, 0.10.0
>            Reporter: Russell Jurney
>              Labels: avro, fun, happy, pants, pig, pig_udf, storefunc
>
> STORE one INTO '/tmp/one.avro' USING AvroStorage();
> STORE two INTO '/tmp/two.avro' USING AvroStorage();
> -- relation two has the schema of relation one.  BANG!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira