Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2012/09/11 08:54:08 UTC
[jira] [Commented] (PIG-2445) AvroStorage can't store two relations in one script
[ https://issues.apache.org/jira/browse/PIG-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452778#comment-13452778 ]
Cheolsoo Park commented on PIG-2445:
------------------------------------
AvroStorage can store two relations in one script. In fact, the same question was asked on the user group a while ago. I am copying my answer here:
{quote}
AvroStorage has a rather unusual syntax for multiple stores. To apply a different Avro schema to each store, you have to give each store its own "index", as follows:
set1 = load 'input1.txt' using PigStorage('|') as ( ... );
store set1 into 'set1' using org.apache.pig.piggybank.storage.avro.AvroStorage('index', '1');
set2 = load 'input2.txt' using PigStorage('|') as ( .. );
store set2 into 'set2' using org.apache.pig.piggybank.storage.avro.AvroStorage('index', '2');
As can be seen, I added the 'index' parameters.
What AvroStorage does is to construct the following string in the frontend:
"1#<1st avro schema>,2#<2nd avro schema>"
and pass it to the backend via UDFContext. In the backend, tasks then parse this string to get the output schema for each store.
{quote}
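To make the index-keyed string concrete, here is a minimal Python sketch of how such a string could be built and parsed. It is an illustration only, not the actual AvroStorage code (which is Java): the function names `build_schema_string` and `parse_schema_string` are hypothetical, the delimiters simply follow the "1#<schema>,2#<schema>" form shown above, and the sketch assumes that a comma followed by digits and '#' never occurs inside a schema.

```python
import json
import re


def build_schema_string(schemas):
    """Join {index: schema_json} into '1#<schema>,2#<schema>' (hypothetical helper)."""
    return ",".join(f"{index}#{schema}" for index, schema in sorted(schemas.items()))


# An entry starts at the beginning of the string or after a comma, as <digits>#.
# Assumption: this pattern never occurs inside a schema itself.
_ENTRY_START = re.compile(r"(?:^|,)(\d+)#")


def parse_schema_string(packed):
    """Split the packed string back into {index: schema_json} (hypothetical helper)."""
    matches = list(_ENTRY_START.finditer(packed))
    result = {}
    for pos, match in enumerate(matches):
        # Each schema runs up to the comma that starts the next entry, or to the end.
        end = matches[pos + 1].start() if pos + 1 < len(matches) else len(packed)
        result[int(match.group(1))] = packed[match.end():end]
    return result


# Two made-up Avro record schemas, one per store.
schemas = {
    1: json.dumps({"type": "record", "name": "set1",
                   "fields": [{"name": "a", "type": "int"}]}),
    2: json.dumps({"type": "record", "name": "set2",
                   "fields": [{"name": "b", "type": "string"}]}),
}

packed = build_schema_string(schemas)   # the "frontend" side
recovered = parse_schema_string(packed)  # the "backend" side
print(recovered[2])
```

Without distinct indexes there would be no key to tell the two schemas apart, which is consistent with the second store picking up the first store's schema in the original report.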
This is also documented on the [AvroStorage wiki|https://cwiki.apache.org/PIG/avrostorage.html#AvroStorage-GlobalParameters] (please see "index"). Obviously, this is not very intuitive, so I have been thinking of writing a new AvroStorage with more intuitive options, although I haven't started yet.
I think that we should close this jira. Please let me know if anyone has objections.
Thanks!
> AvroStorage can't store two relations in one script
> ---------------------------------------------------
>
> Key: PIG-2445
> URL: https://issues.apache.org/jira/browse/PIG-2445
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Affects Versions: 0.9.1, 0.9.2, 0.10.0
> Reporter: Russell Jurney
> Labels: avro, fun, happy, pants, pig, pig_udf, storefunc
>
> STORE one INTO '/tmp/one.avro' USING AvroStorage();
> STORE two INTO '/tmp/two.avro' USING AvroStorage();
> -- relation two has the schema of relation one. BANG!