Posted to user@pig.apache.org by Jialong Wu <ji...@gmail.com> on 2010/03/04 20:15:18 UTC

MultiStorage-like UDF on Elastic MapReduce/S3

Hi,

Does anyone have experience running a MultiStorage-like UDF on Elastic
MapReduce? Basically, we are trying to store output in multiple
directories based on the values of certain fields. We have had some
success writing a UDF that extends MultiStorage in piggybank to write
to HDFS, but we couldn't get the same UDF to write to S3. We also
couldn't find MultiStorage in Amazon's version of piggybank. Any
suggestions on how we can achieve this on Elastic MapReduce when
writing to S3? Thanks in advance!

Thanks,
Jialong

Re: MultiStorage-like UDF on Elastic MapReduce/S3

Posted by Jennie Cochran-Chinn <jc...@adconion.com>.
Amazon's extension allows one to read from and write to both S3 and
HDFS, whereas the last time I checked, the non-Amazon version only
allows one to use one or the other, but not both. The MultiStorage in
the regular piggybank is not written to support multiple file systems,
which would be my guess as to why it's not in Amazon's version of
piggybank. You could perhaps try extending MultiStorage to write to
multiple file systems.

Jennie
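
To make the suggestion above a little more concrete, here is a minimal sketch of the path-routing part only. It is plain Java and deliberately uses no Pig or Hadoop classes; SplitOutputPaths, dirForKey and route are hypothetical names, not anything in piggybank or Amazon's build. The idea it illustrates is to derive each per-key output directory from the full base URI, so the scheme and authority (hdfs://... or s3://...) are carried along rather than assumed. A real UDF would still need to open the resulting location through Hadoop's FileSystem API for that scheme (for example by looking up the file system from the URI rather than using the default one), which this sketch does not attempt.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Pig or piggybank.
final class SplitOutputPaths {

    // Build the output directory for one key under the base location,
    // preserving the base URI's scheme and authority (bucket or namenode).
    static URI dirForKey(URI base, String key) {
        String path = base.getPath();
        if (!path.endsWith("/")) {
            path = path + "/";
        }
        try {
            return new URI(base.getScheme(), base.getAuthority(), path + key, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build output path for key: " + key, e);
        }
    }

    // Group tab-separated records by the value of one field and map each
    // group to its output directory (as a string) under the base location.
    static Map<String, List<String>> route(URI base, List<String> lines, int fieldIndex) {
        Map<String, List<String>> byDir = new TreeMap<>();
        for (String line : lines) {
            String key = line.split("\t", -1)[fieldIndex];
            String dir = dirForKey(base, key).toString();
            byDir.computeIfAbsent(dir, d -> new ArrayList<>()).add(line);
        }
        return byDir;
    }
}
```

With a base of s3://my-bucket/out (a made-up bucket name), records whose split field is US would be routed to s3://my-bucket/out/US; switching the base to an hdfs:// URI changes only the prefix, not the routing logic.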

