Posted to user@pig.apache.org by "Resch, Markus" <ma...@teamaol.com> on 2012/09/10 09:31:19 UTC

Re: Fallback for output data storage

Thanks for the hint. I had already considered this solution as well, but it would still require a second script to remove the fallback data after a successful run.

We will consider that solution.

Thanks 
Markus
________________________________________
From: Alan Gates [gates@hortonworks.com]
Sent: Thursday, August 23, 2012 17:01
To: user@pig.apache.org
Subject: Re: Fallback for output data storage

You can simply store the data twice at the end of your script. Pig will split it and send it to both. The HDFS store shouldn't fail if the DBStorage store fails (but test this first to make sure I'm correct).

So your script would look like:

A = load ...
store Z into 'db' using DBStorage();
store Z into '/data/fallback';
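
A slightly fuller sketch of the same idea, for illustration only. The input path, schema, and aggregation are hypothetical, and DBStorage() stands in for whatever storage function your database vendor provides, with its arguments omitted:

```pig
-- Hypothetical input path and schema; replace with your own.
raw = LOAD '/data/input' USING PigStorage('\t') AS (key:chararray, value:long);

-- Hypothetical aggregation standing in for the real one.
grouped = GROUP raw BY key;
Z = FOREACH grouped GENERATE group AS key, SUM(raw.value) AS total;

-- Both STORE statements read from Z. With multi-query execution
-- (enabled by default) Pig computes Z once and writes it to both targets.
STORE Z INTO 'db' USING DBStorage();  -- placeholder for the vendor's storage function
STORE Z INTO '/data/fallback';        -- plain HDFS copy used as the fallback
```

Whether a failure in the database store leaves the HDFS copy intact depends on how Pig plans the jobs, so it is worth forcing a connection failure in a test run before relying on it.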

Alan.

On Aug 23, 2012, at 4:38 AM, Markus Resch wrote:

> Hi everyone,
>
> we are planning to put our aggregation results into an external
> database. To handle a connection failure to that external resource properly,
> we currently store the result on HDFS and then sync it to the db
> with a second Pig script that uses the db manufacturer's Pig storage
> function. We do that because we can hardly afford to redo all the
> aggregations in case of an error at the very end of the aggregation.
>
> If we could define a fallback data storage (e.g.
> HDFS) that is used in case of a connection issue, we could drop
> that entire second step and save a lot of effort.
> Is there anything like this?
>
> Kind Regards
>
> Markus
>