Posted to user@pig.apache.org by Markus Resch <ma...@adtech.de> on 2012/04/23 16:09:36 UTC

Properties Configuration in Custom Load Function

Hey Folks,

We've created our own LOAD function by extending the default
AvroStorage (basically, we compute a set of paths and hand them to
AvroStorage to glob).

Our algorithm needs some basic configuration, which we read from
a .properties file located right beside the Pig script.
The algorithm itself works great: according to the output directly after
starting the Pig script, everything is fine. But after the job has run
for a while, we get an error message saying it can't find the
properties file. We assume that the load function gets started on each
data node and that the config file isn't available there. Is that
assumption true? And if so, is there a way to work around this issue?

Thanks

Markus


Re: Properties Configuration in Custom Load Function

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Hi Markus,

That's correct -- the loaders will get instantiated both on the
client-side (to allow you to do any setup you need to do), and on the
MR side (to actually do the loading). You can do a couple of things to
get your properties over to the MR side:

1) Add your file to the "tmpfiles" property of the jobconf that gets
passed in via setLocation. This may be error-prone, since two of your
loaders with different properties might end up being processed in the
same MR job (for a join, for example).

2) Serialize your properties straight into the UDFContext, namespaced
using the signature you get via setUDFContextSignature, and
deserialize them on the backend.
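
A minimal sketch of option 2, using only the JDK so it runs without a
cluster. The class name LoaderConfigSketch, the key "loader.config", and
the helpers saveOnFrontend/loadOnBackend are hypothetical names made up
for this illustration. The plain Properties object called udfProps is a
stand-in: in a real loader it would presumably be the object returned by
UDFContext.getUDFContext().getUDFProperties(getClass(), new String[] {signature}),
with the signature being the one handed to setUDFContextSignature.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

final class LoaderConfigSketch {
    // Hypothetical key under which the serialized configuration is stored.
    private static final String CONFIG_KEY = "loader.config";

    /** Flattens the loader's settings into a single string. */
    static String serialize(Properties config) {
        StringWriter out = new StringWriter();
        try {
            config.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Rebuilds the settings from the string produced by serialize(). */
    static Properties deserialize(String text) {
        Properties config = new Properties();
        try {
            config.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return config;
    }

    /** Front end: the local .properties file is readable here, so store its content. */
    static void saveOnFrontend(Properties udfProps, String signature, Properties config) {
        // Prefixing with the signature keeps two loaders in one job apart.
        udfProps.setProperty(signature + "." + CONFIG_KEY, serialize(config));
    }

    /** Back end: no local file exists, so read the settings from the shared properties. */
    static Properties loadOnBackend(Properties udfProps, String signature) {
        String text = udfProps.getProperty(signature + "." + CONFIG_KEY);
        if (text == null) {
            throw new IllegalStateException("No config stored for signature " + signature);
        }
        return deserialize(text);
    }

    public static void main(String[] args) {
        Properties udfProps = new Properties(); // stand-in for the UDFContext properties

        Properties left = new Properties();
        left.setProperty("input.root", "/data/left");   // made-up setting
        Properties right = new Properties();
        right.setProperty("input.root", "/data/right"); // made-up setting

        // Two loaders in the same job (a join, say), each with its own signature.
        saveOnFrontend(udfProps, "loader_1", left);
        saveOnFrontend(udfProps, "loader_2", right);

        System.out.println(loadOnBackend(udfProps, "loader_1").getProperty("input.root"));
        System.out.println(loadOnBackend(udfProps, "loader_2").getProperty("input.root"));
    }
}
```

Because each loader's settings sit under its own signature, the
two-loaders-in-one-job case mentioned under option 1 does not cause a
clash here.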

D

