Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2009/02/10 19:49:00 UTC

[jira] Commented: (PIG-602) Pass global configurations to UDF

    [ https://issues.apache.org/jira/browse/PIG-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672360#action_12672360 ] 

Alan Gates commented on PIG-602:
--------------------------------

I propose the following solution.

First, a singleton class will be added to Pig.

{code}
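import java.io.Serializable;
import java.util.Map;

public class PigConf implements Serializable {

    // The single shared instance.
    private static PigConf self;

    // User-supplied settings; values must be Serializable so they can be
    // shipped from the front end to the back end.
    private Map<String, Serializable> userConf;

    private PigConf() { ... }

    public static PigConf getPigConf() { return self; }

    public Map<String, Serializable> getUserConf() { return userConf; }

}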
{code}

Pig would take care of serializing this class between the front end and
back end.  Users' UDFs could then stash keys and values in it on the front
end and be guaranteed to pick them back up on the back end.  Pig's map,
reduce, and combiner frameworks would need to change to explicitly deserialize
this and populate it.  The front end would need to change to serialize this as
part of submitting the job to Hadoop.
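
As a rough illustration of the round trip described above, here is a minimal
sketch using plain Java serialization.  UserConfCodec and the key names are
hypothetical and not part of Pig, and the sketch assumes a Java version that
provides java.util.Base64.  How the encoded string would actually travel with
the job is left open here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Pig: turns the user configuration map into
// a string on the front end and rebuilds the map on the back end.
class UserConfCodec {

    // Front end: serialize the map and Base64-encode the resulting bytes.
    static String encode(Map<String, Serializable> userConf) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            // Copy into a HashMap so the container itself is known to be Serializable.
            out.writeObject(new HashMap<>(userConf));
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Back end: decode the string and deserialize the map.
    @SuppressWarnings("unchecked")
    static Map<String, Serializable> decode(String encoded)
            throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (Map<String, Serializable>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Serializable> conf = new HashMap<>();
        conf.put("my.udf.threshold", 42);      // hypothetical keys
        conf.put("my.udf.mode", "strict");

        String wire = encode(conf);                     // front end, at job submission
        Map<String, Serializable> back = decode(wire);  // back end, at task setup
        System.out.println(back.equals(conf));
    }
}
```

Encoding to a string is only one option; it is chosen here because a string
is easy to carry alongside other job settings.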

Furthermore, users could populate this from a configuration file by providing
a file on the command line.  We would add a command line argument (such as
-u/-userconf).  The contents of this file would be read using
Properties.loadFromXML and then loaded into PigConf.userConf.
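
To make the file loading concrete, here is a minimal sketch of reading an XML
properties document and copying it into a Map<String, Serializable>.
UserConfLoader and the sample key are hypothetical.  The sketch reads from an
in-memory stream so that it is self-contained; in practice the stream would
presumably come from the file named on the command line.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of Pig.
class UserConfLoader {

    // Reads an XML properties document and copies its string entries into a
    // map with String keys and Serializable values.
    static Map<String, Serializable> load(InputStream xml) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(xml);  // closes the stream when it returns
        Map<String, Serializable> userConf = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            userConf.put(key, props.getProperty(key));
        }
        return userConf;
    }

    public static void main(String[] args) throws IOException {
        // Sample document in the format that Properties.loadFromXML expects.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "  <entry key=\"my.udf.mode\">strict</entry>\n"
                + "</properties>\n";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Map<String, Serializable> userConf = load(in);
        System.out.println(userConf.get("my.udf.mode"));
    }
}
```

Values loaded this way are always Strings, which are Serializable, so they
fit the proposed map without conversion.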

The reason a Properties object is not used for this is that Properties is a
Map<Object, Object> which is too generic.  We would like to constrain the keys
to be Strings, and the values must be Serializable so that we can guarantee
that we can transmit them from front end to back.
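
A small sketch of the problem with Properties, for illustration only
(PropertiesPitfall and the key name are hypothetical): Properties accepts a
non-serializable value at compile time, and the mistake only surfaces when we
try to serialize it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.util.Properties;

// Hypothetical demonstration class, not part of Pig.
class PropertiesPitfall {

    // Returns true if the object graph can be written with Java serialization,
    // false if it contains something that is not Serializable.
    static boolean canSerialize(Object o) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Compiles, because Properties.put takes (Object, Object).
        props.put("my.udf.helper", new Object());
        System.out.println(canSerialize(props));

        // With Map<String, Serializable> the equivalent call is rejected by
        // the compiler, since Object does not implement Serializable:
        //   userConf.put("my.udf.helper", new Object());
    }
}
```

With the typed map, that class of error is caught when the UDF is compiled
rather than when the job is submitted.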

Thoughts?


> Pass global configurations to UDF
> ---------------------------------
>
>                 Key: PIG-602
>                 URL: https://issues.apache.org/jira/browse/PIG-602
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Yiping Han
>            Assignee: Alan Gates
>
> We are seeking an easy way to pass a large number of global configurations to UDFs.
> Our application contains many Pig jobs and has a large number of configurations, so passing configurations through the command line is not ideal (i.e. modifying a single parameter requires changing multiple command lines). Putting everything into the Hadoop conf is not ideal either.
> We would like to see if Pig can provide a facility that allows us to pass a configuration file in some format (XML?) and then makes it available throughout all the UDFs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.