Posted to dev@pig.apache.org by "Pi Song (JIRA)" <ji...@apache.org> on 2008/04/07 16:19:24 UTC

[jira] Commented: (PIG-58) parameterized Pig scripts

    [ https://issues.apache.org/jira/browse/PIG-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586403#action_12586403 ] 

Pi Song commented on PIG-58:
----------------------------

A quick review:-

1. You assume there is no escaping in shell commands, right?

2. The name "UtilFunctions" implies that the class holds no state (not even global state). Given the way it is used, it needs either a better name or a refactor.

3. PigFileParser.unquote still doesn't do escaping.

4. The <DEFALT> token in PigFileParser is misspelled.

5. Why does ParamLoader.Parse() throw IOException?

6. In UtilFunctions.substitute, what does "replaced_line = replaced_line.replaceAll("\\\\\\$","\\$");" do?

7. Shouldn't the logger be declared as "private final Logger logger = Logger.getLogger("org.apache.pig.preprocessor.log");" (everything on one line) to make it consistent?
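
For item 6, here is a minimal standalone sketch of what that call does as written. The class name, helper name and sample input are made up for illustration and are not taken from the patch. The regex matches a backslash followed by a dollar sign, and the replacement is a literal dollar sign, so the call turns an escaped \$ back into a plain $:

```java
public class UnescapeDollarSketch {
    // Hypothetical helper wrapping the exact call quoted in item 6.
    static String unescapeDollar(String line) {
        // Regex literal "\\\\\\$" is the pattern \\\$ : a literal backslash, then a literal '$'.
        // Replacement literal "\\$" is \$ : a literal '$' (the backslash keeps '$'
        // from being read as a group reference).
        return line.replaceAll("\\\\\\$", "\\$");
    }

    public static void main(String[] args) {
        // The string below holds the characters: cost is \$5 and $date stays
        String input = "cost is \\$5 and $date stays";
        // Only the backslash-escaped dollar is touched.
        System.out.println(unescapeDollar(input)); // cost is $5 and $date stays
    }
}
```

If that is the intent (dropping the escape character after substitution), a short comment next to the call would help.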

Trivial things:-
1. I prefer HashMap to Hashtable.
2. The Pattern identifier in UtilFunctions.substitute can be made static; this makes it 1 microsecond faster :D
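
A minimal sketch of both trivial points, assuming a hypothetical class name and a placeholder identifier regex (neither comes from the patch): the Pattern is compiled once in a static final field, and the parameters live in a HashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class StaticPatternSketch {
    // Compiled once when the class is loaded instead of on every call.
    // The regex itself is a placeholder for illustration only.
    private static final Pattern IDENTIFIER =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // HashMap instead of Hashtable: no synchronization on every access,
    // which is fine as long as the map is used from a single thread.
    private final Map<String, String> params = new HashMap<>();

    void put(String name, String value) {
        if (!IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("not an identifier: " + name);
        }
        params.put(name, value);
    }

    String get(String name) {
        return params.get(name);
    }

    public static void main(String[] args) {
        StaticPatternSketch s = new StaticPatternSketch();
        s.put("date", "20080110");
        System.out.println(s.get("date")); // 20080110
    }
}
```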



> parameterized Pig scripts
> -------------------------
>
>                 Key: PIG-58
>                 URL: https://issues.apache.org/jira/browse/PIG-58
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Olga Natkovich
>         Attachments: PIG-58_v1.patch
>
>
> This feature has been requested by several users and would be very useful in conjunction with streaming. It would allow a Pig script to include parameters that are replaced at run time. For instance, if your script needs to run daily over the previous day's data, you could reuse the same script and pass the date to it as a run-time parameter.
> Example:
> =======
> Pig script myscript.pig:
> A = load '/data/mydata/%date%';
> B = filter A by $0>'5';
> .....
> Pig command line:
> pig -param date='20080110' myscript.pig
> Proposed interface and implementation:
> Interface:
> =======
> (0) Substitution will only be supported in Pig script files.
> (1) Parameters are specified on the command line via the -param <param>=<val> construct. Multiple parameters can be specified. They are applied to the script in the order they are specified on the command line.
> (2) Default values for the parameters can be specified within the script via a declare statement:
> declare <param>=<value>
> (3) Within the script, the parameter will be enclosed in % signs, as in %<param>%. \% can be used to escape a literal %.
> Implementation:
> ============
> Use a preprocessor to do the substitution. The preprocessor would be invoked by Main before grunt is instantiated and would do the following:
> - create a new file in a temp location
> - build a hash of parameters from the command line and declare statements
> - for each line in the original script
>   if this is a declare line, skip it
>   else for each unescaped pattern %<identifier>% look for a match in the hash. Replace, if found.  Write the line to the temp file.
> - pass the temp file to grunt.
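
The steps in the quoted proposal can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the attached patch: the class and method names are hypothetical, the exact declare syntax is guessed from the description, \% is assumed to produce a literal %, and the sketch returns the processed lines in memory instead of writing a temp file and handing it to grunt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamPreprocessorSketch {
    // "declare <param>=<value>" line; the exact syntax is an assumption.
    private static final Pattern DECLARE =
        Pattern.compile("^\\s*declare\\s+(\\w+)\\s*=\\s*(.*?)\\s*$");
    // Either an escaped percent sign (\%) or a %identifier% reference.
    private static final Pattern TOKEN = Pattern.compile("\\\\%|%(\\w+)%");

    static List<String> preprocess(List<String> script, Map<String, String> cmdLine) {
        // Build the parameter hash: declared defaults first, then command-line
        // values, which override the defaults.
        Map<String, String> params = new HashMap<>();
        for (String line : script) {
            Matcher d = DECLARE.matcher(line);
            if (d.matches()) {
                params.put(d.group(1), d.group(2));
            }
        }
        params.putAll(cmdLine);

        List<String> out = new ArrayList<>();
        for (String line : script) {
            if (DECLARE.matcher(line).matches()) {
                continue; // declare lines are skipped
            }
            Matcher m = TOKEN.matcher(line);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                String name = m.group(1);
                String repl;
                if (name == null) {
                    repl = "%"; // escaped \% becomes a literal % (assumption)
                } else if (params.containsKey(name)) {
                    repl = params.get(name); // replace, if found
                } else {
                    repl = m.group(); // unknown parameter: leave the text as is
                }
                // quoteReplacement keeps '$' and '\' in values from being interpreted.
                m.appendReplacement(sb, Matcher.quoteReplacement(repl));
            }
            m.appendTail(sb);
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> script = Arrays.asList(
            "declare date=19700101",
            "A = load '/data/mydata/%date%';",
            "B = filter A by $0>'5';");
        Map<String, String> cmdLine = Collections.singletonMap("date", "20080110");
        for (String line : preprocess(script, cmdLine)) {
            System.out.println(line);
        }
    }
}
```

Collecting the declare lines in a first pass lets a default apply even when the declare statement appears after the first use of the parameter; whether the real implementation should allow that is an open design choice.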
