Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2010/06/07 03:23:56 UTC

[jira] Commented: (PIG-1333) API interface to Pig

    [ https://issues.apache.org/jira/browse/PIG-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876105#action_12876105 ] 

Olga Natkovich commented on PIG-1333:
-------------------------------------

Patch looks good. A few comments and questions:

(1) General comment. This patch is very large and combines several different issues that could have been split into separate patches to make it easier to review and test.
(2) We are missing script-level feature collection. (I see the one at the job level.) For each script, we want to collect overall script features, such as which operators it uses (join, order by, etc.), whether it is a multiquery, and whether it has UDFs. We would also want to know whether the combiner was used and whether the script spilled, but maybe both of those can be at the job level.
(3) We need to add a separate comment to the JIRA, marked as documentation, that describes PigRunner, since it is a new interface that we need to include in the 0.8.0 documentation.
(4) MapReduceLauncher. Why were the exception handling and temp store handling code removed?
(5) OutputStats assumes that the location is a path, which might not be true for non-file stores.
(6) ScriptState: There are map implementations optimized for enum keys that could be used here (http://java.sun.com/j2se/1.5.0/docs/api/java/util/EnumMap.html)
(7) Why is JobStats derived from an operator?
(8) Why was JOB_NAME_PREFIX removed from PigContext?
(9) Why do we need to synchronize getTemporaryFile?
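
To illustrate comment (6), here is a minimal sketch of counting script features with java.util.EnumMap. The class name FeatureCounts and the Feature enum are hypothetical and are not taken from the patch; only EnumMap itself is the real JDK class. It assumes a reasonably recent JDK (Map.merge and getOrDefault).

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical holder for per-script feature counts; not the ScriptState from the patch.
class FeatureCounts {
    // Illustrative feature names only.
    enum Feature { JOIN, ORDER_BY, UDF, MULTI_QUERY }

    // EnumMap stores values in an array indexed by the enum ordinal,
    // so lookups avoid hashing and iteration follows declaration order.
    private final Map<Feature, Integer> counts = new EnumMap<>(Feature.class);

    // Increment the counter for one feature.
    void record(Feature feature) {
        counts.merge(feature, 1, Integer::sum);
    }

    // Return how many times the feature was recorded (0 if never).
    int count(Feature feature) {
        return counts.getOrDefault(feature, 0);
    }

    public static void main(String[] args) {
        FeatureCounts fc = new FeatureCounts();
        fc.record(Feature.JOIN);
        fc.record(Feature.JOIN);
        fc.record(Feature.UDF);
        System.out.println("joins: " + fc.count(Feature.JOIN));
        System.out.println("udfs: " + fc.count(Feature.UDF));
    }
}
```

If only the presence of a feature matters rather than a count, java.util.EnumSet would be the analogous choice.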

> API interface to Pig
> --------------------
>
>                 Key: PIG-1333
>                 URL: https://issues.apache.org/jira/browse/PIG-1333
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>
>         Attachments: PIG-1333.patch
>
>
> It would be nice to make Pig friendlier to applications, such as workflow systems, that execute Pig scripts on a user's behalf.
> Currently, they have to use the pig command line to execute the code; however, this limits the kind of output that can be delivered. For instance, it is hard to produce error information that is easy to use programmatically, or to collect statistics.
> The proposal is to create a class that mimics the behavior of Main but gives users a status object back. The main code of Pig would look something like:
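> public static void main(String[] args)
> {
>     // Run Pig with the command-line arguments and get a status object back
>     PigStatus ps = PigMain.exec(args);
>     // Exit with the return code carried by that status object
>     System.exit(ps.rc);
> }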
> We need to define the following:
> - Content of PigStatus. It should at least include
>    * return code
>    * error string
>    * exception 
>    * statistics
> - A way to propagate the status class through pig code
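
As a rough sketch of what the status object listed above might carry, the class below holds the four items from the description: return code, error string, exception, and statistics. Everything here is an assumption made for illustration (the field names, the factory methods, and the use of a String-to-Long map for statistics); it is not the class from the attached patch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed status object; not the class from the patch.
final class PigStatus {
    private final int returnCode;                // 0 means success
    private final String errorMessage;           // null when the run succeeded
    private final Throwable exception;           // null when the run succeeded
    private final Map<String, Long> statistics;  // assumed shape: counter name -> value

    private PigStatus(int returnCode, String errorMessage,
                      Throwable exception, Map<String, Long> statistics) {
        this.returnCode = returnCode;
        this.errorMessage = errorMessage;
        this.exception = exception;
        // Defensive copy so callers cannot change the status afterwards.
        this.statistics = Collections.unmodifiableMap(new HashMap<>(statistics));
    }

    // Build a status for a successful run.
    static PigStatus success(Map<String, Long> statistics) {
        return new PigStatus(0, null, null, statistics);
    }

    // Build a status for a failed run from the exception that caused it.
    static PigStatus failure(int returnCode, Throwable cause) {
        return new PigStatus(returnCode, cause.getMessage(), cause,
                             Collections.<String, Long>emptyMap());
    }

    boolean isSuccess() { return returnCode == 0; }
    int getReturnCode() { return returnCode; }
    String getErrorMessage() { return errorMessage; }
    Throwable getException() { return exception; }
    Map<String, Long> getStatistics() { return statistics; }
}
```

A caller such as a workflow system could then branch on isSuccess() and read getErrorMessage() or getStatistics() instead of parsing command-line output.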

-- 
This message is automatically generated by JIRA.