Posted to dev@pig.apache.org by "Jonathan Coveney (Commented) (JIRA)" <ji...@apache.org> on 2012/03/27 08:57:32 UTC

[jira] [Commented] (PIG-2587) Compute LogicalPlan signature and store in job conf

    [ https://issues.apache.org/jira/browse/PIG-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239263#comment-13239263 ] 

Jonathan Coveney commented on PIG-2587:
---------------------------------------

Bill,

There are a couple of ways to implement a signature like this. One is to just use the hashCode, which is what you did; that works well for identical scripts. I wonder whether it would be worth thinking about a value that doesn't change with cosmetic changes to the script (i.e. alias renames and the like). As a plain signature the hashCode is adequate, but ideally, as long as the sources and transformations are the same, you'd want cosmetic changes not to throw out the tuning you've done.

Is that crazy talk? 80/20 may dictate just going with this approach, since it is so simple, and leaving the bigger optimization to external systems.
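
To make the alias-insensitive idea concrete, here is a rough sketch, not anything taken from the patch. It assumes a script can be reduced to whitespace-separated statements of the form `alias = OP ...`; the class name, the `@N` placeholder scheme, and the toy statements are all made up for illustration. Each alias is replaced by a placeholder in order of definition before hashing, so renaming aliases leaves the signature unchanged. A token that happens to match an alias (for example a field with the same name) would also be rewritten, which a real implementation working on the plan itself would avoid.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AliasInsensitiveSignature {

    // Replaces every alias with a positional placeholder (@0, @1, ...) in
    // order of first definition. Hypothetical format: tokens are separated
    // by whitespace and a definition looks like "alias = OP ...".
    static String normalize(List<String> statements) {
        Map<String, String> placeholders = new HashMap<>();
        StringBuilder out = new StringBuilder();
        for (String stmt : statements) {
            String[] tokens = stmt.trim().split("\\s+");
            if (tokens.length > 2 && tokens[1].equals("=")) {
                // First definition of an alias fixes its placeholder.
                placeholders.putIfAbsent(tokens[0], "@" + placeholders.size());
            }
            for (int i = 0; i < tokens.length; i++) {
                tokens[i] = placeholders.getOrDefault(tokens[i], tokens[i]);
            }
            out.append(String.join(" ", tokens)).append('\n');
        }
        return out.toString();
    }

    // Same hashCode approach as before, applied to the normalized text.
    static String signature(List<String> statements) {
        return Integer.toHexString(normalize(statements).hashCode());
    }

    public static void main(String[] args) {
        List<String> original = List.of(
            "A = LOAD 'in'",
            "B = FILTER A BY x > 1",
            "STORE B INTO 'out'");
        List<String> renamed = List.of(
            "raw = LOAD 'in'",
            "big = FILTER raw BY x > 1",
            "STORE big INTO 'out'");
        System.out.println(signature(original).equals(signature(renamed)));
    }
}
```

The trade-off is the one mentioned above: this is more machinery than a plain hashCode, and it only pays off if scripts are routinely edited cosmetically between runs.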
                
> Compute LogicalPlan signature and store in job conf
> ---------------------------------------------------
>
>                 Key: PIG-2587
>                 URL: https://issues.apache.org/jira/browse/PIG-2587
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>             Fix For: 0.10, 0.11
>
>         Attachments: pig-2587_1.patch
>
>
> We'd like to be able to uniquely identify a re-executed script (possibly with different inputs/outputs) by creating a signature of the {{LogicalPlan}}. Here's the proposal:
> # Add a new method {{LogicalPlan.getSignature()}} that returns a hash of its {{LogicalPlanPrinter}} output.
> # In {{PigServer.execute()}} set the signature on the job conf after the LP is compiled, but before it's executed.
> (1) would allow an impl of {{PigProgressNotificationListener.setScriptPlan()}} to save the LP signature with the script metadata. Upon subsequent runs (2) would allow an impl of {{PigReducerEstimator}} (see PIG-2574) to retrieve the current LP signature and fetch the historical data for the script. It could then use the previous run data to better estimate the number of reducers.
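
The flow in the quoted proposal can be sketched roughly as follows. This is a stand-alone illustration, not the attached patch and not Pig's API: `java.util.Properties` stands in for the job conf, a plain string stands in for the `LogicalPlanPrinter` output, a `Map` stands in for whatever store holds the historical data, and the property key and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class PlanSignatureSketch {

    // Hypothetical property name; the real key is whatever the patch defines.
    static final String SIGNATURE_KEY = "example.logical.plan.signature";

    // Step 1: the signature is a hash of the printed plan (a stand-in string here).
    static String signatureOf(String planDump) {
        return Integer.toString(planDump.hashCode());
    }

    // Step 2: set the signature on the job conf before the plan is executed.
    static void storeSignature(Properties jobConf, String planDump) {
        jobConf.setProperty(SIGNATURE_KEY, signatureOf(planDump));
    }

    // A reducer estimator could read the signature back and look up what a
    // previous run of the same plan used, falling back to a default.
    static int estimateReducers(Properties jobConf,
                                Map<String, Integer> history,
                                int defaultReducers) {
        String signature = jobConf.getProperty(SIGNATURE_KEY);
        return history.getOrDefault(signature, defaultReducers);
    }

    public static void main(String[] args) {
        String planDump = "stand-in for the printed logical plan";
        Properties jobConf = new Properties();
        storeSignature(jobConf, planDump);

        // Pretend an earlier run with the same plan recorded 12 reducers.
        Map<String, Integer> history = new HashMap<>();
        history.put(signatureOf(planDump), 12);

        System.out.println(estimateReducers(jobConf, history, 1));
    }
}
```

Because the signature depends only on the plan text, a re-run with different inputs or outputs still maps to the same history entry, provided the printed plan itself does not embed those paths.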
