Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2008/05/01 20:24:55 UTC

[jira] Commented: (PIG-161) Rework physical plan

    [ https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593602#action_12593602 ] 

Alan Gates commented on PIG-161:
--------------------------------

bq. 1) In POUserFunc.getNext(Integer) you appear to be doing some kind of cast from ComparisonFunc. There are a couple of problems with this. One, you don't want an if-instanceof check in a getNext call; we can't afford to do that a million times while processing data. Two, physical operators should never do implicit casts. If we need a cast, it should be inserted explicitly by the type checker.

bq. Shubham>> POUserFunc needs to handle both EvalFunc and ComparisonFunc. EvalFunc has the exec() method that does the processing, while ComparisonFunc has the compare() method. In the case of Integers, I need to find out which of the two methods to use, and I do that with the if-instanceof check. I understand this increases the number of instructions. Can you please suggest a way to work around this? Do you think it's a good idea to change the ComparisonFunc interface to have an exec() method that calls compare(), or to drop compare() entirely and have exec() do all the processing for ComparisonFunc as well?

The best solution is probably to create a subclass of POUserFunc (perhaps POComparisonFunc) that implements a getNext(Integer) specific to ComparisonFunc. The logical-to-physical translator can then do the instanceof check once and instantiate the right class. This avoids doing the instanceof check on every getNext() call.
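A minimal sketch of that suggestion, assuming simplified stand-in interfaces (`EvalFuncSketch`, `ComparisonFuncSketch`) rather than Pig's real `EvalFunc`/`ComparisonFunc` signatures. Every class ending in `Sketch` and the `translate` helper are hypothetical. To keep it short, both operators share a hypothetical abstract base instead of POComparisonFunc extending POUserFunc directly; the dispatch idea is the same. The instanceof check runs once, when a plan node is translated, and each per-record `getNextInteger()` is a single virtual call.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for Pig's EvalFunc and ComparisonFunc;
// the real Pig types have different signatures.
interface EvalFuncSketch<T> {
    T exec(List<Object> input);
}

interface ComparisonFuncSketch {
    int compare(List<Object> t1, List<Object> t2);
}

// Hypothetical common base: input tuple(s) are attached before each getNext call.
abstract class UserFuncOpSketch {
    protected List<Object> left;
    protected List<Object> right;

    void attachInput(List<Object> left, List<Object> right) {
        this.left = left;
        this.right = right;
    }

    // Per-record entry point: one virtual call, no instanceof check.
    abstract Integer getNextInteger();
}

// Plays the role of POUserFunc: always calls exec().
final class POUserFuncSketch extends UserFuncOpSketch {
    private final EvalFuncSketch<Integer> func;

    POUserFuncSketch(EvalFuncSketch<Integer> func) { this.func = func; }

    @Override
    Integer getNextInteger() { return func.exec(left); }
}

// Plays the role of the suggested POComparisonFunc: always calls compare().
final class POComparisonFuncSketch extends UserFuncOpSketch {
    private final ComparisonFuncSketch func;

    POComparisonFuncSketch(ComparisonFuncSketch func) { this.func = func; }

    @Override
    Integer getNextInteger() { return func.compare(left, right); }
}

// Stand-in for the logical-to-physical translator: the instanceof check
// runs once per plan node, not once per record.
final class TranslatorSketch {
    @SuppressWarnings("unchecked")
    static UserFuncOpSketch translate(Object func) {
        if (func instanceof ComparisonFuncSketch) {
            return new POComparisonFuncSketch((ComparisonFuncSketch) func);
        }
        if (func instanceof EvalFuncSketch) {
            return new POUserFuncSketch((EvalFuncSketch<Integer>) func);
        }
        throw new IllegalArgumentException("unsupported function: " + func);
    }
}

public class Main {
    public static void main(String[] args) {
        ComparisonFuncSketch byFirstField =
            (a, b) -> Integer.compare((Integer) a.get(0), (Integer) b.get(0));
        EvalFuncSketch<Integer> fieldCount = in -> in.size();

        UserFuncOpSketch cmp = TranslatorSketch.translate(byFirstField);
        UserFuncOpSketch eval = TranslatorSketch.translate(fieldCount);

        cmp.attachInput(List.of(1, "a"), List.of(5, "b"));
        eval.attachInput(List.of(1, "a", 2.0), null);
        System.out.println(cmp.getNextInteger());  // -1
        System.out.println(eval.getNextInteger()); // 3
    }
}
```

The cost of choosing between exec() and compare() moves from the record loop into plan construction, which happens once per job.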

bq. Also, the cast is there because func is a reference of type Object. I check whether it's a ComparisonFunc and then cast it to ComparisonFunc to access the compare method.

Ignore my comments on the cast, I was thinking that compare returned a boolean instead of an integer.

bq. 2) Several places in getNext check whether func is null. The constructor should instead guarantee that the function has been instantiated, and then no checks should be done in getNext or anything it calls. This code runs once for every record processed, so we want to remove every instruction we can from it.

bq. Shubham>> Shravan had pointed out earlier that POUserFunc might not be serializable because EvalFunc is not serializable, so I had to declare the Object func as transient. The null checking makes sure that func is instantiated with the EvalFunc/ComparisonFunc after deserialization.

OK, but I assume that after deserialization on the MR side, func only needs to be instantiated once, yet you are checking for it in a number of places. It only needs to be checked once, preferably outside the getNext loop if possible.
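A minimal sketch of doing that check once, assuming the operator keeps the function's class name as a serializable string and is shipped with standard Java serialization, so the transient function can be rebuilt in the private `readObject` hook. `UdfSketch`, `CountFields`, and `SerializableUserFuncSketch` are hypothetical names, not Pig classes, and Pig's actual serialization path may differ. Because only the constructor and `readObject` assign `func`, `getNext` carries no null check.

```java
import java.io.*;
import java.util.List;

// Hypothetical stand-in for a user function that is NOT Serializable.
interface UdfSketch {
    Integer exec(List<Object> input);
}

// Example function with a no-arg constructor, so it can be created by class name.
class CountFields implements UdfSketch {
    public Integer exec(List<Object> input) { return input.size(); }
}

class SerializableUserFuncSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String funcClassName; // serializable description of the function
    private transient UdfSketch func;   // not serialized; rebuilt after deserialization

    SerializableUserFuncSketch(String funcClassName) {
        this.funcClassName = funcClassName;
        this.func = instantiate(funcClassName); // func is non-null from here on
    }

    private static UdfSketch instantiate(String className) {
        try {
            return (UdfSketch) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }

    // Invoked by ObjectInputStream once, when this operator is deserialized.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        func = instantiate(funcClassName);
    }

    // Per-record path: no null check, since func is always set before use.
    Integer getNext(List<Object> input) { return func.exec(input); }
}

public class Main {
    public static void main(String[] args) throws Exception {
        SerializableUserFuncSketch op = new SerializableUserFuncSketch("CountFields");

        // Round-trip through Java serialization, as if shipped to a task.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(op);
        }
        SerializableUserFuncSketch copy;
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy = (SerializableUserFuncSketch) in.readObject();
        }
        System.out.println(copy.getNext(List.of(1, "a", 2.0))); // 3
    }
}
```

If the plan reaches the tasks by some other mechanism, the same idea should carry over as a single explicit initialization call made before the record loop starts.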





> Rework physical plan
> --------------------
>
>                 Key: PIG-161
>                 URL: https://issues.apache.org/jira/browse/PIG-161
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: arithmeticOperators.patch, incr2.patch, incr3.patch, incr4.patch, incr5.patch, MRCompilerTests_PlansAndOutputs.txt, Phy_AbsClass.patch, physicalOps.patch, physicalOps.patch, physicalOps.patch, podistinct.patch, pogenerate.patch, pogenerate.patch, pogenerate.patch, posort.patch
>
>
> This bug tracks work to rework all of the physical operators as described in http://wiki.apache.org/pig/PigTypesFunctionalSpec

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.