Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2008/06/19 21:29:44 UTC

[jira] Created: (PIG-276) Allow UDFs to have different implementations based on input types

Allow UDFs to have different implementations based on input types
-----------------------------------------------------------------

                 Key: PIG-276
                 URL: https://issues.apache.org/jira/browse/PIG-276
             Project: Pig
          Issue Type: Sub-task
            Reporter: Alan Gates
            Assignee: Pradeep Kamath




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-276:
-------------------------------

    Attachment: udf_funcspec_src.patch

The attached patch contains the modifications to the "src" part of the code that support the new design for EvalFunc to handle different input types.
New design, quoting Alan:
{quote}
With the introduction of types (see http://issues.apache.org/jira/browse/PIG-157) we need to decide how EvalFunc will interact with the types.  The original proposal was that the DEFINE keyword would be modified to allow specification of types for the UDF.  This has a couple of problems.  One, DEFINE is already used to specify constructor arguments.  Using it to also specify types will be confusing.  Two, it has been pointed out that this type information is a property of the UDF and should therefore be declared by the UDF, not in the script.

Separately, as a way to allow simple function overloading, a change had been proposed to the EvalFunc interface to allow an EvalFunc to specify that for a given type, a different instance of EvalFunc should be used (see https://issues.apache.org/jira/browse/PIG-276).

I would like to propose that we expand the changes in PIG-276 to be more general.  Rather than adding classForType() as proposed in PIG-276, EvalFunc will instead add a function:

public Map<Schema, FuncSpec> getArgToFuncMapping() {
    return null;
}

Where FuncSpec is a new class that contains the name of the class that implements the UDF along with any necessary arguments for the constructor.

The type checker will then, as part of type checking LOUserFunc, make a call to this function.  If it receives a null, it will simply leave the UDF as is, and make the assumption that the UDF can handle whatever datatype is being provided to it.  This will cover most existing UDFs, which will not override the default implementation.

If a UDF wants to override the default, it should return a map that gives a FuncSpec for each type of schema that it can support.  For example, for the UDF concat, the map would have two entries:
key: schema(chararray, chararray) value: StringConcat
key: schema(bytearray, bytearray) value: ByteConcat

The type checker will then take the schema of what is being passed to it and perform a lookup in the map.  If it finds an entry, it will use the associated FuncSpec.  If it does not, it will throw an exception saying that that EvalFunc cannot be used with those types.

At this point, the type checker will make no effort to find a best fit function.  Either the fit is perfect, or it will not be done.  In the future we would like to modify the type checker to select a best fit.  
For example, if a UDF says it can handle schema(long) and the type checker finds it has schema(int), it can insert a cast to deal with that.  But in the first pass we will ignore this and depend on the user to insert the casts.

{quote}

One change to the above proposal is the return type of getArgToFuncMapping():

{code}

public List<FuncSpec> getArgToFuncMapping() {
    return null;
}

{code}

The FuncSpec class will also have a schema member to hold the schema of the input arguments supported by a given FuncSpec object. The TypeCheckingVisitor will iterate over the List<FuncSpec> to find a FuncSpec whose schema matches the schema of the input arguments it has.
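
A minimal sketch of the exact-match lookup described above. It assumes simplified stand-ins for Pig's classes: SimpleFuncSpec, its inputTypes list (standing in for a Schema), and findMatch are hypothetical names used for illustration only, not the API in the patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for FuncSpec: an implementing class name plus the
// input schema it supports (modelled here as a list of type names).
final class SimpleFuncSpec {
    final String className;
    final List<String> inputTypes;

    SimpleFuncSpec(String className, String... inputTypes) {
        this.className = className;
        this.inputTypes = Arrays.asList(inputTypes);
    }
}

final class FuncSpecLookup {
    // Exact match only: no casts are inserted and no best fit is attempted.
    static SimpleFuncSpec findMatch(List<SimpleFuncSpec> candidates, List<String> actualTypes) {
        if (candidates == null) {
            return null; // the UDF kept the default; leave it as is
        }
        for (SimpleFuncSpec spec : candidates) {
            if (spec.inputTypes.equals(actualTypes)) {
                return spec;
            }
        }
        throw new IllegalArgumentException("UDF cannot be used with input types " + actualTypes);
    }

    public static void main(String[] args) {
        List<SimpleFuncSpec> concat = Arrays.asList(
            new SimpleFuncSpec("StringConcat", "chararray", "chararray"),
            new SimpleFuncSpec("ByteConcat", "bytearray", "bytearray"));
        // prints "ByteConcat"
        System.out.println(findMatch(concat, Arrays.asList("bytearray", "bytearray")).className);
    }
}
```

A null list keeps the existing behaviour for UDFs that do not override the method, while a non-null list with no matching entry is reported as an error.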

Some other observations:
   * In AVG, if there are some null inputs, these will contribute to the "count" in the average but will be treated as 0 in the "sum" needed for the average
   * SUM, AVG, MIN and MAX on DataByteArrays (i.e. input with no type specified) will compute the function by converting the input to Double (the input will not be permanently cast; a Double copy of the input will be used for the computations)
   * SIZE and CONCAT will return null if *either* of their inputs is null
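
The null handling listed above can be sketched as follows. This is only an illustration under stated assumptions: NullSemantics, avg, and concat are hypothetical helpers over plain Java values, not the builtin implementations in the patch.

```java
import java.util.Arrays;
import java.util.List;

final class NullSemantics {
    // AVG as described: a null still increments the count but adds 0 to the sum.
    static Double avg(List<Long> values) {
        if (values.isEmpty()) {
            return null; // assumption: the thread does not say what an empty input yields
        }
        long sum = 0;
        long count = 0;
        for (Long v : values) {
            if (v != null) {
                sum += v;
            }
            count++;
        }
        return (double) sum / count;
    }

    // CONCAT as described: null if either input is null.
    static String concat(String a, String b) {
        if (a == null || b == null) {
            return null;
        }
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(avg(Arrays.asList(2L, null, 4L))); // prints 2.0 (sum 6 over count 3)
        System.out.println(concat("pig", null));              // prints null
    }
}
```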

Deprecation:
bq.
@Deprecated
    public void registerFunction(String function, String functionSpec) 
in favor of:
    public void registerFunction(String function, FuncSpec funcSpec) 

A patch covering the changes in the tests will be attached separately.


> Allow UDFs to have different implementations based on input types
> -----------------------------------------------------------------
>
>                 Key: PIG-276
>                 URL: https://issues.apache.org/jira/browse/PIG-276
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Pradeep Kamath
>         Attachments: EvalFunc.patch, EvalFunc_Combined.patch, EvalFunc_unittestcases.patch, udf_funcspec_src.patch, udf_funcspec_tests.patch
>
>




[jira] Updated: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-276:
-------------------------------


The following unit tests still fail with this patch; to my knowledge, the failures are not caused by these changes:
    [junit] Running org.apache.pig.test.TestEvalPipeline 
    [junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 267.675 sec
    [junit] Test org.apache.pig.test.TestEvalPipeline FAILED
    [junit] Running org.apache.pig.test.TestFilterOpNumeric
    [junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 96.559 sec
    [junit] Test org.apache.pig.test.TestFilterOpNumeric FAILED
    [junit] Running org.apache.pig.test.TestStoreOld
    [junit] Tests run: 3, Failures: 0, Errors: 2, Time elapsed: 23.556 sec
    [junit] Test org.apache.pig.test.TestStoreOld FAILED




[jira] Updated: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-276:
-------------------------------


Because of the latest commit for PIG-269, the following code in src/org/apache/pig/impl/PigContext.java will produce compilation errors *after* the patch supplied for this issue is applied.

{code}
public static Object instantiateFuncFromSpec(FuncSpec funcSpec)  {
...
catch(NoSuchMethodException nme) {
            // Second chance. Try with var arg constructor
            try {
                Constructor c = objClass.getConstructor(String[].class);
                String[] argArr = args.toArray(new String[0]) ;
                Object[] wrappedArgs = new Object[1] ;
                wrappedArgs[0] = argArr ;
                ret =  c.newInstance(wrappedArgs);
            }
...
{code}

This will need to be changed to:

{code}
public static Object instantiateFuncFromSpec(FuncSpec funcSpec)  {
...
      }
        catch(NoSuchMethodException nme) {
            // Second chance. Try with var arg constructor
            try {
                Constructor c = objClass.getConstructor(String[].class);
                
                Object[] wrappedArgs = new Object[1] ;
                wrappedArgs[0] = args ;
                ret =  c.newInstance(wrappedArgs);
            }
...
{code}
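
The String[] has to be wrapped in an Object[] because Constructor.newInstance is itself a varargs method: passing the String[] directly would spread its elements out as separate constructor arguments. A self-contained sketch of that step, where VarArgUdf and ReflectiveInstantiation are hypothetical names standing in for a UDF class and for the PigContext code:

```java
import java.lang.reflect.Constructor;

// Hypothetical UDF-like class with a var arg (String[]) constructor.
final class VarArgUdf {
    final String[] args;

    public VarArgUdf(String... args) {
        this.args = args;
    }
}

final class ReflectiveInstantiation {
    static VarArgUdf instantiate(String[] args) {
        try {
            Constructor<VarArgUdf> c = VarArgUdf.class.getConstructor(String[].class);
            // Wrap the array so it is passed as ONE argument of type String[].
            Object[] wrappedArgs = new Object[1];
            wrappedArgs[0] = args;
            return c.newInstance(wrappedArgs);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not instantiate VarArgUdf", e);
        }
    }

    public static void main(String[] argv) {
        VarArgUdf udf = instantiate(new String[] {"a", "b"});
        System.out.println(udf.args.length); // prints 2
    }
}
```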



[jira] Commented: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612689#action_12612689 ] 

Pi Song commented on PIG-276:
-----------------------------

This patch looks very good to me.
I will commit in 24 hrs ***if there is no objection***.



[jira] Updated: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-276:
-------------------------------

    Attachment: udf_funcspec_tests_v2.patch

Patch for the tests



[jira] Commented: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606524#action_12606524 ] 

Alan Gates commented on PIG-276:
--------------------------------

Ideally we would like to support full function overloading for UDFs.  In the meantime, we need a way to allow some highly used UDFs to have separate implementations based on input types.  There are two reasons for this:

# Obeying the law of least astonishment.  Users don't expect to have SUM(int) return a double.
# Performance.  Some crude testing showed that summing longs was 10x faster than summing doubles.  As some of these builtin functions are very frequently used, optimizing them is a worthwhile endeavor.

Based on discussions on PIG-162, I propose the following changes:

It will be possible to specify an implementation of EvalFunc for each type.  In the default implementation (such as SUM) there will be a method:

Class classForType(byte type); // uses DataType types

Given a type, this method will return the appropriate extension of EvalFunc to be used.  This will require the following changes:

# The EvalFunc class will need to have this method added.  It should have a default implementation that returns null.
# The type checker will need to be changed to call classForType as part of checking LOUserFunc. If classForType returns anything other than null, it will need to change mFuncSpec in LOUserFunc.  Currently, the parser does some checks on the function when it loads it (makes sure we can load the indicated class, etc.)  This should be factored out and put in LOUserFunc (or a helper class) so that the type checker can do the same checks after it swaps the function.  Also, LOUserFunc should change to keep a reference to the actual UDF (which the parser instantiates), so the type checker doesn't have to instantiate it again.
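
A minimal sketch of the classForType idea, assuming simplified stand-in classes. SketchEvalFunc, SketchSum, SketchLongSum, SketchDoubleSum, TypeCheckerSketch, and the byte constants are all hypothetical; the constants are illustrative and are not the real DataType values.

```java
// Hypothetical, simplified stand-in for EvalFunc.
abstract class SketchEvalFunc {
    static final byte LONG = 1;   // illustrative type codes only
    static final byte DOUBLE = 2;

    // Default: null means "this function handles every input type itself".
    Class<? extends SketchEvalFunc> classForType(byte type) {
        return null;
    }
}

final class SketchLongSum extends SketchEvalFunc { }

final class SketchDoubleSum extends SketchEvalFunc { }

// The default SUM delegates to a specialized class per input type.
final class SketchSum extends SketchEvalFunc {
    @Override
    Class<? extends SketchEvalFunc> classForType(byte type) {
        switch (type) {
            case LONG:
                return SketchLongSum.class;
            case DOUBLE:
                return SketchDoubleSum.class;
            default:
                return null;
        }
    }
}

final class TypeCheckerSketch {
    // Mimics the type checker step: swap the implementation only when
    // classForType returns something other than null.
    static Class<? extends SketchEvalFunc> resolve(SketchEvalFunc func, byte inputType) {
        Class<? extends SketchEvalFunc> specialized = func.classForType(inputType);
        if (specialized != null) {
            return specialized;
        }
        return func.getClass();
    }

    public static void main(String[] args) {
        // prints "SketchLongSum"
        System.out.println(resolve(new SketchSum(), SketchEvalFunc.LONG).getSimpleName());
    }
}
```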

As for builtins, we need to implement the following specialized functions:

|| External name || input type || output type || mapped to || comments ||
| SUM | long | long | longSum | will handle sum of ints too |
| SUM | double | double | doubleSum | will handle sum of floats too |
| MIN | int | int | intMin | |
| MIN | long | long | longMin | |
| MIN | float | float | floatMin | |
| MIN | double | double | doubleMin | |
| MIN | chararray | chararray | charMin | |
| MIN | bytearray | bytearray | byteMin | |
| MAX | int | int | intMax | |
| MAX | long | long | longMax | |
| MAX | float | float | floatMax | |
| MAX | double | double | doubleMax | |
| MAX | chararray | chararray | charMax | |
| MAX | bytearray | bytearray | byteMax | |
| AVG | long | double | longAvg | will handle avg of ints too |
| AVG | double | double | doubleAvg | will handle avg of floats too |
| concat | chararray | chararray | charConcat | new function to concatenate strings |
| concat | bytearray | bytearray | byteConcat | new function to concatenate byte arrays |
| size | bag | long | bagSize | returns number of tuples |
| size | tuple | long | tupleSize | returns number of elements |
| size | map | long | mapSize | returns number of keys |
| size | chararray | long | charSize | returns number of characters in chararray |
| size | bytearray | long | byteSize | returns number of bytes in bytearray |


The existing versions of SUM, MIN, MAX, and AVG will need to implement the classForType method.  Default versions of concat and size will need to be implemented that also implement the classForType method.  The default implementations of eval for these two new functions should just error out.




[jira] Updated: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-276:
-------------------------------

    Attachment: udf_funcspec_tests.patch

Patch covering the changes in tests for the code changes elaborated in the previous comment



[jira] Updated: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-276:
-------------------------------

    Attachment: udf_funcspec_src_v2.patch

Some commits made yesterday broke the earlier patch, mostly in calls to FileSpec, whose constructor now expects a FuncSpec object instead of a String. I have regenerated the patch so that it applies cleanly against revision 675718 (the latest revision at this time).

I will also be attaching another patch which is for the "tests".

It would be good if one of the committers could review this (Pi has already reviewed it once, and the delta since then is minor) and commit it before more changes go in. The patch is quite big, and any usage of FileSpec and funcSpec is affected by it, so it should be committed soon.




[jira] Resolved: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pi Song resolved PIG-276.
-------------------------

       Resolution: Fixed
    Fix Version/s: types_branch

A few unit tests were still failing after applying the patch. I have fixed them.

Committed.

Thanks Pradeep.



[jira] Updated: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-276:
-------------------------------

    Attachment: EvalFunc.patch

Patch with changes to make the standard aggregate functions dependent on input type. I tried using generics but couldn't, hence the rather big patch.

I will upload another patch with unit test cases soon.



[jira] Commented: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606560#action_12606560 ] 

Alan Gates commented on PIG-276:
--------------------------------

I think we do want it to return Class.  This will be called at parse time, not run time.  Instantiated UDFs aren't serialized and passed to the run time, so there's no advantage to stashing info in the returned object.



[jira] Updated: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-276:
-------------------------------

    Attachment: EvalFunc_Combined.patch

New combined patch with the changes from the above comment. This patch has both the source files and the unit test cases.



[jira] Commented: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606811#action_12606811 ] 

Benjamin Reed commented on PIG-276:
-----------------------------------

Agreed.



[jira] Commented: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606529#action_12606529 ] 

Benjamin Reed commented on PIG-276:
-----------------------------------

Wouldn't it be more efficient and simpler to use

EvalFunction functionForType(byte type); // uses DataType types

It may help avoid extra instantiations of the EvalFunction. It also allows programmers to tuck meta information into the object they return.



[jira] Updated: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-276:
-------------------------------

    Attachment: EvalFunc_unittestcases.patch

Patch containing unit test cases for the input-type-based aggregate functions.



[jira] Commented: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609287#action_12609287 ] 

Alan Gates commented on PIG-276:
--------------------------------

Patch looks very good.  A couple of comments:

# In min and max functions, initial values of, for example, Integer.MAX_VALUE and Integer.MIN_VALUE are being used.  In the case where all nulls or an empty bag are passed to the function, this will result in those values being returned.  We have not defined the semantics of MIN and MAX when they are passed all nulls.  SQL returns null in this case, which is probably the right answer.

# In many of the initial functions, a call is made to getTupleFactory.  We should look into making this static to avoid the cost of getting the tuple factory each time (a similar change in the trunk brought a significant speed up).

In both cases, I don't think these are changes you introduced, you just extended what was already there.  But while we're in there reworking it, we might as well improve it.
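
The first point can be sketched as follows: keeping the running minimum as a nullable value, instead of seeding it with Integer.MAX_VALUE, makes an empty or all-null input yield null, as in SQL. NullAwareMin and its min method are hypothetical and operate on a plain Java list rather than a Pig bag.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class NullAwareMin {
    // Returns the smallest non-null value, or null when there is none.
    static Integer min(List<Integer> values) {
        Integer current = null;
        for (Integer v : values) {
            if (v == null) {
                continue; // nulls are skipped, never compared
            }
            if (current == null || v < current) {
                current = v;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(min(Arrays.asList(3, null, 1)));            // prints 1
        System.out.println(min(Collections.<Integer>emptyList()));     // prints null
    }
}
```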



[jira] Commented: (PIG-276) Allow UDFs to have different implementations based on input types

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612131#action_12612131 ] 

Pi Song commented on PIG-276:
-----------------------------

Very good +1
