Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2010/02/04 01:09:27 UTC

[jira] Created: (HIVE-1130) Create argmin and argmax

Create argmin and argmax
------------------------

                 Key: HIVE-1130
                 URL: https://issues.apache.org/jira/browse/HIVE-1130
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Zheng Shao


With HIVE-1128, users can already do what argmax and argmin do.

But it would be helpful to provide these functions explicitly so that people with a maths/stats background can use them more easily.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1130) Create argmin and argmax

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Huyn updated HIVE-1130:
------------------------------

    Attachment: HIVE-1130.2.patch

Took care of all the items from code review.

Of course, I will also provide argmin, but not until all problems with argmax are resolved.

ANT TEST still fails. I am puzzled because the UDAF tested out fine with the recompiled Hive but fails under ANT TEST. How do the two environments differ?


> Create argmin and argmax
> ------------------------
>
>                 Key: HIVE-1130
>                 URL: https://issues.apache.org/jira/browse/HIVE-1130
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Zheng Shao
>            Assignee: Pierre Huyn
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1130.1.patch, HIVE-1130.2.patch
>
>
> With HIVE-1128, users can already do what argmax and argmin do.
> But it would be helpful to provide these functions explicitly so that people with a maths/stats background can use them more easily.



[jira] Commented: (HIVE-1130) Create argmin and argmax

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904762#action_12904762 ] 

John Sichi commented on HIVE-1130:
----------------------------------

Comparing with GenericUDAFMax, I noticed that it uses separate inputOI and outputOI, whereas you use a single xInputOI for both.

      outputOI = ObjectInspectorUtils.getStandardObjectInspector(inputOI,
          ObjectInspectorCopyOption.JAVA);

Maybe you need to follow that pattern?
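
To make the suggestion above concrete, here is a minimal plain-Java sketch of the failure mode. It assumes (this is not confirmed in the thread) that the aggregation buffer holds a standard copy of the running maximum while xInputOI only understands the lazy input representation. IntInspector, LazyInt, LAZY_INSPECTOR, STANDARD_INSPECTOR and copyToStandard are hypothetical stand-ins for illustration, not Hive classes.

```java
/** Plain-Java model of the inspector mismatch; every type here is a hypothetical stand-in, not a Hive class. */
final class InspectorMismatchSketch {

    /** Reads an int out of one particular in-memory representation. */
    interface IntInspector {
        int get(Object value);
    }

    /** Stand-in for a lazily deserialized field: raw text that is parsed on access. */
    static final class LazyInt {
        final String rawText;
        LazyInt(String rawText) { this.rawText = rawText; }
    }

    /** Stand-in for an inspector over lazy values; the cast fails for any other representation. */
    static final IntInspector LAZY_INSPECTOR =
            value -> Integer.parseInt(((LazyInt) value).rawText);

    /** Stand-in for a "standard" inspector over plain boxed Integers. */
    static final IntInspector STANDARD_INSPECTOR = value -> (Integer) value;

    /** Copies a value into the standard representation, as an aggregation buffer might hold it. */
    static Object copyToStandard(Object value, IntInspector inspector) {
        return Integer.valueOf(inspector.get(value));
    }

    /** Compares two values, each read through the inspector passed alongside it. */
    static int compare(Object a, IntInspector aInspector, Object b, IntInspector bInspector) {
        return Integer.compare(aInspector.get(a), bInspector.get(b));
    }

    public static void main(String[] args) {
        Object firstRow = new LazyInt("3");
        Object secondRow = new LazyInt("7");

        // The buffer keeps a standard copy of the running maximum.
        Object bufferedMax = copyToStandard(firstRow, LAZY_INSPECTOR);

        // Matching inspectors: buffered value via the standard inspector, new row via the input inspector.
        System.out.println(compare(bufferedMax, STANDARD_INSPECTOR, secondRow, LAZY_INSPECTOR) < 0); // prints true

        // Reusing the input inspector for the buffered copy triggers the cast failure.
        try {
            compare(bufferedMax, LAZY_INSPECTOR, secondRow, LAZY_INSPECTOR);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: buffered value is not in the lazy representation");
        }
    }
}
```

In this model the fix is simply to pair each value with the inspector that matches its representation, which is the role the separate inputOI and outputOI appear to play in GenericUDAFMax.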




[jira] Updated: (HIVE-1130) Create argmin and argmax

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1130:
-----------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (HIVE-1130) Create argmin and argmax

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904260#action_12904260 ] 

HBase Review Board commented on HIVE-1130:
------------------------------------------

Message from: "John Sichi" <js...@facebook.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/746/
-----------------------------------------------------------

Review request for Hive Developers.


Summary
-------

Review by JVS


This addresses bug HIVE-1130.
    http://issues.apache.org/jira/browse/HIVE-1130


Diffs
-----

  http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 990399 
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFArgMax.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/test/queries/clientpositive/udaf_argmax.q PRE-CREATION 
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/test/results/clientpositive/udaf_argmax.q.out PRE-CREATION 

Diff: http://review.cloudera.org/r/746/diff


Testing
-------


Thanks,

John






[jira] Updated: (HIVE-1130) Create argmin and argmax

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Huyn updated HIVE-1130:
------------------------------

    Status: Patch Available  (was: Open)

See comment from previous message.



[jira] Updated: (HIVE-1130) Create argmin and argmax

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1130:
-----------------------------

    Status: Open  (was: Patch Available)

Review comments added at

https://review.cloudera.org/r/746/

I think you uploaded the wrong .q.out, since it doesn't match the .q.

The test failures are probably because you didn't update show_functions.q.out.

Also, for this and the previous functions you added, can you update this wiki page?

http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Built-in_Aggregate_Functions_.28UDAF.29

Thanks!




[jira] Commented: (HIVE-1130) Create argmin and argmax

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904407#action_12904407 ] 

John Sichi commented on HIVE-1130:
----------------------------------

(Did your stack trace get cut off?)




[jira] Commented: (HIVE-1130) Create argmin and argmax

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904650#action_12904650 ] 

Pierre Huyn commented on HIVE-1130:
-----------------------------------

Not sure why the rest of my message was cut off. It is still in my sent mail. Here is the entire message:

Hi John,

It appears that the call ObjectInspectorUtils.compare(myagg.max, xInputOI, px, xInputOI), which tries to compare two integer writables, failed with a cast error. Here is a fragment of the stack trace:

...
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyPrimitive
        at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.AbstractPrimitiveLazyObjectInspector.getPrimitiveWritableObject(AbstractPrimitiveLazyObjectInspector.java:37)
        at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector.get(LazyIntObjectInspector.java:38)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:497)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFArgMax$GenericUDAFArgMaxEvaluator.internalMerge(GenericUDAFArgMax.java:165)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFArgMax$GenericUDAFArgMaxEvaluator.iterate(GenericUDAFArgMax.java:142)
...

I am not sure why the cast fails. Both myagg.max and px were passed in as arguments of iterate(). Their data types are not known at compile time, which is the only reason I use ObjectInspectorUtils.compare() to compare them. Do you see any problem with that?

--- Pierre







[jira] Commented: (HIVE-1130) Create argmin and argmax

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904396#action_12904396 ] 

Pierre Huyn commented on HIVE-1130:
-----------------------------------

Hi John,

It appears that the call ObjectInspectorUtils.compare(myagg.max, xInputOI, px, xInputOI), which tries to compare two integer writables, failed with a cast error. Here is a fragment of the stack trace:





[jira] Commented: (HIVE-1130) Create argmin and argmax

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904361#action_12904361 ] 

John Sichi commented on HIVE-1130:
----------------------------------

You'll need to dig into the logs to diagnose the problem.

I noticed this at the end of the last .q.out you uploaded.

+PREHOOK: Output: file:/tmp/nhuyn/hive_2010-08-30_13-07-04_657_9209893909733875962/-mr-10000
+FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

Is that the failure you are getting when you run with ant test?

If so, you can find more info in one of the exec logs, probably in /tmp/nhuyn.




[jira] Updated: (HIVE-1130) Create argmin and argmax

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Huyn updated HIVE-1130:
------------------------------

               Status: Patch Available  (was: Open)
    Affects Version/s: 0.7.0
        Fix Version/s: 0.7.0

Please review code.



[jira] Updated: (HIVE-1130) Create argmin and argmax

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Huyn updated HIVE-1130:
------------------------------

    Attachment: HIVE-1130.1.patch

Initial release of generic user-defined aggregate function:
                                    ARGMAX (X, Y)
which takes a set of pairs (X,Y) and returns a Y that maximizes X.
X must be of a comparable type. Any pair (NULL,Y) is ignored. If the function is applied to an empty set, NULL will be returned. If more than one Y value maximizes X, one of them will be returned arbitrarily.
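
The semantics described above can be sketched in plain Java, independent of Hive. This is only a model of the stated behaviour, not the code in the patch: ArgMaxSketch, Pair and argMax are hypothetical names, and breaking ties in favour of the first maximizer seen is an assumption (the description allows any of the tied Y values).

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of the ARGMAX(X, Y) semantics described above; not the Hive UDAF itself. */
final class ArgMaxSketch {

    /** One (X, Y) input row; either field may be null. */
    static final class Pair<X, Y> {
        final X x;
        final Y y;
        Pair(X x, Y y) { this.x = x; this.y = y; }
    }

    /**
     * Returns a Y whose X is maximal. Pairs with a null X are skipped,
     * and an input with no usable pair yields null.
     */
    static <X extends Comparable<X>, Y> Y argMax(List<Pair<X, Y>> rows) {
        X bestX = null;
        Y bestY = null;
        for (Pair<X, Y> row : rows) {
            if (row.x == null) {
                continue; // (NULL, Y) is ignored
            }
            // Strict '>' keeps the first maximizer seen (an assumption of this sketch).
            if (bestX == null || row.x.compareTo(bestX) > 0) {
                bestX = row.x;
                bestY = row.y;
            }
        }
        return bestY;
    }

    public static void main(String[] args) {
        List<Pair<Integer, String>> rows = new ArrayList<>();
        rows.add(new Pair<>(3, "c"));
        rows.add(new Pair<>(null, "ignored"));
        rows.add(new Pair<>(7, "g"));
        rows.add(new Pair<>(5, "e"));
        System.out.println(argMax(rows)); // prints g
        System.out.println(argMax(new ArrayList<Pair<Integer, String>>())); // prints null
    }
}
```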

The current implementation tested out fine with the rebuilt Hive in my working copy of the SVN tree. However, it fails with ANT TEST and I could not figure out why.

The code is ready for review. Also, any help to figure out why ANT TEST fails is appreciated.



[jira] Assigned: (HIVE-1130) Create argmin and argmax

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Huyn reassigned HIVE-1130:
---------------------------------

    Assignee: Pierre Huyn
