Posted to dev@hive.apache.org by "Pierre Huyn (JIRA)" <ji...@apache.org> on 2010/08/17 18:53:16 UTC

[jira] Created: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Add ANSI SQL correlation aggregate function CORR(X,Y).
------------------------------------------------------

                 Key: HIVE-1549
                 URL: https://issues.apache.org/jira/browse/HIVE-1549
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Query Processor
    Affects Versions: 0.7.0
            Reporter: Pierre Huyn
            Assignee: Pierre Huyn
             Fix For: 0.7.0


Aggregate function that computes Pearson's correlation coefficient for a set of number pairs.
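
For reference, Pearson's coefficient for pairs (x_i, y_i) is the covariance of X and Y divided by the product of their standard deviations. A minimal two-pass sketch in Python follows. It is not the Hive implementation; the function name pearson and the choice to return None for fewer than two pairs or for zero variance are assumptions made for illustration.

```python
import math


def pearson(pairs):
    """Two-pass Pearson correlation for a list of (x, y) pairs.

    Illustrative only: the name and the None return value are assumptions.
    """
    n = len(pairs)
    if n < 2:
        return None
    # First pass: means of x and y.
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Second pass: sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined when either column is constant
    return sxy / math.sqrt(sxx * syy)
```

The normalisation by n (population) or n - 1 (sample) cancels between numerator and denominator, so the sketch works directly with the raw sums.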

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900446#action_12900446 ] 

Pierre Huyn commented on HIVE-1549:
-----------------------------------

Hi John,

I just uploaded a new patch. I assume the conflicts were caused by the particular way svn works. I hope this patch resolves the conflicts (magically?).

Regards
--- Pierre



> Add ANSI SQL correlation aggregate function CORR(X,Y).
> ------------------------------------------------------
>
>                 Key: HIVE-1549
>                 URL: https://issues.apache.org/jira/browse/HIVE-1549
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Pierre Huyn
>            Assignee: Pierre Huyn
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1549.1.patch, HIVE-1549.2.patch, HIVE-1549.3.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Aggregate function that computes the Pearson's coefficient of correlation between a set of number pairs.



[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899538#action_12899538 ] 

John Sichi commented on HIVE-1549:
----------------------------------

Note HIVE-1545.  Jonathan Chang (from Facebook data science) dropped in a bunch of code there which may or may not be relevant.




[jira] Issue Comment Edited: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900450#action_12900450 ] 

John Sichi edited comment on HIVE-1549 at 8/19/10 4:52 PM:
-----------------------------------------------------------

Hmmm, looking more closely...your patch includes changes to udaf_covar_pop.q.out and covar_samp.q.out (and those include conflict markers, which should never be there in a submitted patch).  But this change shouldn't actually affect those files at all, right?  I think if you just revert those before svn diff, all should be well.


      was (Author: jvs):
    Hmmm, looking more closely...your patch includes changes to udaf_cover_pop.q.out and covar_samp.q.out (and those include conflict markers, which should never be there in a submitted patch).  But this change shouldn't actually affect those files at all, right?  I think if you just revert those before svn diff, all should be well.

  


[jira] Resolved: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi resolved HIVE-1549.
------------------------------

    Hadoop Flags: [Reviewed]
      Resolution: Fixed

Committed.  Thanks Pierre!




[jira] Updated: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Huyn updated HIVE-1549:
------------------------------

    Attachment: HIVE-1549.4.patch

Rebuilt the patch after reverting udaf_covar_pop.q.out and udaf_covar_samp.q.out, which are not relevant to this patch and should not have been inconsistent with the trunk.



[jira] Updated: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Huyn updated HIVE-1549:
------------------------------

    Attachment: HIVE-1549.3.patch

New patch created to resolve conflicts with other commits. All I did was refresh my working tree, recompile, and rerun the tests. Hope this works.



[jira] Work started: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-1549 started by Pierre Huyn.



[jira] Updated: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Huyn updated HIVE-1549:
------------------------------

          Status: Patch Available  (was: In Progress)
    Release Note: This CORR udaf is implemented using a stable one-pass algorithm, similar to the one used in the COVAR_POP udaf.

This release is ready for code review.
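
The release note does not spell out the update rule. As an illustration only, a one-pass update of this kind can be sketched in Python as below. The class name CorrState, its field names, and the None return for degenerate input are assumptions, not the contents of the patch.

```python
import math


class CorrState:
    """Running state for a one-pass correlation (hypothetical names)."""

    def __init__(self):
        self.n = 0        # number of non-null pairs seen so far
        self.xavg = 0.0   # running mean of x
        self.yavg = 0.0   # running mean of y
        self.xvar = 0.0   # running sum of squared deviations of x
        self.yvar = 0.0   # running sum of squared deviations of y
        self.covar = 0.0  # running sum of cross deviations

    def iterate(self, x, y):
        # Skip pairs with a missing member (an assumption about NULL handling).
        if x is None or y is None:
            return
        self.n += 1
        dx = x - self.xavg  # deviation from the old mean of x
        dy = y - self.yavg  # deviation from the old mean of y
        self.xavg += dx / self.n
        self.yavg += dy / self.n
        # Each sum pairs an old-mean deviation with a new-mean deviation,
        # which avoids subtracting two large, nearly equal totals.
        self.covar += dx * (y - self.yavg)
        self.xvar += dx * (x - self.xavg)
        self.yvar += dy * (y - self.yavg)

    def terminate(self):
        if self.n < 2 or self.xvar == 0 or self.yvar == 0:
            return None  # undefined; returning None is an assumption
        return self.covar / math.sqrt(self.xvar * self.yvar)
```

Usage is one iterate() call per input row followed by a single terminate(); no second pass over the data is needed.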



[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900475#action_12900475 ] 

Pierre Huyn commented on HIVE-1549:
-----------------------------------

Hi John,

Please let me know how it goes.
--- Pierre





[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "Mayank Lahiri (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900029#action_12900029 ] 

Mayank Lahiri commented on HIVE-1549:
-------------------------------------

Nice job Pierre! Just a couple of very trivial points:

-- UDAF file, lines #116 and #123: could you amend the error message to indicate that only numeric types are accepted (string is also included as of now)?

-- I don't think you need the private boolean warned, line #273

Otherwise, it looks good and the numbers work out.
 

Incidentally, for the future, if your UDAF only stores a small number of values as a partial aggregation, you might just want to consider serializing the values as a list of doubles instead of a struct in terminatePartial() and merge(). It'll probably save you some time and reduce the amount of code in those parts. 
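
A rough sketch of the list-of-doubles idea, in Python rather than against Hive's ObjectInspector API: the partial aggregation is a flat list [count, xavg, yavg, xvar, yvar, covar], and merge() combines two such lists with the standard pairwise update. The layout, the helper partial(), and both function names are hypothetical and are not taken from the patch.

```python
def partial(pairs):
    """Build a flat partial state [count, xavg, yavg, xvar, yvar, covar].

    Hypothetical helper, standing in for what terminatePartial() would emit.
    """
    n = len(pairs)
    if n == 0:
        return [0.0] * 6
    xavg = sum(x for x, _ in pairs) / n
    yavg = sum(y for _, y in pairs) / n
    xvar = sum((x - xavg) ** 2 for x, _ in pairs)
    yvar = sum((y - yavg) ** 2 for _, y in pairs)
    covar = sum((x - xavg) * (y - yavg) for x, y in pairs)
    return [float(n), xavg, yavg, xvar, yvar, covar]


def merge(a, b):
    """Combine two flat partial states into one (pairwise update)."""
    na, xa, ya, xva, yva, ca = a
    nb, xb, yb, xvb, yvb, cb = b
    if na == 0:
        return list(b)
    if nb == 0:
        return list(a)
    n = na + nb
    dx = xb - xa          # difference between the two partial means of x
    dy = yb - ya          # difference between the two partial means of y
    w = na * nb / n       # weight of the between-partition correction
    return [
        n,
        xa + dx * nb / n,
        ya + dy * nb / n,
        xva + xvb + dx * dx * w,
        yva + yvb + dy * dy * w,
        ca + cb + dx * dy * w,
    ]
```

With a fixed six-element layout there are no field names to declare or look up, which is where the saving in code comes from; the cost is that the positions have to be documented and kept in sync by hand.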



[jira] Updated: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Huyn updated HIVE-1549:
------------------------------

    Attachment: HIVE-1549.1.patch

This CORR UDAF is implemented using a one-pass stable algorithm, very similar to the implementation of the COVAR_POP UDAF. This code release is now ready for review.



[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900009#action_12900009 ] 

John Sichi commented on HIVE-1549:
----------------------------------

Mayank, if you get time, here's another one to take a look at.




[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900098#action_12900098 ] 

John Sichi commented on HIVE-1549:
----------------------------------

Will commit when tests pass.




[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900450#action_12900450 ] 

John Sichi commented on HIVE-1549:
----------------------------------

Hmmm, looking more closely...your patch includes changes to udaf_cover_pop.q.out and covar_samp.q.out (and those include conflict markers, which should never be there in a submitted patch).  But this change shouldn't actually affect those files at all, right?  I think if you just revert those before svn diff, all should be well.




[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900062#action_12900062 ] 

Pierre Huyn commented on HIVE-1549:
-----------------------------------

Thanks for your comments. The items have been taken care of in patch #2.






[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "Mayank Lahiri (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900011#action_12900011 ] 

Mayank Lahiri commented on HIVE-1549:
-------------------------------------

No problem, reviewing it now...



[jira] Updated: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1549:
-----------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900426#action_12900426 ] 

John Sichi commented on HIVE-1549:
----------------------------------

Hi Pierre,

I applied your patch (after resolving some trivial conflicts with a recent context_ngrams checkin from Mayank) but hit some test failures due to test output diffs from other changes which got committed recently in HIVE-1548 (not a problem in your patch).

Could you upload a new patch which resolves the conflicts and updates the test output to match latest trunk?




[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "Mayank Lahiri (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900075#action_12900075 ] 

Mayank Lahiri commented on HIVE-1549:
-------------------------------------

+1 looks good to me.



[jira] Updated: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "Pierre Huyn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Huyn updated HIVE-1549:
------------------------------

    Attachment: HIVE-1549.2.patch

Fixed the 2 issues from Mayank's review.



[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900478#action_12900478 ] 

John Sichi commented on HIVE-1549:
----------------------------------

Rerunning tests with latest.

