Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2014/01/16 02:22:19 UTC
[jira] [Commented] (PIG-3668) COR built-in function when at least one of the coefficient values is NaN
[ https://issues.apache.org/jira/browse/PIG-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872909#comment-13872909 ]
Daniel Dai commented on PIG-3668:
---------------------------------
Will that produce a wrong result that users could easily overlook? Also, there is another occurrence of this pattern in COR.exec; do we need to change that as well?
> COR built-in function when at least one of the coefficient values is NaN
> ------------------------------------------------------------------------
>
> Key: PIG-3668
> URL: https://issues.apache.org/jira/browse/PIG-3668
> Project: Pig
> Issue Type: Bug
> Components: internal-udfs
> Affects Versions: 0.12.0, 0.11.1, 0.12.1
> Reporter: Hiten Java
> Assignee: Hiten Java
> Attachments: CORR.diff
>
>
> When multiple columns are passed to COR for correlation analysis and the coefficient for any one column combination is NaN, the values for all the other combinations are not computed.
> The Pearson coefficient is NaN if all values in a given column are the same: that column's standard deviation is zero, so the coefficient becomes 0/0.
> Example:
> A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader();
> B = group A all;
> c = foreach B generate group, FLATTEN(COR((bag{tuple(double)}) A.col_1,(bag{tuple(double)}) A.col_2, (bag{tuple(double)}) A.col_3, (bag{tuple(double)}) A.col_4));
> If the Pearson coefficient for col_1 and col_2 is NaN, then the coefficients for all combinations are NaN.
> This happens because of the 'return null' statements in the catch blocks at lines 157 and 235 of org.apache.pig.builtin.COR.java.
> If the catch blocks are removed, the correlation analysis continues for the remaining columns (Apache Pig 0.12.0).
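The two points in the report, that a constant column yields NaN and that one caught failure discards every coefficient, can be illustrated with a small Java sketch. It is hypothetical: PairwiseCorSketch, pearson, strictPearson, allOrNothing and perPair are names invented here, not Pig APIs, and this is neither the COR.java source nor the CORR.diff patch. strictPearson turns a NaN into an exception only so that there is something to catch; the report does not show what COR.java actually throws. allOrNothing mimics a catch block that returns null for the whole call, while perPair wraps each column pair in its own try/catch, roughly the per-pair behaviour the report asks for:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the COR.java source or the CORR.diff patch.
// Shows (1) why a constant column gives a NaN Pearson coefficient and
// (2) how aborting on the first failing pair differs from per-pair handling.
final class PairwiseCorSketch {

    // Pearson r = cov(x, y) / (sd(x) * sd(y)), computed from mean-centred sums.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // For a constant column of exactly representable values (e.g. 5.0),
        // cov and varX are both 0, so this is 0.0 / 0.0, which Java
        // evaluates to NaN instead of throwing.
        return cov / Math.sqrt(varX * varY);
    }

    // Hypothetical strict variant that turns a NaN result into an exception,
    // standing in for whatever failure the UDF's catch blocks intercept.
    static double strictPearson(double[] x, double[] y) {
        double r = pearson(x, y);
        if (Double.isNaN(r)) {
            throw new ArithmeticException("correlation undefined for a constant column");
        }
        return r;
    }

    // All-or-nothing: one failing pair makes the whole call return null.
    static List<Double> allOrNothing(double[][] cols) {
        List<Double> out = new ArrayList<>();
        try {
            for (int i = 0; i < cols.length; i++) {
                for (int j = i + 1; j < cols.length; j++) {
                    out.add(strictPearson(cols[i], cols[j]));
                }
            }
        } catch (ArithmeticException e) {
            return null;
        }
        return out;
    }

    // Per-pair handling: only the failing pair is recorded as NaN;
    // every other pair is still computed.
    static List<Double> perPair(double[][] cols) {
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < cols.length; i++) {
            for (int j = i + 1; j < cols.length; j++) {
                double r;
                try {
                    r = strictPearson(cols[i], cols[j]);
                } catch (ArithmeticException e) {
                    r = Double.NaN;
                }
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] cols = {
            {5.0, 5.0, 5.0, 5.0}, // constant column, like col_1 in the report
            {1.0, 2.0, 3.0, 4.0},
            {2.0, 4.0, 6.0, 8.0}
        };
        System.out.println(allOrNothing(cols)); // null
        System.out.println(perPair(cols));      // [NaN, NaN, 1.0]
    }
}
```

With the first column constant, main prints null and then [NaN, NaN, 1.0]. Recording NaN for the failing pair, rather than dropping it or substituting a number, leaves a visible marker in the output, which bears on the review question above about wrong results being overlooked.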
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)