Posted to dev@pig.apache.org by "Hiten Java (JIRA)" <ji...@apache.org> on 2014/01/15 02:24:20 UTC

[jira] [Created] (PIG-3668) COR built-in function when atleast one of the coefficient values is NaN

Hiten Java created PIG-3668:
-------------------------------

             Summary: COR built-in function when atleast one of the coefficient values is NaN
                 Key: PIG-3668
                 URL: https://issues.apache.org/jira/browse/PIG-3668
             Project: Pig
          Issue Type: Bug
          Components: internal-udfs
    Affects Versions: 0.11.1, 0.12.0, 0.11
            Reporter: Hiten Java
            Priority: Trivial


When multiple columns are passed to COR for correlation analysis and the coefficient for any one pair of columns is NaN, the coefficients for all the other pairs are not computed.

The Pearson coefficient is NaN when all values in a given column are the same, because that column's variance is zero and the formula then divides zero by zero.
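
To illustrate why a constant column yields NaN, here is a minimal sketch using the textbook Pearson formula. It is not the code in org.apache.pig.builtin.COR; the class name PearsonNaNSketch and the method pearson are hypothetical, and the sample values are made up for the illustration.

```java
// Illustrative sketch only; NOT the implementation in org.apache.pig.builtin.COR.
// Class and method names are hypothetical.
final class PearsonNaNSketch {

    // Textbook Pearson coefficient of two equal-length arrays.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
            sumYY += y[i] * y[i];
        }
        double cov = n * sumXY - sumX * sumY;   // scaled covariance
        double varX = n * sumXX - sumX * sumX;  // scaled variance of x
        double varY = n * sumYY - sumY * sumY;  // scaled variance of y
        // A constant column makes both cov and its variance 0.0, so this
        // evaluates 0.0 / 0.0, which is NaN in IEEE 754 arithmetic.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        double[] constant = {2, 2, 2};
        double[] varying = {1, 2, 3};
        System.out.println(pearson(constant, varying));
        System.out.println(pearson(varying, new double[] {2, 4, 6}));
    }
}
```

With non-integer data, rounding can leave a tiny non-zero variance instead of an exact zero, so a real implementation may behave differently from this sketch.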

Example:
A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader();
B = group A all;
c = foreach B generate group, FLATTEN(COR((bag{tuple(double)}) A.col_1,(bag{tuple(double)}) A.col_2, (bag{tuple(double)}) A.col_3, (bag{tuple(double)}) A.col_4));

If the Pearson coefficient for col_1 and col_2 is NaN, then the coefficients for all combinations are returned as NaN.

This happens because of the 'return null' statements in the catch blocks at lines 157 and 235 of org.apache.pig.builtin.COR.java.
If those catch blocks are removed, the correlation analysis would continue for the remaining columns. (Apache Pig 0.12.0)
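
The intended behaviour can be sketched as follows: each pair of columns is handled on its own, so a NaN result or a failure for one pair does not prevent the remaining pairs from being computed. This is a standalone illustration under stated assumptions, not a patch for COR.java. The names CorPairsSketch, correlateAll and pearson are hypothetical, and plain arrays stand in for Pig bags.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; NOT a patch for org.apache.pig.builtin.COR.
// All names here are hypothetical.
final class CorPairsSketch {

    // Textbook Pearson coefficient; returns NaN for a constant column.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("columns differ in length");
        }
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
            sumYY += y[i] * y[i];
        }
        double cov = n * sumXY - sumX * sumY;
        double varX = n * sumXX - sumX * sumX;
        double varY = n * sumYY - sumY * sumY;
        return cov / Math.sqrt(varX * varY);
    }

    // One coefficient per column pair (i < j), in order (0,1), (0,2), ..., (1,2), ...
    // A pair that fails is recorded as NaN and the loop moves on, instead of
    // abandoning the whole computation.
    static List<Double> correlateAll(double[][] columns) {
        List<Double> result = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            for (int j = i + 1; j < columns.length; j++) {
                double r;
                try {
                    r = pearson(columns[i], columns[j]);
                } catch (RuntimeException e) {
                    r = Double.NaN; // isolate the failure to this pair only
                }
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] columns = {
            {2, 2, 2}, // constant column: every pair involving it is NaN
            {1, 2, 3},
            {2, 4, 6}
        };
        System.out.println(correlateAll(columns));
    }
}
```

In this sketch the pair made of the second and third columns is still computed even though both pairs involving the constant first column are NaN.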


