Posted to dev@pig.apache.org by "Pi Song (JIRA)" <ji...@apache.org> on 2008/06/23 17:02:44 UTC

[jira] Commented: (PIG-277) UDF for computing correlation and covariance between data sets

    [ https://issues.apache.org/jira/browse/PIG-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607250#action_12607250 ] 

Pi Song commented on PIG-277:
-----------------------------

Good work
 - Please be a bit more careful with code formatting
 - Please convert tabs to spaces (We use 1 tab = 4 spaces)

Covariance
 - COV.combine: What does this do?
   Tuple tuple = new Tuple(Integer.valueOf(values.size()+"").intValue());
 - This looks a bit ugly:
{noformat}
catch(RuntimeException t) {
                throw new RuntimeException(t.getMessage() + ": " + input, t);
            }
{noformat}
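
On the COV.combine question above: as far as I can tell, the expression converts values.size() to a String and parses it straight back, so it should be the same as passing values.size() directly. A minimal sketch follows, using a plain java.util.List in place of Pig's Tuple; the class and method names (SizeRoundTrip, viaString, direct) are made up for illustration and are not part of the patch.

```java
import java.util.Arrays;
import java.util.List;

final class SizeRoundTrip {
    // Mirrors the expression in the patch: int -> String -> Integer -> int.
    static int viaString(List<?> values) {
        return Integer.valueOf(values.size() + "").intValue();
    }

    // The direct form: size() already returns an int.
    static int direct(List<?> values) {
        return values.size();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "c");
        // Both forms yield the same number, so the round trip adds nothing.
        System.out.println(viaString(values) == direct(values));
    }
}
```

If that reading is right, new Tuple(values.size()) would be the simpler way to write it.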

Correlation
 - I think we may have problems with this line:
   int totalSchemas = Double.valueOf(((1+Math.sqrt(1+4*combined.arity()))/2)).intValue();
   Javadoc says .intValue() truncates the fractional part, so a non-integral result would be silently rounded toward zero instead of being reported.
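
To illustrate the concern: the totalSchemas expression is the positive root of n*(n-1) = arity, so it is only a whole number when the arity has that form. The sketch below compares the truncating version with a stricter one that rounds and then checks the result. The names SchemaCount, truncatingCount and strictCount are hypothetical, and the arity is passed in as a plain int rather than read from combined.arity().

```java
final class SchemaCount {
    // Same arithmetic as the line in the patch, with the arity passed in.
    static int truncatingCount(int arity) {
        return Double.valueOf((1 + Math.sqrt(1 + 4 * arity)) / 2).intValue();
    }

    // Hypothetical stricter variant: round to the nearest integer, then
    // verify that n * (n - 1) really equals the arity.
    static int strictCount(int arity) {
        long n = Math.round((1 + Math.sqrt(1 + 4.0 * arity)) / 2);
        if (n * (n - 1) != arity) {
            throw new IllegalArgumentException(
                "arity " + arity + " is not of the form n*(n-1)");
        }
        return (int) n;
    }

    public static void main(String[] args) {
        System.out.println(truncatingCount(6)); // 6 = 3*2, exact root
        System.out.println(truncatingCount(7)); // root is about 3.19, truncated
        try {
            strictCount(7);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether an unexpected arity should be rejected or handled some other way is a design decision for the patch; the point is only that truncation hides the mismatch.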

> UDF for computing correlation and covariance between data sets
> --------------------------------------------------------------
>
>                 Key: PIG-277
>                 URL: https://issues.apache.org/jira/browse/PIG-277
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ajay Garg
>            Priority: Minor
>         Attachments: stat.patch
>
>
> UDFs for computing correlation and covariance between data sets. Use the following commands to compute covariance:
> A = load 'input.xml' using PigStorage(':');
> B = group A all;
> define c COV('a','b','c');
> D = foreach B generate group,c(A.$0,A.$1,A.$2);
