Posted to dev@pig.apache.org by "Pi Song (JIRA)" <ji...@apache.org> on 2008/06/23 17:02:44 UTC
[jira] Commented: (PIG-277) UDF for computing correlation and
covariance between data sets
[ https://issues.apache.org/jira/browse/PIG-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607250#action_12607250 ]
Pi Song commented on PIG-277:
-----------------------------
Good work
- Please be a bit more careful with code formatting
- Please convert tabs to spaces (We use 1 tab = 4 spaces)
Covariance
- COV.combine: What does this line do?
Tuple tuple = new Tuple(Integer.valueOf(values.size()+"").intValue());
- This looks a bit ugly:
{noformat}
catch (RuntimeException t) {
    throw new RuntimeException(t.getMessage() + ": " + input, t);
}
{noformat}
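On the COV.combine line quoted above: converting values.size() to a String and parsing it back appears to yield the same int that size() already returns. A minimal sketch of that equivalence, using a plain java.util.List as a stand-in for the patch's "values" (the class and method names here are illustrative and not taken from the patch):

```java
import java.util.Arrays;
import java.util.List;

public class TupleSizeSketch {
    // Round trip as written in the quoted line: int -> String -> Integer -> int.
    static int roundTripSize(List<?> values) {
        return Integer.valueOf(values.size() + "").intValue();
    }

    // Direct form: size() already returns an int.
    static int directSize(List<?> values) {
        return values.size();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "c");
        // Both forms give the same number, so the conversion adds nothing.
        System.out.println(roundTripSize(values) == directSize(values)); // prints "true"
    }
}
```

If that holds for the real "values" object, the argument could presumably be written as just values.size().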
Correlation
int totalSchemas = Double.valueOf(((1+Math.sqrt(1+4*combined.arity()))/2)).intValue();
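I think we may have problems with this line. According to the Javadoc, Double.intValue() is a narrowing conversion that truncates the fractional part, so a result that is not a whole number is silently cut down to the next lower integer.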
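To make the concern concrete: the formula above solves k*(k-1) = arity for k, so the result is a whole number only when the arity has that form. The following is a minimal sketch, assuming that relationship is what the patch intends. schemaCount is a hypothetical helper, not part of the patch; it rounds instead of truncating and then checks the result with integer arithmetic.

```java
public class SchemaCountSketch {
    // Hypothetical helper: recover k from arity = k * (k - 1), rejecting other arities.
    static int schemaCount(int arity) {
        if (arity < 0) {
            throw new IllegalArgumentException("arity must be non-negative: " + arity);
        }
        // Round to the nearest whole number instead of truncating.
        long k = Math.round((1 + Math.sqrt(1 + 4.0 * arity)) / 2);
        // Verify with exact integer arithmetic that k really solves the equation.
        if (k * (k - 1) != arity) {
            throw new IllegalArgumentException("arity " + arity + " is not of the form k*(k-1)");
        }
        return (int) k;
    }

    // The truncating form quoted in the review, with the arity passed in directly.
    static int truncatingCount(int arity) {
        return Double.valueOf((1 + Math.sqrt(1 + 4 * arity)) / 2).intValue();
    }

    public static void main(String[] args) {
        System.out.println(schemaCount(6));     // prints 3
        System.out.println(truncatingCount(7)); // prints 3 even though 7 is not k*(k-1)
    }
}
```

With an arity of 7 the truncating form still returns 3, while the checked form rejects the input, which may be the safer behaviour if a malformed tuple can reach this code.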
> UDF for computing correlation and covariance between data sets
> --------------------------------------------------------------
>
> Key: PIG-277
> URL: https://issues.apache.org/jira/browse/PIG-277
> Project: Pig
> Issue Type: New Feature
> Reporter: Ajay Garg
> Priority: Minor
> Attachments: stat.patch
>
>
> UDFs for computing correlation and covariance between data sets. Use the following commands to compute covariance:
> A = load 'input.xml' using PigStorage(':');
> B = group A all;
> define c COV('a','b','c');
> D = foreach B generate group,c(A.$0,A.$1,A.$2);