Posted to user@pig.apache.org by Renato Marroquín Mogrovejo <re...@gmail.com> on 2013/03/26 23:28:33 UTC

Using Correlation and Covariance UDFs

Hi all,

Could anyone be kind enough to point me to some examples of using the
COVARIANCE and CORRELATION UDFs described here? [1]


Renato M.


[1] https://issues.apache.org/jira/browse/PIG-277

Re: Using Correlation and Covariance UDFs

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.

Thanks !

Renato M.

Re: Using Correlation and Covariance UDFs

Posted by Russell Jurney <ru...@gmail.com>.

Some UDFs rely on sorted input, but it looks like I could be mistaken here.
I think this used to be the case in Piggybank, but perhaps no longer?
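
For what it's worth, here is a minimal sketch (plain Python, not Pig) of why sorting should not matter mathematically: covariance and Pearson correlation are sums over row pairs, so reordering the rows leaves them unchanged as long as the columns are reordered together. The function names and sample rows are made up for illustration, and the population form (dividing by n) is an assumption; the thread does not say which form COV uses, and this says nothing about how a particular UDF implementation consumes its input.

```python
import math
import random


def covariance(xs, ys):
    """Population covariance of two equal-length sequences (divides by n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def column_stats(rows):
    """Return (covariance, correlation) of the first two columns of rows."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    return covariance(xs, ys), correlation(xs, ys)


# Hypothetical rows standing in for the tuples of one grouped bag.
rows = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.0), (5.0, 9.5)]

original = column_stats(rows)

# Shuffle whole rows, so both columns are permuted together.
shuffled_rows = rows[:]
random.Random(0).shuffle(shuffled_rows)
shuffled = column_stats(shuffled_rows)

# Compare with a tolerance, since float sums can differ in the last bits.
print(math.isclose(original[0], shuffled[0]), math.isclose(original[1], shuffled[1]))
```

Sorting only one column independently of the other would change the result, which is a different thing from ordering the whole bag.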





-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Using Correlation and Covariance UDFs

Posted by Houssam <ho...@haitof.com>.

Hi Russell,

I know what Johnny wrote is correct. But out of curiosity, why would you
need to sort the input? Thanks!

Houssam


Re: Using Correlation and Covariance UDFs

Posted by Russell Jurney <ru...@gmail.com>.

Beware: you must first sort the input.

D = foreach B { sorted = order A by $0; generate group, COR(sorted.$0,
sorted.$1, ... ); };





-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Using Correlation and Covariance UDFs

Posted by Johnny Zhang <xi...@cloudera.com>.

Hi, Renato:
For CORRELATION, I guess you can do something like
A = load 'random.txt' using PigStorage(':') as
(f1:double,f2:double,.........,f500:double);
B = group A all;
D = foreach B generate group,COR(A.$0,A.$1,A.$2,A.$3,.......A.$499);

For COVARIANCE, I guess the UDF is COV.

Johnny

