Posted to dev@commons.apache.org by Devl Devel <de...@gmail.com> on 2012/07/05 23:24:25 UTC

[math] Kendall's Tau Implementation

Hi All,

Below is a proposal for a new feature:

*A concise description of the new feature / enhancement*

I propose a new feature implementing Kendall's Tau, a measure of
association (correlation) between ranked ordinal data.

*References to definitions and algorithms.*

A basic description is available at
http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient; however,
the implementation of the test will follow the definition in "Handbook of
Parametric and Nonparametric Statistical Procedures", Fifth Edition, Test 30,
page 1393 (ISBN-10: 1439858012, ISBN-13: 978-1439858011).

The algorithm is proposed as follows.

Given two rankings (permutations) represented by a 2D matrix, where the
columns hold the rankings (e.g. one per individual) and the rows are the
observations being ranked, the algorithm counts the concordant pairs and the
discordant pairs of observations between the two columns. A pair of
observations is concordant if both rankings order it the same way and
discordant if they order it in opposite ways. Tau is then defined as

tau = (number of concordant pairs - number of discordant pairs) / (n(n-1)/2)

where n is the number of observations and n(n-1)/2 is the total number of
possible pairs.

The method will then output a tau value between -1 and 1, where 1 signifies
"perfect" agreement between the two ranked lists and -1 signifies that one
list is the exact reverse of the other.
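
A minimal sketch of the proposed O(n^2) computation, assuming the two rankings are passed as parallel double[] arrays (one per column of the matrix). The class name KendallTauNaive is hypothetical, and this is not the eventual Commons Math code. Tied pairs are counted as neither concordant nor discordant, as proposed.

```java
/**
 * Minimal O(n^2) sketch of the proposed statistic
 * tau = (C - D) / (n(n-1)/2), where tied pairs count as neither C nor D.
 * Hypothetical helper class, not part of Commons Math.
 */
final class KendallTauNaive {

    private KendallTauNaive() {
    }

    /** Returns tau for the paired observations (x[i], y[i]). */
    static double tau(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("at least two observations are required");
        }
        long concordant = 0;
        long discordant = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // Positive when both rankings order the pair the same way,
                // negative when they disagree, zero when either ranking has a tie.
                double s = Math.signum(x[i] - x[j]) * Math.signum(y[i] - y[j]);
                if (s > 0) {
                    concordant++;
                } else if (s < 0) {
                    discordant++;
                }
            }
        }
        // Cast before multiplying to avoid int overflow for large n.
        double totalPairs = (double) n * (n - 1) / 2.0;
        return (concordant - discordant) / totalPairs;
    }
}
```

For x = {1, 2, 3, 4, 5} and y = {1, 3, 2, 4, 5}, one of the ten pairs is discordant and nine are concordant, so tau = (9 - 1) / 10 = 0.8. Because the denominator is always n(n-1)/2, data containing ties cannot reach ±1 under this formula.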

Where a pair is tied in either ranking, it is counted as neither concordant
nor discordant. Optionally, a merge-sort-based approach can reduce the running
time from O(n^2) pairwise comparisons to O(n log n); details are on the
Wikipedia page.
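
A sketch of the merge-sort speed-up, assuming there are no ties in either ranking. Handling ties needs extra bookkeeping for tied runs, which this sketch deliberately omits. Once the observations are ordered by the first ranking, each discordant pair appears as an inversion in the second ranking. Counting inversions during a merge sort therefore gives D in O(n log n) time, and tau = (n0 - 2D) / n0 with n0 = n(n-1)/2. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical O(n log n) sketch; assumes no ties in x or in y. */
final class KendallTauMergeSort {

    private KendallTauMergeSort() {
    }

    static double tau(double[] x, double[] y) {
        int n = x.length;
        if (n != y.length || n < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        // Order the observations by x, then read off y in that order.
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingDouble(i -> x[i]));
        double[] ys = new double[n];
        for (int i = 0; i < n; i++) {
            ys[i] = y[idx[i]];
        }
        // With no ties, every inversion in ys is exactly one discordant pair.
        long discordant = countInversions(ys, new double[n], 0, n);
        double totalPairs = (double) n * (n - 1) / 2.0;
        // C = totalPairs - D when there are no ties, so C - D = totalPairs - 2D.
        return (totalPairs - 2.0 * discordant) / totalPairs;
    }

    /** Sorts a[lo, hi) in place and returns the number of pairs i < j with a[i] > a[j]. */
    private static long countInversions(double[] a, double[] buf, int lo, int hi) {
        if (hi - lo < 2) {
            return 0;
        }
        int mid = (lo + hi) >>> 1;
        long count = countInversions(a, buf, lo, mid) + countInversions(a, buf, mid, hi);
        int i = lo;
        int j = mid;
        int k = lo;
        while (i < mid && j < hi) {
            if (a[i] <= a[j]) {
                buf[k++] = a[i++];
            } else {
                // a[j] is smaller than every remaining element of the left half.
                count += mid - i;
                buf[k++] = a[j++];
            }
        }
        while (i < mid) {
            buf[k++] = a[i++];
        }
        while (j < hi) {
            buf[k++] = a[j++];
        }
        System.arraycopy(buf, lo, a, lo, hi - lo);
        return count;
    }
}
```

On tie-free data this returns the same value as the pairwise O(n^2) loop, so the simple version remains a convenient reference in unit tests.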

*Some indication of why the addition / enhancement is practically useful*

Although this implementation is not particularly complex, it would be useful
to have it in a consistent format in the Commons Math package, alongside the
existing correlation measures. Kendall's Tau is used to compare rankings of
products, rankings returned by search engines, and measurements from
engineering equipment.

This is my first post on this list. I tried to follow the guidelines, but
let me know if I need to elaborate.

Regards
Dev

Re: [math] Kendall's Tau Implementation

Posted by Phil Steitz <ph...@gmail.com>.
On 7/5/12 2:24 PM, Devl Devel wrote:
> Hi All,

Welcome!

I think a Kendall's Tau implementation would make a great addition to
the correlation package (o.a.c.math3.stat.correlation).  Here is how
you can get started:

0) Get yourself set up to build commons math and run the unit
tests.  If you are familiar with maven, this should not be too
hard.  If you have any questions or run into problems checking out
the sources, building locally, etc., don't hesitate to ask.
1) Look at the Spearman's implementation and the ranking classes in
the stat.ranking package.  That might give you some ideas on how to
implement Kendall's consistently.
2) Open a JIRA ticket with the info above and start attaching
patches containing the new implementation class and its associated
test class.  Run "mvn site" or checkstyle standalone to make sure
your contributed code follows the style guidelines we use.
3) Be patient but persistent and we will get Kendall's Tau into
commons math :)

Thanks in advance!

Phil



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Kendall's Tau Implementation

Posted by Phil Steitz <ph...@gmail.com>.
On 7/21/12 7:04 AM, Devl Devel wrote:
> Hi Phil
>
> I took a closer look at the Spearman's correlation and noticed that it
> uses an underlying PearsonsCorrelation object to do the actual
> work of calculating the correlation value after ranking.

It does that only because Spearman's essentially is Pearson's after
the rank transform.  It is just reusing the implementation code in
Pearson's.
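
To illustrate the reuse: a self-contained sketch (not the Commons Math code) that computes Spearman's rho by rank-transforming both arrays, giving tied values their average rank, and then applying an ordinary Pearson correlation. All names here are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration: Spearman's rho = Pearson's r on the ranks. */
final class SpearmanViaPearson {

    private SpearmanViaPearson() {
    }

    /** Pearson's product-moment correlation of two equal-length arrays. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        double mx = Arrays.stream(x).average().orElse(Double.NaN);
        double my = Arrays.stream(y).average().orElse(Double.NaN);
        double sxy = 0;
        double sxx = 0;
        double syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx;
            double dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Ranks 1..n; tied values receive the average of the ranks they span. */
    static double[] rank(double[] v) {
        int n = v.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, (a, b) -> Double.compare(v[a], v[b]));
        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && v[idx[j + 1]] == v[idx[i]]) {
                j++;
            }
            // Average of the 1-based ranks i+1 .. j+1.
            double avg = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) {
                ranks[idx[k]] = avg;
            }
            i = j + 1;
        }
        return ranks;
    }

    /** Spearman's rho as Pearson's r on the rank-transformed data. */
    static double spearman(double[] x, double[] y) {
        return pearson(rank(x), rank(y));
    }
}
```

For x = {1, 2, 3, 4, 5} and y = {1, 4, 9, 16, 25}, the relationship is monotone but not linear, so spearman(x, y) is 1.0 while pearson(x, y) is roughly 0.98.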
>
> Do I have to do the same for Kendall's Tau? I.e., do I need to have
> two classes: 1) KendallsTauCorrelation, the equivalent of
> SpearmansCorrelation, and 2) say KendallsTauComputation, the
> equivalent of PearsonsCorrelation?
No.  No need to add this complexity.
> Or can I just put everything into one class called
> KendallsTauCorrelation, which does the ranking using the
> RankingAlgorithm interface *and* the tau computation?

Yes, that would be simpler and better.
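
A rough sketch of the single-class layout. The class and method names are hypothetical and are not the Commons Math API. The ranking step is modeled as a plain UnaryOperator<double[]> standing in for a RankingAlgorithm. Tau depends only on the pairwise order of the values, so a rank transform that preserves both order and ties does not change the result. For that reason the ranking step defaults to the identity here.

```java
import java.util.function.UnaryOperator;

/**
 * Hypothetical single-class layout: an optional ranking step followed by the
 * pairwise tau computation (ties count as neither concordant nor discordant).
 */
final class KendallsTauSketch {

    /** Stand-in for a RankingAlgorithm; must preserve both order and ties. */
    private final UnaryOperator<double[]> ranking;

    /** No ranking step: tau only depends on the pairwise order of the values. */
    KendallsTauSketch() {
        this(UnaryOperator.identity());
    }

    KendallsTauSketch(UnaryOperator<double[]> ranking) {
        this.ranking = ranking;
    }

    /** Tau between two equal-length arrays. */
    double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        // Work on copies so a ranking step that ranks in place cannot modify the input.
        double[] rx = ranking.apply(x.clone());
        double[] ry = ranking.apply(y.clone());
        int n = rx.length;
        long concordant = 0;
        long discordant = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double s = Math.signum(rx[i] - rx[j]) * Math.signum(ry[i] - ry[j]);
                if (s > 0) {
                    concordant++;
                } else if (s < 0) {
                    discordant++;
                }
            }
        }
        return (concordant - discordant) / ((double) n * (n - 1) / 2.0);
    }

    /** Pairwise tau between the columns of data (rows are observations). */
    double[][] correlationMatrix(double[][] data) {
        int cols = data[0].length;
        double[][] columns = new double[cols][data.length];
        for (int r = 0; r < data.length; r++) {
            for (int c = 0; c < cols; c++) {
                columns[c][r] = data[r][c];
            }
        }
        double[][] result = new double[cols][cols];
        for (int a = 0; a < cols; a++) {
            // Include the diagonal: with ties, a column's tau with itself is below 1.
            for (int b = a; b < cols; b++) {
                result[a][b] = correlation(columns[a], columns[b]);
                result[b][a] = result[a][b];
            }
        }
        return result;
    }
}
```

Keeping the ranking and the pairwise counting in one class matches the simpler option endorsed here, without a separate computation class.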

Thanks for working on this!

Phil
>
> Hope that makes sense?
> Cheers
> Dev


Re: [math] Kendall's Tau Implementation

Posted by Devl Devel <de...@gmail.com>.
Hi Phil

I took a closer look at the Spearman's correlation implementation and noticed
that it uses an underlying PearsonsCorrelation object to do the actual work of
calculating the correlation value after ranking.

Do I have to do the same for Kendall's Tau? I.e., do I need two classes:
1) KendallsTauCorrelation, the equivalent of SpearmansCorrelation, and
2) say KendallsTauComputation, the equivalent of PearsonsCorrelation? Or can
I just put everything into one class called KendallsTauCorrelation, which
does the ranking using the RankingAlgorithm interface *and* the tau
computation?

Hope that makes sense?
Cheers
Dev


Re: [math] Kendall's Tau Implementation

Posted by Phil Steitz <ph...@gmail.com>.
On 7/10/12 12:09 PM, Devl Devel wrote:
> Hi Phil and All.
>
> Thanks for the welcome. I managed to check out, build and test the SVN trunk
> and took a look at the Spearman's rank implementation. I did notice a few
> test failures in the overall build, such as RealVectorTest; hopefully they
> are failures in the build itself and not something missing from my checkout.

Don't worry about the RealVector test failures, that is a known
issue that will hopefully soon be resolved.
>
> My only question for now is: how can I view the Jenkins build to see which
> tests are not passing at the moment? I understand there are email alerts,
> but it would be good to see (read-only) the state of the current build
> somehow.

You can see the test output locally in target/surefire-reports.
You should be able to validate everything locally.
>
> I've also added a JIRA entry, https://issues.apache.org/jira/browse/MATH-814,
> and added the feature to the wishlist at
> http://wiki.apache.org/commons/MathWishList#preview
>
> Will update once there is any progress :)

Thanks!

Phil


[math] Kendall's Tau Implementation

Posted by Devl Devel <de...@gmail.com>.
Hi Phil and All.

Thanks for the welcome. I managed to check out, build and test the SVN trunk
and took a look at the Spearman's rank implementation. I did notice a few
test failures in the overall build, such as RealVectorTest; hopefully they
are failures in the build itself and not something missing from my checkout.

My only question for now is: how can I view the Jenkins build to see which
tests are not passing at the moment? I understand there are email alerts,
but it would be good to see (read-only) the state of the current build
somehow.

I've also added a JIRA entry, https://issues.apache.org/jira/browse/MATH-814,
and added the feature to the wishlist at
http://wiki.apache.org/commons/MathWishList#preview

Will update once there is any progress :)

Cheers
Dev