Posted to dev@commons.apache.org by Phil Steitz <ph...@gmail.com> on 2012/10/18 07:26:55 UTC

Re: [Math] MATH-816 (mixture model distribution)

On 10/17/12 8:36 PM, Becksfort, Jared wrote:
> I see.  I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (Math-817).  A Gibbs sampling DP fit may be a bit further out.   I am not opposed to allowing the number of components to change, but I also like the simplicity of this class.  Whatever you guys decide is probably fine.

I like the interface as implemented for what it represents, but I
agree with Ted's point above.  I also wonder if implementing the
multivariate distribution interface is really buying you anything. 
Certainly not for the Gibbs sampler.  It might be better to just
directly implement EM with an interface that is natural for fitting
and using mixture models.   I am not sure this stuff belongs in the
distribution package in any case.  Where were we intending to place
the EM fit?  Can you describe a little more how exactly the
practical use cases you have in mind will work? 

Phil
>
> Jared 
> ________________________________________
> From: Ted Dunning [ted.dunning@gmail.com]
> Sent: Wednesday, October 17, 2012 9:41 PM
> To: Commons Developers List
> Subject: Re: [Math] MATH-816 (mixture model distribution)
>
> The issue is that with a fixed number of components, you need to do
> multiple runs to find a best fit number of components.  Gibbs sampling
> against a Dirichlet process can get you to the same answer in about the
> same cost as a single run of EM with a fixed number of models.
>
> On Wed, Oct 17, 2012 at 7:31 PM, Becksfort, Jared <
> Jared.Becksfort@stjude.org> wrote:
>
>> Ted,
>>
>> I am not sure I understand the problem with the fixed number of
>> components.  My understanding is that CM prefers immutable objects. Adding
>> a component to an object would require reweighting in addition to modifying
>> the component list.  A new mixture model could be instantiated using the
>> getComponents function and then adding or removing more components if
>> necessary.
>>
>> Jared
>> ________________________________________
>> From: Ted Dunning [ted.dunning@gmail.com]
>> Sent: Wednesday, October 17, 2012 5:21 PM
>> To: Commons Developers List
>> Subject: Re: [Math] MATH-816 (mixture model distribution)
>>
>> Seems fine.
>>
>> I think that the restriction to a fixed number of mixture components is a
>> bit limiting.  So is the restriction to a uniform set of components.  Both
>> restrictions can be eased without much difficulty.
>>
>> Avoiding the fixed number of components can be done by using some variant
>> of Dirichlet processes.  Simply picking k_max relatively large and then
>> using an approximate DP over that finite set works well.
>>
>> That said, mixture models are pretty nice to have.
>>
>> On Wed, Oct 17, 2012 at 2:13 PM, Gilles Sadowski <
>> gilles@harfang.homelinux.org> wrote:
>>
>>> Hello.
>>>
>>> Any objection to committing the code as proposed on the report page?
>>>   https://issues.apache.org/jira/browse/MATH-816
>>>
>>>
>>> Regards,
>>> Gilles
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org


RE: [Math] MATH-816 (mixture model distribution)

Posted by "Becksfort, Jared" <Ja...@STJUDE.ORG>.
Typing this on my phone, so sorry about the format.  The sampling part of the mixture model makes it a true distribution according to the abstract class and interface. It will also come in handy for simulating data, I think. I am already using it to simulate MRI images.

I support adding a get-dimension function to the interface. I can't do anything for a few days, though.
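
As a rough illustration of the sampling use case, here is a minimal sketch (not the class attached to MATH-816): pick a component with probability equal to its weight, then draw from that component. The class name MixtureSamplerSketch, the isotropic Gaussian components, and the getDimension method are assumptions made only for this example.

```java
import java.util.Random;

// Hypothetical sketch of sampling from a fixed-component mixture.
// Not the Commons Math API; all names here are illustrative assumptions.
final class MixtureSamplerSketch {
    private final double[] weights; // normalized so that they sum to 1
    private final double[][] means; // one mean vector per component
    private final double sigma;     // shared standard deviation (assumption)

    MixtureSamplerSketch(double[] rawWeights, double[][] means, double sigma) {
        if (rawWeights.length == 0 || rawWeights.length != means.length || sigma <= 0) {
            throw new IllegalArgumentException("need one mean per weight and sigma > 0");
        }
        double sum = 0;
        for (double w : rawWeights) {
            if (w < 0) {
                throw new IllegalArgumentException("negative weight");
            }
            sum += w;
        }
        if (sum <= 0) {
            throw new IllegalArgumentException("weights must not all be zero");
        }
        this.weights = new double[rawWeights.length];
        this.means = new double[means.length][];
        for (int i = 0; i < rawWeights.length; i++) {
            if (means[i].length != means[0].length) {
                throw new IllegalArgumentException("components differ in dimension");
            }
            this.weights[i] = rawWeights[i] / sum; // reweight on construction
            this.means[i] = means[i].clone();      // defensive copy, object stays immutable
        }
        this.sigma = sigma;
    }

    // Dimension shared by all components.
    int getDimension() {
        return means[0].length;
    }

    // Choose a component by its weight, then sample from that component.
    double[] sample(Random rng) {
        double u = rng.nextDouble();
        int k = 0;
        double cumulative = weights[0];
        while (k < weights.length - 1 && u >= cumulative) {
            k++;
            cumulative += weights[k];
        }
        double[] x = new double[getDimension()];
        for (int j = 0; j < x.length; j++) {
            x[j] = means[k][j] + sigma * rng.nextGaussian();
        }
        return x;
    }

    public static void main(String[] args) {
        MixtureSamplerSketch mix = new MixtureSamplerSketch(
            new double[] {1, 3}, new double[][] {{0, 0}, {10, 10}}, 1.0);
        double[] x = mix.sample(new Random());
        System.out.println(x[0] + ", " + x[1]);
    }
}
```

Simulated data sets are then just repeated calls to sample(); the weights are normalized once in the constructor, so the object never needs to change afterwards.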

_______________________________________
From: Phil Steitz [phil.steitz@gmail.com]
Sent: Thursday, October 18, 2012 2:50 PM
To: Commons Developers List
Subject: Re: [Math] MATH-816 (mixture model distribution)

On 10/18/12 8:55 AM, Gilles Sadowski wrote:
> On Thu, Oct 18, 2012 at 06:59:22AM -0700, Phil Steitz wrote:
>> On 10/18/12 1:41 AM, Gilles Sadowski wrote:
>>> On Wed, Oct 17, 2012 at 10:26:55PM -0700, Phil Steitz wrote:
>>>> On 10/17/12 8:36 PM, Becksfort, Jared wrote:
>>>>> I see.  I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (Math-817).  A Gibbs sampling DP fit may be a bit further out.   I am not opposed to allowing the number of components to change, but I also like the simplicity of this class.  Whatever you guys decide is probably fine.
>>>> I like the interface as implemented for what it represents,
>>> By "interface", do you mean the class
>>>    MixtureMultivariateRealDistribution
>>> as implemented in the file on the JIRA page?
>> Yes, the most recent one.  I like the way you set up the
>> constructors, handling the weights and distribution type parameter.
>>>> but I
>>>> agree with Ted's point above.  I also wonder if implementing the
>>>> multivariate distribution interface is really buying you anything.
>>>> Certainly not for the Gibbs sampler.  It might be better to just
>>>> directly implement EM with an interface that is natural for fitting
>>>> and using mixture models.   I am not sure this stuff belongs in the
>>>> distribution package in any case.
>>> As implemented, it seems quite natural.
>>> How this class will be used by non-existing code is beyond the scope of
>>> MATH-816.
>>> [And when the code exists, we can always revisit the design if necessary.]
>> It works for fixed component models, which I guess is OK by
>> consensus to start. The question I was asking is what exactly do you
>> get by having it extend the multivariate real distribution?
> Is it not a kind of distribution?
> [It's obvious that one can sample from it but maybe there are some required
> properties (for a distribution) which are missing from such a mixture (?).]

What is implemented is a legitimate distribution (or more precisely,
a legitimate density, which is all we really model in
MultivariateRealDistribution).  I just wonder whether there is value
in it as a distribution per se, rather than just a container for the
weights and component distribution parameters.  The sample()
implementation is legitimate - I just don't know if it has any
practical value.  I guess the density will be used by the EM impl.
As I said above, I am fine committing and then seeing how the EM
impl uses the class.  Assuming it does turn out to be practically
valuable as a distribution, a natural thing to add would be a
univariate version; but that would require an actual distribution
function.

Phil
>
>> I guess
>> that will become clear when we get the EM implementation.
> Hopefully.
>
>> I am OK committing this, I just wanted to get a clearer picture of
>> how the class was going to be used.
> I wouldn't be able to answer.
>
>
> Gilles
>



Re: [Math] MATH-816 (mixture model distribution)

Posted by Phil Steitz <ph...@gmail.com>.
On 10/18/12 8:55 AM, Gilles Sadowski wrote:
> On Thu, Oct 18, 2012 at 06:59:22AM -0700, Phil Steitz wrote:
>> On 10/18/12 1:41 AM, Gilles Sadowski wrote:
>>> On Wed, Oct 17, 2012 at 10:26:55PM -0700, Phil Steitz wrote:
>>>> On 10/17/12 8:36 PM, Becksfort, Jared wrote:
>>>>> I see.  I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (Math-817).  A Gibbs sampling DP fit may be a bit further out.   I am not opposed to allowing the number of components to change, but I also like the simplicity of this class.  Whatever you guys decide is probably fine.
>>>> I like the interface as implemented for what it represents,
>>> By "interface", do you mean the class
>>>    MixtureMultivariateRealDistribution
>>> as implemented in the file on the JIRA page?
>> Yes, the most recent one.  I like the way you set up the
>> constructors, handling the weights and distribution type parameter.
>>>> but I
>>>> agree with Ted's point above.  I also wonder if implementing the
>>>> multivariate distribution interface is really buying you anything. 
>>>> Certainly not for the Gibbs sampler.  It might be better to just
>>>> directly implement EM with an interface that is natural for fitting
>>>> and using mixture models.   I am not sure this stuff belongs in the
>>>> distribution package in any case.
>>> As implemented, it seems quite natural.
>>> How this class will be used by non-existing code is beyond the scope of
>>> MATH-816.
>>> [And when the code exists, we can always revisit the design if necessary.]
>> It works for fixed component models, which I guess is OK by
>> consensus to start. The question I was asking is what exactly do you
>> get by having it extend the multivariate real distribution?
> Is it not a kind of distribution?
> [It's obvious that one can sample from it but maybe there are some required
> properties (for a distribution) which are missing from such a mixture (?).]

What is implemented is a legitimate distribution (or more precisely,
a legitimate density, which is all we really model in
MultivariateRealDistribution).  I just wonder whether there is value
in it as a distribution per se, rather than just a container for the
weights and component distribution parameters.  The sample()
implementation is legitimate - I just don't know if it has any
practical value.  I guess the density will be used by the EM impl. 
As I said above, I am fine committing and then seeing how the EM
impl uses the class.  Assuming it does turn out to be practically
valuable as a distribution, a natural thing to add would be a
univariate version; but that would require an actual distribution
function.
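
To illustrate the point about a univariate version, here is a minimal sketch in which both the density and the distribution function of a mixture are the weighted sums of the component ones. Exponential components are an assumption chosen only because their distribution function has a closed form; the class and method names are hypothetical and not part of Commons Math.

```java
// Hypothetical sketch of a univariate mixture; not the Commons Math API.
// Components are exponential distributions (an assumption for this example).
final class ExponentialMixtureSketch {
    private final double[] weights; // normalized so that they sum to 1
    private final double[] rates;   // rate parameter of each component

    ExponentialMixtureSketch(double[] rawWeights, double[] rates) {
        if (rawWeights.length == 0 || rawWeights.length != rates.length) {
            throw new IllegalArgumentException("need one rate per weight");
        }
        double sum = 0;
        for (int i = 0; i < rawWeights.length; i++) {
            if (rawWeights[i] < 0 || rates[i] <= 0) {
                throw new IllegalArgumentException("weights must be >= 0 and rates > 0");
            }
            sum += rawWeights[i];
        }
        if (sum <= 0) {
            throw new IllegalArgumentException("weights must not all be zero");
        }
        this.weights = new double[rawWeights.length];
        for (int i = 0; i < rawWeights.length; i++) {
            this.weights[i] = rawWeights[i] / sum;
        }
        this.rates = rates.clone();
    }

    // Mixture density: sum over i of w_i * f_i(x).
    double density(double x) {
        if (x < 0) {
            return 0;
        }
        double p = 0;
        for (int i = 0; i < weights.length; i++) {
            p += weights[i] * rates[i] * Math.exp(-rates[i] * x);
        }
        return p;
    }

    // Mixture distribution function: sum over i of w_i * F_i(x).
    double cumulativeProbability(double x) {
        if (x <= 0) {
            return 0;
        }
        double c = 0;
        for (int i = 0; i < weights.length; i++) {
            c += weights[i] * -Math.expm1(-rates[i] * x); // 1 - exp(-rate * x)
        }
        return c;
    }

    public static void main(String[] args) {
        ExponentialMixtureSketch m =
            new ExponentialMixtureSketch(new double[] {1, 1}, new double[] {1, 2});
        System.out.println("density(1) = " + m.density(1.0));
        System.out.println("cdf(1)     = " + m.cumulativeProbability(1.0));
    }
}
```

The multivariate case has no equally simple counterpart for the distribution function, which is why the sketch stays univariate.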

Phil
>
>> I guess
>> that will become clear when we get the EM implementation.
> Hopefully.
>
>> I am OK committing this, I just wanted to get a clearer picture of
>> how the class was going to be used.
> I wouldn't be able to answer.
>
>
> Gilles
>




Re: [Math] MATH-816 (mixture model distribution)

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
On Thu, Oct 18, 2012 at 06:59:22AM -0700, Phil Steitz wrote:
> On 10/18/12 1:41 AM, Gilles Sadowski wrote:
> > On Wed, Oct 17, 2012 at 10:26:55PM -0700, Phil Steitz wrote:
> >> On 10/17/12 8:36 PM, Becksfort, Jared wrote:
> >>> I see.  I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (Math-817).  A Gibbs sampling DP fit may be a bit further out.   I am not opposed to allowing the number of components to change, but I also like the simplicity of this class.  Whatever you guys decide is probably fine.
> >> I like the interface as implemented for what it represents,
> > By "interface", do you mean the class
> >    MixtureMultivariateRealDistribution
> > as implemented in the file on the JIRA page?
> 
> Yes, the most recent one.  I like the way you set up the
> constructors, handling the weights and distribution type parameter.
> >
> >> but I
> >> agree with Ted's point above.  I also wonder if implementing the
> >> multivariate distribution interface is really buying you anything. 
> >> Certainly not for the Gibbs sampler.  It might be better to just
> >> directly implement EM with an interface that is natural for fitting
> >> and using mixture models.   I am not sure this stuff belongs in the
> >> distribution package in any case.
> > As implemented, it seems quite natural.
> > How this class will be used by non-existing code is beyond the scope of
> > MATH-816.
> > [And when the code exists, we can always revisit the design if necessary.]
> 
> It works for fixed component models, which I guess is OK by
> consensus to start. The question I was asking is what exactly do you
> get by having it extend the multivariate real distribution?

Is it not a kind of distribution?
[It's obvious that one can sample from it but maybe there are some required
properties (for a distribution) which are missing from such a mixture (?).]

> I guess
> that will become clear when we get the EM implementation.

Hopefully.

> I am OK committing this, I just wanted to get a clearer picture of
> how the class was going to be used.

I wouldn't be able to answer.


Gilles



Re: [Math] MATH-816 (mixture model distribution)

Posted by Phil Steitz <ph...@gmail.com>.
On 10/18/12 1:41 AM, Gilles Sadowski wrote:
> On Wed, Oct 17, 2012 at 10:26:55PM -0700, Phil Steitz wrote:
>> On 10/17/12 8:36 PM, Becksfort, Jared wrote:
>>> I see.  I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (Math-817).  A Gibbs sampling DP fit may be a bit further out.   I am not opposed to allowing the number of components to change, but I also like the simplicity of this class.  Whatever you guys decide is probably fine.
>> I like the interface as implemented for what it represents,
> By "interface", do you mean the class
>    MixtureMultivariateRealDistribution
> as implemented in the file on the JIRA page?

Yes, the most recent one.  I like the way you set up the
constructors, handling the weights and distribution type parameter.
>
>> but I
>> agree with Ted's point above.  I also wonder if implementing the
>> multivariate distribution interface is really buying you anything. 
>> Certainly not for the Gibbs sampler.  It might be better to just
>> directly implement EM with an interface that is natural for fitting
>> and using mixture models.   I am not sure this stuff belongs in the
>> distribution package in any case.
> As implemented, it seems quite natural.
> How this class will be used by non-existing code is beyond the scope of
> MATH-816.
> [And when the code exists, we can always revisit the design if necessary.]

It works for fixed component models, which I guess is OK by
consensus to start. The question I was asking is what exactly do you
get by having it extend the multivariate real distribution? I guess
that will become clear when we get the EM implementation.

I am OK committing this, I just wanted to get a clearer picture of
how the class was going to be used.

Phil
>
>>  Where were we intending to place
>> the EM fit?
> Not in distribution; I agree.
>
>>  Can you describe a little more how exactly the
>> practical use cases you have in mind will work? 
> This will probably be for a new JIRA feature request.
>
>
> Regards,
> Gilles
>




Re: [Math] MATH-816 (mixture model distribution)

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
On Wed, Oct 17, 2012 at 10:26:55PM -0700, Phil Steitz wrote:
> On 10/17/12 8:36 PM, Becksfort, Jared wrote:
> > I see.  I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (Math-817).  A Gibbs sampling DP fit may be a bit further out.   I am not opposed to allowing the number of components to change, but I also like the simplicity of this class.  Whatever you guys decide is probably fine.
> 
> I like the interface as implemented for what it represents,

By "interface", do you mean the class
   MixtureMultivariateRealDistribution
as implemented in the file on the JIRA page?

> but I
> agree with Ted's point above.  I also wonder if implementing the
> multivariate distribution interface is really buying you anything. 
> Certainly not for the Gibbs sampler.  It might be better to just
> directly implement EM with an interface that is natural for fitting
> and using mixture models.   I am not sure this stuff belongs in the
> distribution package in any case.

As implemented, it seems quite natural.
How this class will be used by non-existing code is beyond the scope of
MATH-816.
[And when the code exists, we can always revisit the design if necessary.]

>  Where were we intending to place
> the EM fit?

Not in distribution; I agree.

>  Can you describe a little more how exactly the
> practical use cases you have in mind will work? 

This will probably be for a new JIRA feature request.


Regards,
Gilles
