Posted to dev@commons.apache.org by Ted Dunning <te...@gmail.com> on 2012/10/18 04:41:57 UTC

Re: [Math] MATH-816 (mixture model distribution)

The issue is that with a fixed number of components, you need to do
multiple runs to find a best-fit number of components.  Gibbs sampling
against a Dirichlet process can get you to the same answer in about the
same cost as a single run of EM with a fixed number of models.
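The truncated Dirichlet-process approach Ted alludes to below ("picking k_max relatively large and then using an approximate DP over that finite set") can be sketched with the stick-breaking construction. This is a hypothetical, self-contained illustration; the class and method names are invented and nothing here is part of the MATH-816 proposal.

```java
import java.util.Random;

/** Illustrative sketch (invented names, not Commons Math code): truncated
 *  stick-breaking construction of Dirichlet-process mixture weights over a
 *  finite set of kMax components. */
public class StickBreakingSketch {

    /** Draw mixture weights w_1..w_kMax from a DP(alpha) truncated at kMax. */
    static double[] stickBreakingWeights(int kMax, double alpha, Random rng) {
        double[] w = new double[kMax];
        double remaining = 1.0; // unbroken length of the stick
        for (int i = 0; i < kMax - 1; i++) {
            // v ~ Beta(1, alpha) by inversion: F(v) = 1 - (1 - v)^alpha
            double v = 1.0 - Math.pow(1.0 - rng.nextDouble(), 1.0 / alpha);
            w[i] = v * remaining;
            remaining *= (1.0 - v);
        }
        w[kMax - 1] = remaining; // last component absorbs the leftover mass
        return w;
    }

    public static void main(String[] args) {
        double[] w = stickBreakingWeights(20, 1.5, new Random(42));
        double sum = 0.0;
        for (double x : w) sum += x;
        System.out.println(sum); // sums to 1 by construction
    }
}
```

Components whose stick-breaking weight comes out negligible are effectively pruned, which is what lets a single run behave as if the number of components were chosen automatically.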

On Wed, Oct 17, 2012 at 7:31 PM, Becksfort, Jared <
Jared.Becksfort@stjude.org> wrote:

> Ted,
>
> I am not sure I understand the problem with the fixed number of
> components.  My understanding is that CM prefers immutable objects. Adding
> a component to an object would require reweighting in addition to modifying
> the component list.  A new mixture model could be instantiated using the
> getComponents function and then adding or removing more components if
> necessary.
>
> Jared
> ________________________________________
> From: Ted Dunning [ted.dunning@gmail.com]
> Sent: Wednesday, October 17, 2012 5:21 PM
> To: Commons Developers List
> Subject: Re: [Math] MATH-816 (mixture model
> distribution)
>
> Seems fine.
>
> I think that the limitation to a fixed number of mixture components is a
> bit limiting.  So is the limitation to a uniform set of components.  Both
> limitations can be eased without huge difficulty.
>
> Avoiding the fixed number of components can be done by using some variant
> of Dirichlet processes.  Simply picking k_max relatively large and then
> using an approximate DP over that finite set works well.
>
> That said, mixture models are pretty nice to have.
>
> On Wed, Oct 17, 2012 at 2:13 PM, Gilles Sadowski <
> gilles@harfang.homelinux.org> wrote:
>
> > Hello.
> >
> > Any objection to committing the code as proposed on the report page?
> >   https://issues.apache.org/jira/browse/MATH-816
> >
> >
> > Regards,
> > Gilles
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
>
> Email Disclaimer:  www.stjude.org/emaildisclaimer
> Consultation Disclaimer:  www.stjude.org/consultationdisclaimer
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [Math] MATH-816 (mixture model distribution)

Posted by Ted Dunning <te...@gmail.com>.
Existing code does have a certain cachet to it.

On Thu, Oct 18, 2012 at 5:13 AM, Patrick Meyer <me...@gmail.com> wrote:

> I vote for simplicity. Current practice in the social sciences is to fit
> multiple models, each with a different number of components, and use fit
> statistics to choose the best model.
>
> There are some additional features I would like to see added and I have the
> code to contribute if it is not currently there. To be consistent with
> Mplus, we need to have the algorithm use multiple random starts and run a few
> of the best starts to completion. Mplus uses this strategy to effectively
> overcome local minima.
>
>
> -----Original Message-----
> From: Becksfort, Jared [mailto:Jared.Becksfort@STJUDE.ORG]
> Sent: Wednesday, October 17, 2012 11:37 PM
> To: Commons Developers List
> Subject: RE: [Math] MATH-816 (mixture model distribution)
>
> I see.  I am planning to submit the EM fit for multivariate normal mixture
> models in the next couple of weeks (MATH-817).  A Gibbs sampling DP fit may
> be a bit further out.   I am not opposed to allowing the number of
> components to change, but I also like the simplicity of this class.
> Whatever you guys decide is probably fine.
>
> Jared
> ________________________________________
> From: Ted Dunning [ted.dunning@gmail.com]
> Sent: Wednesday, October 17, 2012 9:41 PM
> To: Commons Developers List
> Subject: Re: [Math] MATH-816 (mixture model distribution)
>
> The issue is that with a fixed number of components, you need to do
> multiple
> runs to find a best-fit number of components.  Gibbs sampling against a
> Dirichlet process can get you to the same answer in about the same cost as
> a
> single run of EM with a fixed number of models.
>
> On Wed, Oct 17, 2012 at 7:31 PM, Becksfort, Jared <
> Jared.Becksfort@stjude.org> wrote:
>
> > Ted,
> >
> > I am not sure I understand the problem with the fixed number of
> > components.  My understanding is that CM prefers immutable objects.
> > Adding a component to an object would require reweighting in addition
> > to modifying the component list.  A new mixture model could be
> > instantiated using the getComponents function and then adding or
> > removing more components if necessary.
> >
> > Jared
> > ________________________________________
> > From: Ted Dunning [ted.dunning@gmail.com]
> > Sent: Wednesday, October 17, 2012 5:21 PM
> > To: Commons Developers List
> > Subject: Re: [Math] MATH-816 (mixture model
> > distribution)
> >
> > Seems fine.
> >
> > I think that the limitation to a fixed number of mixture components is
> > a bit limiting.  So is the limitation to a uniform set of components.
> > Both limitations can be eased without huge difficulty.
> >
> > Avoiding the fixed number of components can be done by using some
> > variant of Dirichlet processes.  Simply picking k_max relatively large
> > and then using an approximate DP over that finite set works well.
> >
> > That said, mixture models are pretty nice to have.
> >
> > On Wed, Oct 17, 2012 at 2:13 PM, Gilles Sadowski <
> > gilles@harfang.homelinux.org> wrote:
> >
> > > Hello.
> > >
> > > Any objection to committing the code as proposed on the report page?
> > >   https://issues.apache.org/jira/browse/MATH-816
> > >
> > >
> > > Regards,
> > > Gilles
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > For additional commands, e-mail: dev-help@commons.apache.org
> > >
> > >
> >
> > Email Disclaimer:  www.stjude.org/emaildisclaimer
> > Consultation Disclaimer:  www.stjude.org/consultationdisclaimer
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [Math] MATH-816 (mixture model distribution)

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
On Thu, Oct 18, 2012 at 11:00:17AM -0400, Patrick Meyer wrote:
> Yes, I like the latest changes. It looks cleaner to me.
> 
> It seems that the attachments only describe the mixture distribution and do
> not provide the EM estimation algorithm. Am I missing something?  Didn't the
> original contributor mention the estimation part too? The parts I would like
> to add will need the EM part first. 

Jared has broken up his proposal into three parts; the enclosing issue is
  https://issues.apache.org/jira/browse/MATH-817
which explicitly refers to the EM part, but no implementation of this has
been submitted yet.
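For readers unfamiliar with what such an EM fit involves, here is a minimal one-dimensional Gaussian-mixture sketch. All names are invented for illustration; this is not the MATH-817 contribution, which targets multivariate normal mixtures.

```java
import java.util.Random;

/** Illustrative 1-D Gaussian-mixture EM sketch; names invented, not the
 *  MATH-817 submission (which targets the multivariate case). */
public class EmSketch {

    /** Gaussian probability density at x. */
    static double normalDensity(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2.0 * Math.PI));
    }

    /** One EM iteration: the E-step computes responsibilities, the M-step
     *  updates weights, means and standard deviations in place. */
    static void emStep(double[] data, double[] w, double[] mu, double[] sigma) {
        int n = data.length;
        int k = w.length;
        double[][] r = new double[n][k];
        for (int i = 0; i < n; i++) {          // E-step
            double norm = 0.0;
            for (int j = 0; j < k; j++) {
                r[i][j] = w[j] * normalDensity(data[i], mu[j], sigma[j]);
                norm += r[i][j];
            }
            for (int j = 0; j < k; j++) {
                r[i][j] /= norm;
            }
        }
        for (int j = 0; j < k; j++) {          // M-step
            double nj = 0.0, mean = 0.0, var = 0.0;
            for (int i = 0; i < n; i++) {
                nj += r[i][j];
                mean += r[i][j] * data[i];
            }
            mean /= nj;
            for (int i = 0; i < n; i++) {
                double d = data[i] - mean;
                var += r[i][j] * d * d;
            }
            w[j] = nj / n;
            mu[j] = mean;
            sigma[j] = Math.sqrt(var / nj);
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        double[] data = new double[400];
        for (int i = 0; i < data.length; i++) { // two well-separated clusters
            data[i] = (i % 2 == 0) ? rng.nextGaussian() : 10.0 + rng.nextGaussian();
        }
        double[] w = {0.5, 0.5};
        double[] mu = {1.0, 8.0};
        double[] sigma = {2.0, 2.0};
        for (int iter = 0; iter < 50; iter++) {
            emStep(data, w, mu, sigma);
        }
        System.out.printf("mu = (%.2f, %.2f)%n", mu[0], mu[1]); // near 0 and 10
    }
}
```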


Regards,
Gilles

> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


RE: [Math] MATH-816 (mixture model distribution)

Posted by Patrick Meyer <me...@gmail.com>.
Yes, I like the latest changes. It looks cleaner to me.

It seems that the attachments only describe the mixture distribution and do
not provide the EM estimation algorithm. Am I missing something?  Didn't the
original contributor mention the estimation part too? The parts I would like
to add will need the EM part first. 

Thanks!


-----Original Message-----
From: Gilles Sadowski [mailto:gilles@harfang.homelinux.org] 
Sent: Thursday, October 18, 2012 9:33 AM
To: dev@commons.apache.org
Subject: Re: [Math] MATH-816 (mixture model distribution)

On Thu, Oct 18, 2012 at 08:13:52AM -0400, Patrick Meyer wrote:
> I vote for simplicity. Current practice in the social sciences is to 
> fit multiple models, each with a different number of components, and 
> use fit statistics to choose the best model.

So... Do you vote for the current proposal (as in the latest attachment on
the JIRA page)? [Sorry for being dense. :-)]

[The (simple) "mixture model" code could be in 3.1, due to be out a couple
of weeks _ago_. :-}]

> 
> There are some additional features I would like to see added and I 
> have the code to contribute if it is not currently there. To be 
> consistent with Mplus, we need have the algorithm use multiple random 
> starts and run a few of the best starts to completion. Mplus uses this 
> strategy to effectively overcome local minima.

Proposals welcome; please open a feature request with an outline of the
implementation.


Thanks,
Gilles

> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] MATH-816 (mixture model distribution)

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
On Thu, Oct 18, 2012 at 08:13:52AM -0400, Patrick Meyer wrote:
> I vote for simplicity. Current practice in the social sciences is to fit
> multiple models, each with a different number of components, and use fit
> statistics to choose the best model.

So... Do you vote for the current proposal (as in the latest attachment
on the JIRA page)? [Sorry for being dense. :-)]

[The (simple) "mixture model" code could be in 3.1, due to be out a couple
of weeks _ago_. :-}]

> 
> There are some additional features I would like to see added and I have the
> code to contribute if it is not currently there. To be consistent with
> Mplus, we need to have the algorithm use multiple random starts and run a few
> of the best starts to completion. Mplus uses this strategy to effectively
> overcome local minima.

Proposals welcome; please open a feature request with an outline of the
implementation.


Thanks,
Gilles

> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


RE: [Math] MATH-816 (mixture model distribution)

Posted by Patrick Meyer <me...@gmail.com>.
I vote for simplicity. Current practice in the social sciences is to fit
multiple models, each with a different number of components, and use fit
statistics to choose the best model. 

There are some additional features I would like to see added and I have the
code to contribute if it is not currently there. To be consistent with
Mplus, we need to have the algorithm use multiple random starts and run a few
of the best starts to completion. Mplus uses this strategy to effectively
overcome local minima.
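The multi-start strategy described above (many short runs from random starting points, then the best few run to completion) can be sketched as follows. The objective here is a toy multimodal function standing in for the mixture log-likelihood, and all names are invented; this is not the code offered for contribution.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative multi-start sketch (invented names): many short runs from
 *  random starting points, then the best few run to completion.  A toy
 *  multimodal objective stands in for the mixture log-likelihood. */
public class MultiStartSketch {

    /** Toy objective with many local maxima; global maximum near x = 2.1. */
    static double objective(double x) {
        return Math.cos(3.0 * x) - 0.1 * (x - 2.0) * (x - 2.0);
    }

    /** Crude hill climb standing in for a few EM iterations. */
    static double climb(double x, int steps) {
        double step = 0.1;
        for (int i = 0; i < steps; i++) {
            if (objective(x + step) > objective(x)) {
                x += step;
            } else if (objective(x - step) > objective(x)) {
                x -= step;
            } else {
                step /= 2.0; // shrink once neither direction improves
            }
        }
        return x;
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        int nStarts = 20;
        Double[] candidates = new Double[nStarts];
        for (int i = 0; i < nStarts; i++) { // stage 1: short runs
            candidates[i] = climb(-10.0 + 20.0 * rng.nextDouble(), 10);
        }
        // Rank candidates by objective value, best first.
        Arrays.sort(candidates, (a, b) -> Double.compare(objective(b), objective(a)));
        double best = candidates[0];
        for (int i = 0; i < 3; i++) {       // stage 2: best 3 to completion
            double x = climb(candidates[i], 200);
            if (objective(x) > objective(best)) {
                best = x;
            }
        }
        System.out.println("best x = " + best);
    }
}
```

The point of the two stages is cost: most random starts are discarded after a cheap partial run, and only the most promising few pay for a full optimization.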


-----Original Message-----
From: Becksfort, Jared [mailto:Jared.Becksfort@STJUDE.ORG] 
Sent: Wednesday, October 17, 2012 11:37 PM
To: Commons Developers List
Subject: RE: [Math] MATH-816 (mixture model distribution)

I see.  I am planning to submit the EM fit for multivariate normal mixture
models in the next couple of weeks (MATH-817).  A Gibbs sampling DP fit may
be a bit further out.   I am not opposed to allowing the number of
components to change, but I also like the simplicity of this class.
Whatever you guys decide is probably fine.

Jared
________________________________________
From: Ted Dunning [ted.dunning@gmail.com]
Sent: Wednesday, October 17, 2012 9:41 PM
To: Commons Developers List
Subject: Re: [Math] MATH-816 (mixture model distribution)

The issue is that with a fixed number of components, you need to do multiple
runs to find a best-fit number of components.  Gibbs sampling against a
Dirichlet process can get you to the same answer in about the same cost as a
single run of EM with a fixed number of models.

On Wed, Oct 17, 2012 at 7:31 PM, Becksfort, Jared <
Jared.Becksfort@stjude.org> wrote:

> Ted,
>
> I am not sure I understand the problem with the fixed number of 
> components.  My understanding is that CM prefers immutable objects. 
> Adding a component to an object would require reweighting in addition 
> to modifying the component list.  A new mixture model could be 
> instantiated using the getComponents function and then adding or 
> removing more components if necessary.
>
> Jared
> ________________________________________
> From: Ted Dunning [ted.dunning@gmail.com]
> Sent: Wednesday, October 17, 2012 5:21 PM
> To: Commons Developers List
> Subject: Re: [Math] MATH-816 (mixture model
> distribution)
>
> Seems fine.
>
> I think that the limitation to a fixed number of mixture components is 
> a bit limiting.  So is the limitation to a uniform set of components.  
> Both limitations can be eased without huge difficulty.
>
> Avoiding the fixed number of components can be done by using some 
> variant of Dirichlet processes.  Simply picking k_max relatively large 
> and then using an approximate DP over that finite set works well.
>
> That said, mixture models are pretty nice to have.
>
> On Wed, Oct 17, 2012 at 2:13 PM, Gilles Sadowski < 
> gilles@harfang.homelinux.org> wrote:
>
> > Hello.
> >
> > Any objection to committing the code as proposed on the report page?
> >   https://issues.apache.org/jira/browse/MATH-816
> >
> >
> > Regards,
> > Gilles
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
>
> Email Disclaimer:  www.stjude.org/emaildisclaimer
> Consultation Disclaimer:  www.stjude.org/consultationdisclaimer
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


RE: [Math] MATH-816 (mixture model distribution)

Posted by "Becksfort, Jared" <Ja...@STJUDE.ORG>.
Typing this on my phone, sorry about the format.  The sampling part of the mixture model makes it a true distribution according to the abstract class and interface. It will also come in handy for simulating data, I think. I am already using it to simulate MRI images.

I support adding a getDimension function to the interface. I can't do anything for a few days, though.
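The sampling use case mentioned above reduces to: pick a component index according to the weights, then draw from that component. A hypothetical, self-contained sketch (invented names, Gaussian components for concreteness):

```java
import java.util.Random;

/** Illustrative sketch of mixture sampling (invented names): choose a
 *  component according to the weights, then draw from that component. */
public class MixtureSampleSketch {

    /** Pick a component index with probability proportional to its weight. */
    static int pickComponent(double[] weights, Random rng) {
        double u = rng.nextDouble();
        double cum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            cum += weights[i];
            if (u < cum) {
                return i;
            }
        }
        return weights.length - 1; // guard against floating-point rounding
    }

    /** One draw from a mixture of Gaussians. */
    static double sample(double[] w, double[] mu, double[] sigma, Random rng) {
        int i = pickComponent(w, rng);
        return mu[i] + sigma[i] * rng.nextGaussian();
    }

    public static void main(String[] args) {
        double[] w = {0.7, 0.3};
        double[] mu = {0.0, 5.0};
        double[] sigma = {1.0, 0.5};
        Random rng = new Random(123);
        int low = 0, n = 100000;
        for (int i = 0; i < n; i++) {
            if (sample(w, mu, sigma, rng) < 2.5) low++;
        }
        // Roughly 70% of draws fall below 2.5, matching the first weight.
        System.out.println((double) low / n);
    }
}
```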

_______________________________________
From: Phil Steitz [phil.steitz@gmail.com]
Sent: Thursday, October 18, 2012 2:50 PM
To: Commons Developers List
Subject: Re: [Math] MATH-816 (mixture model distribution)

On 10/18/12 8:55 AM, Gilles Sadowski wrote:
> On Thu, Oct 18, 2012 at 06:59:22AM -0700, Phil Steitz wrote:
>> On 10/18/12 1:41 AM, Gilles Sadowski wrote:
>>> On Wed, Oct 17, 2012 at 10:26:55PM -0700, Phil Steitz wrote:
>>>> On 10/17/12 8:36 PM, Becksfort, Jared wrote:
>>>>> I see.  I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (MATH-817).  A Gibbs sampling DP fit may be a bit further out.   I am not opposed to allowing the number of components to change, but I also like the simplicity of this class.  Whatever you guys decide is probably fine.
>>>> I like the interface as implemented for what it represents,
>>> By "interface", do you mean the class
>>>    MixtureMultivariateRealDistribution
>>> as implemented in the file on the JIRA page?
>> Yes, the most recent one.  I like the way you set up the
>> constructors, handling the weights and distribution type parameter.
>>>> but I
>>>> agree with Ted's point above.  I also wonder if implementing the
>>>> multivariate distribution interface is really buying you anything.
>>>> Certainly not for the Gibbs sampler.  It might be better to just
>>>> directly implement EM with an interface that is natural for fitting
>>>> and using mixture models.   I am not sure this stuff belongs in the
>>>> distribution package in any case.
>>> As implemented, it seems quite natural.
>>> How this class will be used by non-existing code is beyond the scope of
>>> MATH-816.
>>> [And when the code exists, we can always revisit the design if necessary.]
>> It works for fixed component models, which I guess is OK by
>> consensus to start. The question I was asking is what exactly do you
>> get by having it extend the multivariate real distribution?
> Is it not a kind of distribution?
> [It's obvious that one can sample from it but maybe there are some required
> properties (for a distribution) which are missing from such a mixture (?).]

What is implemented is a legitimate distribution (or more precisely,
a legitimate density, which is all we really model in
MultivariateRealDistribution).  I just wonder whether there is value
in it as a distribution per se, rather than just a container for the
weights and component distribution parameters.  The sample()
implementation is legitimate - I just don't know if it has any
practical value.  I guess the density will be used by the EM impl.
As I said above, I am fine committing and then seeing how the EM
impl uses the class.  Assuming it does turn out to be practically
valuable as a distribution, a natural thing to add would be a
univariate version; but that would require an actual distribution
function.

Phil
>
>> I guess
>> that will become clear when we get the EM implementation.
> Hopefully.
>
>> I am OK committing this, I just wanted to get a clearer picture of
>> how the class was going to be used.
> I wouldn't be able to answer.
>
>
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org



Email Disclaimer:  www.stjude.org/emaildisclaimer
Consultation Disclaimer:  www.stjude.org/consultationdisclaimer


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] MATH-816 (mixture model distribution)

Posted by Phil Steitz <ph...@gmail.com>.
On 10/18/12 8:55 AM, Gilles Sadowski wrote:
> On Thu, Oct 18, 2012 at 06:59:22AM -0700, Phil Steitz wrote:
>> On 10/18/12 1:41 AM, Gilles Sadowski wrote:
>>> On Wed, Oct 17, 2012 at 10:26:55PM -0700, Phil Steitz wrote:
>>>> On 10/17/12 8:36 PM, Becksfort, Jared wrote:
>>>>> I see.  I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (MATH-817).  A Gibbs sampling DP fit may be a bit further out.   I am not opposed to allowing the number of components to change, but I also like the simplicity of this class.  Whatever you guys decide is probably fine.
>>>> I like the interface as implemented for what it represents,
>>> By "interface", do you mean the class
>>>    MixtureMultivariateRealDistribution
>>> as implemented in the file on the JIRA page?
>> Yes, the most recent one.  I like the way you set up the
>> constructors, handling the weights and distribution type parameter.
>>>> but I
>>>> agree with Ted's point above.  I also wonder if implementing the
>>>> multivariate distribution interface is really buying you anything. 
>>>> Certainly not for the Gibbs sampler.  It might be better to just
>>>> directly implement EM with an interface that is natural for fitting
>>>> and using mixture models.   I am not sure this stuff belongs in the
>>>> distribution package in any case.
>>> As implemented, it seems quite natural.
>>> How this class will be used by non-existing code is beyond the scope of
>>> MATH-816.
>>> [And when the code exists, we can always revisit the design if necessary.]
>> It works for fixed component models, which I guess is OK by
>> consensus to start. The question I was asking is what exactly do you
>> get by having it extend the multivariate real distribution?
> Is it not a kind of distribution?
> [It's obvious that one can sample from it but maybe there are some required
> properties (for a distribution) which are missing from such a mixture (?).]

What is implemented is a legitimate distribution (or more precisely,
a legitimate density, which is all we really model in
MultivariateRealDistribution).  I just wonder whether there is value
in it as a distribution per se, rather than just a container for the
weights and component distribution parameters.  The sample()
implementation is legitimate - I just don't know if it has any
practical value.  I guess the density will be used by the EM impl. 
As I said above, I am fine committing and then seeing how the EM
impl uses the class.  Assuming it does turn out to be practically
valuable as a distribution, a natural thing to add would be a
univariate version; but that would require an actual distribution
function.
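The "legitimate density" being discussed is simply the weighted sum of the component densities, f(x) = sum_i w_i f_i(x). A hypothetical sketch with invented names and Gaussian components:

```java
/** Illustrative sketch (invented names): a mixture density is the weighted
 *  sum of its component densities, f(x) = sum_i w_i * f_i(x). */
public class MixtureDensitySketch {

    /** Gaussian probability density at x. */
    static double normalDensity(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2.0 * Math.PI));
    }

    /** Mixture density: weighted sum over the components. */
    static double density(double x, double[] w, double[] mu, double[] sigma) {
        double f = 0.0;
        for (int i = 0; i < w.length; i++) {
            f += w[i] * normalDensity(x, mu[i], sigma[i]);
        }
        return f;
    }

    public static void main(String[] args) {
        double[] w = {0.5, 0.5};
        double[] mu = {-1.0, 1.0};
        double[] sigma = {1.0, 1.0};
        // At x = 0 both components contribute equally.
        System.out.println(density(0.0, w, mu, sigma));
    }
}
```

Since the weights sum to 1 and each component integrates to 1, the mixture integrates to 1 as well, which is what makes it a valid density rather than just a parameter container.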

Phil
>
>> I guess
>> that will become clear when we get the EM implementation.
> Hopefully.
>
>> I am OK committing this, I just wanted to get a clearer picture of
>> how the class was going to be used.
> I wouldn't be able to answer.
>
>
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] MATH-816 (mixture model distribution)

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
On Thu, Oct 18, 2012 at 06:59:22AM -0700, Phil Steitz wrote:
> On 10/18/12 1:41 AM, Gilles Sadowski wrote:
> > On Wed, Oct 17, 2012 at 10:26:55PM -0700, Phil Steitz wrote:
> >> On 10/17/12 8:36 PM, Becksfort, Jared wrote:
> >>> I see.  I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (MATH-817).  A Gibbs sampling DP fit may be a bit further out.   I am not opposed to allowing the number of components to change, but I also like the simplicity of this class.  Whatever you guys decide is probably fine.
> >> I like the interface as implemented for what it represents,
> > By "interface", do you mean the class
> >    MixtureMultivariateRealDistribution
> > as implemented in the file on the JIRA page?
> 
> Yes, the most recent one.  I like the way you set up the
> constructors, handling the weights and distribution type parameter.
> >
> >> but I
> >> agree with Ted's point above.  I also wonder if implementing the
> >> multivariate distribution interface is really buying you anything. 
> >> Certainly not for the Gibbs sampler.  It might be better to just
> >> directly implement EM with an interface that is natural for fitting
> >> and using mixture models.   I am not sure this stuff belongs in the
> >> distribution package in any case.
> > As implemented, it seems quite natural.
> > How this class will be used by non-existing code is beyond the scope of
> > MATH-816.
> > [And when the code exists, we can always revisit the design if necessary.]
> 
> It works for fixed component models, which I guess is OK by
> consensus to start. The question I was asking is what exactly do you
> get by having it extend the multivariate real distribution?

Is it not a kind of distribution?
[It's obvious that one can sample from it but maybe there are some required
properties (for a distribution) which are missing from such a mixture (?).]

> I guess
> that will become clear when we get the EM implementation.

Hopefully.

> I am OK committing this, I just wanted to get a clearer picture of
> how the class was going to be used.

I wouldn't be able to answer.


Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] MATH-816 (mixture model distribution)

Posted by Phil Steitz <ph...@gmail.com>.
On 10/18/12 1:41 AM, Gilles Sadowski wrote:
> On Wed, Oct 17, 2012 at 10:26:55PM -0700, Phil Steitz wrote:
>> On 10/17/12 8:36 PM, Becksfort, Jared wrote:
>>> I see.  I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (MATH-817).  A Gibbs sampling DP fit may be a bit further out.   I am not opposed to allowing the number of components to change, but I also like the simplicity of this class.  Whatever you guys decide is probably fine.
>> I like the interface as implemented for what it represents,
> By "interface", do you mean the class
>    MixtureMultivariateRealDistribution
> as implemented in the file on the JIRA page?

Yes, the most recent one.  I like the way you set up the
constructors, handling the weights and distribution type parameter.
>
>> but I
>> agree with Ted's point above.  I also wonder if implementing the
>> multivariate distribution interface is really buying you anything. 
>> Certainly not for the Gibbs sampler.  It might be better to just
>> directly implement EM with an interface that is natural for fitting
>> and using mixture models.   I am not sure this stuff belongs in the
>> distribution package in any case.
> As implemented, it seems quite natural.
> How this class will be used by non-existing code is beyond the scope of
> MATH-816.
> [And when the code exists, we can always revisit the design if necessary.]

It works for fixed component models, which I guess is OK by
consensus to start. The question I was asking is what exactly do you
get by having it extend the multivariate real distribution? I guess
that will become clear when we get the EM implementation.

I am OK committing this, I just wanted to get a clearer picture of
how the class was going to be used.

Phil
>
>>  Where were we intending to place
>> the EM fit?
> Not in distribution; I agree.
>
>>  Can you describe a little more how exactly the
>> practical use cases you have in mind will work? 
> This will probably be for a new JIRA feature request.
>
>
> Regards,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] MATH-816 (mixture model distribution)

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
On Wed, Oct 17, 2012 at 10:26:55PM -0700, Phil Steitz wrote:
> On 10/17/12 8:36 PM, Becksfort, Jared wrote:
> > I see.  I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (MATH-817).  A Gibbs sampling DP fit may be a bit further out.   I am not opposed to allowing the number of components to change, but I also like the simplicity of this class.  Whatever you guys decide is probably fine.
> 
> I like the interface as implemented for what it represents,

By "interface", do you mean the class
   MixtureMultivariateRealDistribution
as implemented in the file on the JIRA page?

> but I
> agree with Ted's point above.  I also wonder if implementing the
> multivariate distribution interface is really buying you anything. 
> Certainly not for the Gibbs sampler.  It might be better to just
> directly implement EM with an interface that is natural for fitting
> and using mixture models.   I am not sure this stuff belongs in the
> distribution package in any case.

As implemented, it seems quite natural.
How this class will be used by non-existing code is beyond the scope of
MATH-816.
[And when the code exists, we can always revisit the design if necessary.]

>  Where were we intending to place
> the EM fit?

Not in distribution; I agree.

>  Can you describe a little more how exactly the
> practical use cases you have in mind will work? 

This will probably be for a new JIRA feature request.


Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] MATH-816 (mixture model distribution)

Posted by Phil Steitz <ph...@gmail.com>.
On 10/17/12 8:36 PM, Becksfort, Jared wrote:
> I see.  I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (MATH-817).  A Gibbs sampling DP fit may be a bit further out.   I am not opposed to allowing the number of components to change, but I also like the simplicity of this class.  Whatever you guys decide is probably fine.

I like the interface as implemented for what it represents, but I
agree with Ted's point above.  I also wonder if implementing the
multivariate distribution interface is really buying you anything. 
Certainly not for the Gibbs sampler.  It might be better to just
directly implement EM with an interface that is natural for fitting
and using mixture models.   I am not sure this stuff belongs in the
distribution package in any case.  Where were we intending to place
the EM fit?  Can you describe a little more how exactly the
practical use cases you have in mind will work? 

Phil
>
> Jared 
> ________________________________________
> From: Ted Dunning [ted.dunning@gmail.com]
> Sent: Wednesday, October 17, 2012 9:41 PM
> To: Commons Developers List
> Subject: Re: [Math] MATH-816 (mixture model distribution)
>
> The issue is that with a fixed number of components, you need to do
> multiple runs to find a best-fit number of components.  Gibbs sampling
> against a Dirichlet process can get you to the same answer in about the
> same cost as a single run of EM with a fixed number of models.
>
> On Wed, Oct 17, 2012 at 7:31 PM, Becksfort, Jared <
> Jared.Becksfort@stjude.org> wrote:
>
>> Ted,
>>
>> I am not sure I understand the problem with the fixed number of
>> components.  My understanding is that CM prefers immutable objects. Adding
>> a component to an object would require reweighting in addition to modifying
>> the component list.  A new mixture model could be instantiated using the
>> getComponents function and then adding or removing more components if
>> necessary.
>>
>> Jared
>> ________________________________________
>> From: Ted Dunning [ted.dunning@gmail.com]
>> Sent: Wednesday, October 17, 2012 5:21 PM
>> To: Commons Developers List
>> Subject: Re: [Math] MATH-816 (mixture model
>> distribution)
>>
>> Seems fine.
>>
>> I think that the limitation to a fixed number of mixture components is a
>> bit limiting.  So is the limitation to a uniform set of components.  Both
>> limitations can be eased without huge difficulty.
>>
>> Avoiding the fixed number of components can be done by using some variant
>> of Dirichlet processes.  Simply picking k_max relatively large and then
>> using an approximate DP over that finite set works well.
>>
>> That said, mixture models are pretty nice to have.
>>
>> On Wed, Oct 17, 2012 at 2:13 PM, Gilles Sadowski <
>> gilles@harfang.homelinux.org> wrote:
>>
>>> Hello.
>>>
>>> Any objection to committing the code as proposed on the report page?
>>>   https://issues.apache.org/jira/browse/MATH-816
>>>
>>>
>>> Regards,
>>> Gilles
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>> Email Disclaimer:  www.stjude.org/emaildisclaimer
>> Consultation Disclaimer:  www.stjude.org/consultationdisclaimer
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org