Posted to user@mahout.apache.org by si...@bt.com on 2013/01/05 20:44:43 UTC

HMM - baum welch and hmmpredict

Hi there, 

I've got a couple of questions about the hmm elements of Mahout. 

- when I get models that are made of NaNs, I guess this is telling me that the algorithm can't make a prediction? 
- I can train models with 1 hidden state, or 2 hidden states, and once or twice with 3 hidden states, but when I try to train anything more complex it always seems to come back with NaNs - even with data sets like 1 2 3 4 5 1 2 3 4 5 1 2..., which in my simple-minded view should work well for 4 or 5 hidden states: what am I doing wrong? 
- I have used hmmpredict to produce some... predictions! But how can I give it a sequence and then ask for the next state? Or should I simply use the code to create a custom predictor of my own? The sketch below shows the sort of thing I mean.
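
Something like this - untested, and with the HmmEvaluator and HmmModel method names guessed from a quick look at the javadoc rather than verified:

import org.apache.mahout.classifier.sequencelearning.hmm.HmmEvaluator;
import org.apache.mahout.classifier.sequencelearning.hmm.HmmModel;
import org.apache.mahout.math.Matrix;

public class NextStepPredictor {
  // Guess the most likely next observation, given a trained model and
  // an observed sequence.
  public static int predictNext(HmmModel model, int[] observed) {
    // Viterbi-decode the observations into the most likely hidden path.
    int[] hidden = HmmEvaluator.decode(model, observed, true);
    int last = hidden[hidden.length - 1];
    Matrix a = model.getTransitionMatrix();
    Matrix b = model.getEmissionMatrix();
    // Most likely next hidden state from the last decoded state...
    int next = 0;
    for (int s = 1; s < model.getNrOfHiddenStates(); s++) {
      if (a.get(last, s) > a.get(last, next)) { next = s; }
    }
    // ...and the most likely emission from that hidden state.
    int out = 0;
    for (int o = 1; o < model.getNrOfOutputStates(); o++) {
      if (b.get(next, o) > b.get(next, out)) { out = o; }
    }
    return out;
  }
}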

All the best, 

Simon 


----            
Dr. Simon Thompson
Chief Researcher, Customer Experience. 
BT Research. 
BT plc. PP11J. MLBG BT Adastral Park, Martlesham Heath. 
IP5 3RE


RE: HMM - baum welch and hmmpredict

Posted by si...@bt.com.
Hello there, 

I've had a little look at the Mahout source.

I've been using the Mahout command line to train my HMM, and this invokes the main method of the org.apache.mahout.classifier.sequencelearning.hmm.BaumWelchTrainer class. 

This class then parses the command line options and calls HmmTrainer.trainBaumWelch(model, observationsArray, epsilon, maxIterations, true); 

The final parameter is "boolean scaled" - set to true, this should use the log-scaled algorithm logScaledBaumWelch(observedSequence, iteration, alpha, beta); 

So it looks to me like this is a problem with the logScaled implementation. 
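
Condensed, the route from the command line down to the algorithm looks roughly like this (my own paraphrase, not verbatim Mahout source):

import org.apache.mahout.classifier.sequencelearning.hmm.HmmModel;
import org.apache.mahout.classifier.sequencelearning.hmm.HmmTrainer;

public class BaumWelchSketch {
  // What BaumWelchTrainer.main() boils down to once the options are parsed.
  public static HmmModel train(int[] observations, int nrHidden,
                               int nrObserved, double epsilon,
                               int maxIterations) {
    // A randomly initialised model with the requested state counts...
    HmmModel model = new HmmModel(nrHidden, nrObserved);
    // ...handed to the trainer with scaled == true, so each iteration
    // goes through logScaledBaumWelch(...) internally.
    return HmmTrainer.trainBaumWelch(model, observations, epsilon,
        maxIterations, true);
  }
}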

I'll carry on having a little dig around and see what I can see. 

Simon 


Re: HMM - baum welch and hmmpredict

Posted by Ted Dunning <te...@gmail.com>.
I think that the log prob version would handle this better.

Note that even with the extra transitions, you get *very* small probs.
Without those transitions, you are going to get underflow very quickly.
With log probs, the system should recognize the underflow correctly
without having to actually store it.  Since the log probabilities can only
step a finite and bounded amount each time, they shouldn't ever get to -Inf
even if that would make them happy.
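
To see the scale of the problem, a tiny demonstration: multiplying per-step probabilities directly underflows to exactly 0.0 long before a sequence of any interesting length is consumed, while the equivalent sum of logs stays finite:

public class UnderflowDemo {
  public static void main(String[] args) {
    double p = 1.0;     // direct product of per-step probabilities
    double logP = 0.0;  // the same quantity kept in log space
    for (int t = 0; t < 2000; t++) {
      p *= 0.5;               // underflows to 0.0 after ~1075 steps
      logP += Math.log(0.5);  // just accumulates a large negative number
    }
    System.out.println(p);     // 0.0 -- the information is gone
    System.out.println(logP);  // about -1386.3, perfectly representable
  }
}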


RE: HMM - baum welch and hmmpredict

Posted by si...@bt.com.
Hi there,

Ted's insight on the synthetic data set causing underflow appears to be correct. 

- If I train using the pattern "0 0 0 0 1 1 1 1 2 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2 2 .... <<repeat 20 times or so>>" for 3 hidden states and 3 observable states: 

>mahout baumwelch -i pattern.txt -o out.txt -nh 3 -no 3

I get 

>...Initial probabilities: 
0 1 2 
NaN NaN NaN 
Transition matrix:
  0 1 2 
0 NaN NaN NaN 
1 NaN NaN NaN 
2 NaN NaN NaN 
Emission matrix: 
  0 1 2 
0 NaN NaN NaN 
1 NaN NaN NaN 
2 NaN NaN NaN 


but if I introduce just one transition from the 2 2 2 2 state to the 1 1 1 1 state and one from the 0 0 0 0 state to the 2 2 2 2 state: 

" 0 0 0 0 1 1 1 1 2 2 2 2 2 0 0 0 0 2 1 1 1 1 0 2 2 2 2 2  0 0 0 0 1 1 1 1 2 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2 2.... " 

then I get: 
>....Initial probabilities: 
0 1 2 
1.0 1.1123861859130155E-36 1.656560305889225E-39 
Transition matrix:
  0 1 2 
0 0.7547069426685107 2.1036132690268842E-14 0.24529305733146825 
1 0.19502627742281503 0.8049737225771799 5.174976422070503E-15 
2 5.730060776423029E-13 0.2500232192495272 0.7499767807498997 
Emission matrix: 
  0 1 2 
0 0.9810688873254809 0.010907170378894576 0.008023942295624495 
1 4.0644469515616997E-7 2.1553200545010613E-8 0.9999995720021043 
2 8.750057097530634E-5 0.9973079487546356 0.002604550674389073 

Which looks much more like it, and I can generate a prediction with 

>mahout hmmpredict -o out.txt -m newmodel.mod -l 100

>cat out.txt 

>0 0 0 0 0 0 0 0 1 2 2 2 2 0 1 1 1 1 1 1 2 0 1 1 2 2 0 0 1 1 1 1 1 1 1 1 1 1 1 1 
1 2 0 0 1 1 2 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 1 1 1 1 1 1 1 2 2 0 1 1 2 2 2 2 2 2 
2 2 0 0 1 1 1 2 2 2 2 0 1 1 2 2 2 0 1 1 

Which is rational. 
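
For anyone who'd rather drive this from the Java API than the command line, the equivalent of the runs above would look something like the sketch below (method names taken from my read of the source, so treat them as unverified):

import org.apache.mahout.classifier.sequencelearning.hmm.HmmEvaluator;
import org.apache.mahout.classifier.sequencelearning.hmm.HmmModel;
import org.apache.mahout.classifier.sequencelearning.hmm.HmmTrainer;

public class PatternDemo {
  public static void main(String[] args) {
    // The pattern above, including the extra 2->1 and 0->2 transitions,
    // repeated 20 times so the trainer has something to chew on.
    int[] block = {0,0,0,0,1,1,1,1,2,2,2,2,2, 0,0,0,0,2,1,1,1,1,0,2,2,2,2,2};
    int[] observations = new int[block.length * 20];
    for (int i = 0; i < observations.length; i++) {
      observations[i] = block[i % block.length];
    }
    HmmModel initial = new HmmModel(3, 3); // 3 hidden, 3 observable states
    HmmModel trained = HmmTrainer.trainBaumWelch(initial, observations,
        0.0001, 1000, true);               // scaled == true, i.e. log space
    int[] generated = HmmEvaluator.predict(trained, 100); // like -l 100
    StringBuilder sb = new StringBuilder();
    for (int o : generated) {
      sb.append(o).append(' ');
    }
    System.out.println(sb.toString().trim());
  }
}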

Newbie stuff, but hope it's handy! 

Simon 

----
Dr. Simon Thompson


Re: HMM - baum welch and hmmpredict

Posted by Ted Dunning <te...@gmail.com>.
On Sun, Jan 6, 2013 at 1:35 PM, <si...@bt.com> wrote:

> Hi,
>
> I've been using the standalone trainer.
>
> I'll have a look at the log scaled trainers - thanks for the tip!
>
>
Log scaling is absolutely required.  Otherwise, you start dealing with
numerical underflow amazingly quickly.

RE: HMM - baum welch and hmmpredict

Posted by si...@bt.com.
Hi, 

I've been using the standalone trainer. 

I'll have a look at the log scaled trainers - thanks for the tip! 

Best, 

Simon
----
Dr. Simon Thompson

Re: HMM - baum welch and hmmpredict

Posted by Dhruv Kumar <dh...@gmail.com>.
Hi Simon,

Are you using the standalone HMM trainer or are you running with the MapReduce variant using the patch available at https://issues.apache.org/jira/browse/MAHOUT-627?

As Ted mentioned, these trainers can experience arithmetic underflow when the set of states is large. Did you try the log scaled APIs for the Baum Welch trainer? The log scaled versions are more immune to underflows.

-Dhruv


Re: HMM - baum welch and hmmpredict

Posted by Ted Dunning <te...@gmail.com>.
On Sun, Jan 6, 2013 at 12:34 PM, <si...@bt.com> wrote:

> I think that one of the Mahout algorithms (DF) does use NaN for
> "undecidable"
>

Yes.  But I don't think the HMM codes do.


>  So perhaps there is a long term need to think through the output
> semantics of the library?
>

Yes. And no.

Yes, it would be good for the HMM code, but not necessarily.

Large-scale HMMs have serious problems with convergence under simple
algorithms.  Basically, you have a problem of diffusion of the solution
from the boundary conditions.

It is likely that entirely different approaches will be necessary at truly
large scale.  See Google's deep learning of language models, for instance.
Coupled with the lack of a well-known public use case, this has meant that
the development of these algorithms in Mahout is still very rudimentary and
is likely to remain so because the focus of attention is mostly elsewhere.


> I ran an open source project (Zeus Agents - still on source forge! but
> antique) for many years before it faded, so I know that random suggestions
> with no technical input is fairly unhelpful, but give me some time and I'll
> try and come back with something more useful!
>

Well, comments are certainly helpful as well.

Willing hands are even better!

RE: HMM - baum welch and hmmpredict

Posted by si...@bt.com.
Hi Ted, 

thanks very much for the response, very helpful to hear these thoughts. 

What I will do is look at the data set issue and report back as to what I find out. I'll prod round the code and see if I can get a clue as to how it produces infinities and so on.

I think that one of the Mahout algorithms (DF) does use NaN for "undecidable" 

(ref) http://mail-archives.apache.org/mod_mbox/mahout-dev/201206.mbox/%3C824188178.43658.1340361882497.JavaMail.jiratomcat@issues-vm%3E

So perhaps there is a long term need to think through the output semantics of the library? 

> I ran an open source project (Zeus Agents - still on SourceForge, but antique) for many years before it faded, so I know that random suggestions with no technical input are fairly unhelpful - but give me some time and I'll try to come back with something more useful! 

Best,

Simon

----
Dr. Simon Thompson

Re: HMM - baum welch and hmmpredict

Posted by Ted Dunning <te...@gmail.com>.
It sounds like you are getting some numerical stability issues with the
training program.  With HMMs, the most common problem that leads to this
is numerical underflow.  I haven't looked at this in detail, however, so I
can't comment very knowledgeably.  It is possible that the current
implementation has no regularization, which might lead to problems for
synthetic data sets such as your counting example, because there are no
observations for some transitions and the trainer may try to represent this
as -Inf in log space.

I can say that the Mahout HMM implementations are a student project and
have not seen much run-time or critical review.  That means that the
probability of serious bugs in the implementation is much higher than for
code that is heavily used, such as the recommender or the math library.
The student who did the work is good, but that doesn't take the place of
wide usage.
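
For what it is worth, the standard regularization for exactly that failure mode is a pseudocount: give every transition and emission cell a little probability mass before normalizing, so nothing is ever exactly zero and no log ever reaches -Inf.  A generic sketch of the idea, not something the Mahout trainer does today:

public class Smoothing {
  // Turn a matrix of raw counts into row-normalized probabilities,
  // adding a small pseudocount to every cell first.
  public static double[][] smooth(double[][] counts, double pseudocount) {
    int n = counts.length;
    int m = counts[0].length;
    double[][] probs = new double[n][m];
    for (int i = 0; i < n; i++) {
      double rowSum = 0.0;
      for (int j = 0; j < m; j++) {
        rowSum += counts[i][j] + pseudocount;
      }
      for (int j = 0; j < m; j++) {
        // Every cell gets pseudocount mass, so probs[i][j] > 0 always
        // and Math.log(probs[i][j]) is always finite.
        probs[i][j] = (counts[i][j] + pseudocount) / rowSum;
      }
    }
    return probs;
  }
}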
