Posted to user@mahout.apache.org by Svetlomir Kasabov <sk...@smail.inf.fh-brs.de> on 2011/07/24 15:25:10 UTC

HMM investigations

Hello,

I am considering using Mahout's implementation of the Hidden Markov Model (HMM) 
for prediction, but I want to clarify some important questions before 
using it:


1. I've read some literature about HMMs, and some sources state 
that HMMs can also handle continuous values as input (and not only 
discrete values). Can Mahout's implementation also handle such values? 
My input data is exclusively continuous.

2. Can Mahout's HMM have multiple hidden Markov chains? I don't know if I 
am using the right terminology, but what I need is this HMM "architecture":

X1----X1----X1----...X1  (Markov Chain for input parameter 1 => 
monitoring X1's changes over time)

X2----X2----X2----...X2  (Markov Chain for input parameter 2 => 
monitoring X2's changes over time)

Y-----Y-----Y-----...Y   (Output value's changes over time)

I think this architecture would allow me to train and predict output Y 
based on inputs X1 and X2.

3. Can we get output probabilities from the HMM for a concrete state?

Many thanks in advance!

Best regards,

Svetlomir.


Re: HMM investigations

Posted by Svetlomir Kasabov <sk...@smail.inf.fh-brs.de>.
Thanks, Ted,

yes, I will prepare it and post it in about 10 minutes.

On 24.07.2011 21:15, Ted Dunning wrote:
> I remember this problem.
>
> Is it possible for you to post some sample data?
>
> On Sun, Jul 24, 2011 at 12:08 PM, Svetlomir Kasabov<
> skasab2s@smail.inf.fh-brs.de>  wrote:
>
>> Hello again and thanks for the replies from both of you, I really appreciate
>> them. The most important thing is that you try to help, and how you do this
>> is irrelevant :). I didn't feel angry/insulted.
>>
>>
>> Yes, X1 and X2 are two independent hidden sequences, like
>>
>> BP -- BP -- BP (Blood Pressure)
>> HR -- HR -- HR (Heart Rate)
>> And I want to train the model to predict the probability of giving a drug Y
>> to a patient (for example, with this sequence)
>> Y=0 -- Y=0 -- Y=1
>>
>> I already tried this with logistic regression, but ended up with poor results
>> (probably because of my small example set). Logistic regression also has no
>> built-in notion of time series, which is why I must analyze the X's changes using
>> percentiles and then train the logistic model with these percentiles. In
>> this way I reduce the dimensions to only one. That's why I thought that the
>> HMM could do this for me 'out of the box', staying in two dimensions, if
>> it allows two hidden chains, like this:
>>
>> http://t3.gstatic.com/images?q=tbn:ANd9GcR8pu4bSm-MSyg3Pj0-aTyi8FaqUOy4U2bcKJBTBYKKvgAhyw6P
>>
>> or 'coupled' HMMs.
>>
>> I am not very experienced with the HMMs, but will read further the
>> literature and Mahout's API :).
>>
>> Maybe reducing the dimensions is not that bad an idea? I've read that we can
>> do it with PCA (Principal Component Analysis). Is there Mahout code for
>> this somewhere?
>>
>> Thanks a lot once again,
>>
>> Svetlomir.
>>
>>
>>
>> On 24.07.2011 20:46, Ted Dunning wrote:
>>
>>> My impression (and Svetlomir should correct me) is that the intent was to
>>> use two HMM's on separate inputs and then use the decoded state sequences
>>> from those as inputs to a third HMM.
>>>
>>> If that is the question, then I think that Mahout's HMM's are sufficiently
>>> object-ish that this should work.  Obviously, it will take multiple
>>> training
>>> passes to train each separate model.
>>>
>>> On Sun, Jul 24, 2011 at 11:25 AM, Dhruv<dh...@gmail.com>   wrote:
>>>
>>>> Svetlomir and Ted -- I was not trying to be rude, sorry if I came across
>>>> that way because of my exuberance. I apologize.
>>>>
>>>> I was eager to help and may have acted too fast and misunderstood the
>>>> question, so I turn to both of you for a little clarification.
>>>>
>>>> I'm confused whether the X's refer to the hidden states, or training
>>>> instances. Since the hidden sequence is always a Markov Chain in HMMs, I
>>>> assumed that Svetlomir meant that X1 and X2 were two separate hidden
>>>> state
>>>> sequences because Markov Chain was explicitly mentioned in his original
>>>> question. To quote:
>>>>
>>>> -----------
>>>> X1----X1----X1----...X1  (Markov Chain for input parameter 1 =>
>>>>   monitoring
>>>> X1's changes over time)
>>>>
>>>> X2----X2----X2----...X2  (Markov Chain for input parameter 2 =>
>>>>   monitoring
>>>> X2's changes over time)
>>>> -----------
>>>>
>>>> Further, since X1 and X2 were not slated to have any relationship with
>>>> each
>>>> other and since they were the observations of two different parameters, I
>>>> construed that X1 and X2 represented two separate hidden state sequences.
>>>> I
>>>> gathered that the hidden state sequences X1 and X2 are drawn from two
>>>> disjoint hidden vocabulary sets. The user wants to discover the model on
>>>> some training set and then, to the trained model, feed Y for decoding to
>>>> arrive at the most likely sequence of states, X1 and X2 which emitted Y.
>>>>
>>>> In my answer, I continued with this line saying that in one training, you
>>>> can't arrive at two separate models for X1 and X2 which contain the
>>>> requisite distributions which can be used for decoding, say sequences of
>>>> X1
>>>> to have produced Y or sequence of X2 to have produced Y. Hence, I
>>>> suggested
>>>> having only one set for the hidden states, combining X1s and X2s and then
>>>> train the model on it. Given the domain of application, this may or may
>>>> not
>>>> make sense, hence I was doubtful of formulating the problem as HMM and
>>>> suggested alternatives.
>>>>
>>>> However:
>>>>
>>>> If X's are two separate input sequences for training, then yes, the
>>>> current
>>>> implementation is capable of training the HMM. If Y is the output, then
>>>> one
>>>> can decode, after training, the sequence of hidden states which most
>>>> likely
>>>> produced Y.
>>>>
>>>> For the output probability question, my answer was to use the trained
>>>> model's HmmModel.getEmissionMatrix().get(hiddenState, emittedState)
>>>> method to compute the output probability for a particular hidden state.
>>>> I believe this is not what the user wanted?
>>>>
>>>>
>>>> Dhruv
>>>>
>>>> On Sun, Jul 24, 2011 at 12:56 PM, Ted Dunning<te...@gmail.com> wrote:
>>>>
>>>>> On Sun, Jul 24, 2011 at 7:52 AM, Dhruv<dh...@gmail.com> wrote:
>>>>>
>>>>>> ... If you look into the *definition* of HMM, the hidden sequence is
>>>>>> drawn from only one set. The hidden sequence's transitions can be
>>>>>> expressed as a joint probability p(s0, s1). Similarly the observed
>>>>>> sequence has a joint distribution with the hidden sequence such as
>>>>>> p(y0, s1) and so on.
>>>>>
>>>>> I think gentler language might be a good idea here.  The question was
>>>>> not at all unreasonable.
>>>>>
>>>>>> The hidden state transitions follow the Markov memorylessness property
>>>>>> and hence form a Markov Chain.
>>>>>>
>>>>>> In your case, you are trying to model your problem assuming that there
>>>>>> are two underlying state sequences affecting the observed output. This
>>>>>> doesn't fit into the HMM's definition and you probably want something
>>>>>> else.
>>>>>
>>>>> Actually, what the original poster wanted is quite sensible.  While the
>>>>> output sequence is due to a single input sequence, that input sequence
>>>>> is not observable.  As such, we have a noisy channel problem where we
>>>>> want to estimate something about that original sequence.  The point of
>>>>> the Markov model is that it defines a distribution of output sequences
>>>>> given an input sequence (and model).  This distribution can be inverted
>>>>> so that, given a particular output sequence, we can estimate the
>>>>> probability distribution of input sequences conditional on the output.
>>>>>
>>>>> The typical decoding algorithm for HMMs estimates only the maximum
>>>>> likelihood input sequence, but this does not negate the fact that we
>>>>> have a distribution.  There are alternative decoding algorithms that
>>>>> allow a set of high-probability sequences to be estimated, or that
>>>>> output a partial probability lattice from which alternative sequences
>>>>> can be probed.
>>>>>
>>>>>> If you do want to fit your problem into the HMM framework, you need to
>>>>>> condense the X1 and X2 sequences into a single set and then condition
>>>>>> the Ys on it.
>>>>>
>>>>> Not at all.
>>>>>
>>>>>>> 3. Can we get output probabilities from the HMM for a concrete state?
>>>>>>
>>>>>> Yes, after training, you can retrieve any of the trained model's
>>>>>> distributions as a Mahout Matrix type and use get(row, col).
>>>>>
>>>>> This is not quite what the question was.
>>>>>
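
To make the "two HMMs feeding a third HMM" suggestion above more concrete, here
is a minimal sketch. It assumes the org.apache.mahout.classifier.sequencelearning.hmm
API as it existed around Mahout 0.5/0.6 (HmmModel, HmmTrainer.trainBaumWelch,
HmmEvaluator.decode); the discretization of the signals, the state counts and the
class name are invented for illustration and are not taken from the thread.

    import java.util.Arrays;
    import org.apache.mahout.classifier.sequencelearning.hmm.HmmEvaluator;
    import org.apache.mahout.classifier.sequencelearning.hmm.HmmModel;
    import org.apache.mahout.classifier.sequencelearning.hmm.HmmTrainer;

    public class ChainedHmmSketch {
      public static void main(String[] args) {
        // Discretized observation sequences, one symbol per minute.
        // The encoding into symbols 0..3 is invented for illustration.
        int[] bpSeq = {0, 0, 1, 2, 2, 1, 0, 0, 3, 3};  // blood pressure
        int[] hrSeq = {1, 1, 1, 2, 2, 2, 1, 1, 3, 3};  // heart rate

        // One HMM per signal, trained unsupervised with Baum-Welch
        // (3 hidden states, 4 observable symbols, randomly initialized).
        HmmModel bpModel = HmmTrainer.trainBaumWelch(
            new HmmModel(3, 4), bpSeq, 0.0001, 100, true);
        HmmModel hrModel = HmmTrainer.trainBaumWelch(
            new HmmModel(3, 4), hrSeq, 0.0001, 100, true);

        // Viterbi-decode each signal into its most likely hidden state sequence.
        int[] bpStates = HmmEvaluator.decode(bpModel, bpSeq, true);
        int[] hrStates = HmmEvaluator.decode(hrModel, hrSeq, true);

        // Combine the two decoded states per time step into one symbol
        // (3 x 3 = 9 combinations); a third HMM is then trained on that
        // combined sequence, as Ted describes above.
        int[] combined = new int[bpStates.length];
        for (int t = 0; t < combined.length; t++) {
          combined[t] = bpStates[t] * 3 + hrStates[t];
        }
        HmmModel thirdModel = HmmTrainer.trainBaumWelch(
            new HmmModel(2, 9), combined, 0.0001, 100, true);
        System.out.println("combined decoded sequence: " + Arrays.toString(combined));
      }
    }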


Re: HMM investigations

Posted by Ted Dunning <te...@gmail.com>.
OK. I munched on this data slightly and uploaded it to s3.  See
https://s3.amazonaws.com/mahout.data/heart-rate/index.html for the analysis
described here.

To clean it up and make it slightly more accessible for folks who use '.'
instead of ',' as the decimal separator, I reformatted the dates to ISO format
and added an id field to distinguish the different cases.  This is in a format
that R can read with this statement:

    x = read.delim("http://s3.amazonaws.com/mahout.data/heart-rate/raw.tsv")

I pulled this into R and had a look.  The basic results are in the plot
file https://s3.amazonaws.com/mahout.data/heart-rate/plot1.png

You can produce this plot using this R command:

    plot(x, bg=c("red", "white")[x$id], pch=21)

My assumptions here are that we have data from two different subjects at
1-minute intervals, that the target for prediction is the 'instable' field,
and that the fields HR, SAP, MAP and ShockIndex are fair game as predictors.
I have added a field delta which is simply SAP-MAP.  My guess is that these
fields are

HR ... Heart Rate
SAP ... Systolic Arterial Pressure
MAP ... Mean Arterial Pressure
ShockIndex ... SAP / HR

Based on the plot produced here, it is clear that if you look at the data
for the entire time period, these two subjects are easily separable.  If you
look at the plots versus i (in the second column from the left), you can see
a number of interesting things.  For one thing, the red patient has a much
higher heart rate.  This makes separation relatively trivial, at least for
these two patients.  For another, there is an event where SAP, MAP and
ShockIndex all take a big jump about 3 minutes into the data.

My guess is that the signals you want are events like this.  It is also clear
that this event is not visible in the HR signal.  It may be that you want
these signals to persist for a period of time after the event.

As such, I would recommend (again) using logistic regression with features
including the min and max for heart rate, SAP, MAP and ShockIndex for the
trailing 1, 5, 10 and 15 minute time periods.
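
For concreteness, here is a minimal sketch (in Java, not from the thread) of the
trailing-window min/max features suggested above; one such vector per signal
(HR, SAP, MAP, ShockIndex) and per minute could then be fed to a logistic
regression. The class and method names are illustrative only.

    import java.util.Arrays;

    public class TrailingWindowFeatures {

      // min and max of values[t-window+1 .. t], clipped at the start of the series
      static double[] minMax(double[] values, int t, int window) {
        int from = Math.max(0, t - window + 1);
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int i = from; i <= t; i++) {
          min = Math.min(min, values[i]);
          max = Math.max(max, values[i]);
        }
        return new double[] {min, max};
      }

      // feature vector for minute t from a single signal (e.g. HR)
      static double[] features(double[] signal, int t) {
        int[] windows = {1, 5, 10, 15};
        double[] features = new double[2 * windows.length];
        for (int w = 0; w < windows.length; w++) {
          double[] mm = minMax(signal, t, windows[w]);
          features[2 * w] = mm[0];      // trailing min
          features[2 * w + 1] = mm[1];  // trailing max
        }
        return features;
      }

      public static void main(String[] args) {
        // First few HR values from the 'instable = yes' series in the posted data.
        double[] hr = {114, 113, 110, 109, 111, 109, 112, 111, 111, 111};
        System.out.println(Arrays.toString(features(hr, hr.length - 1)));
      }
    }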

If you would like to post more data (possibly lots more), I would be happy
to demonstrate the analysis that I am suggesting.


Re: HMM investigations

Posted by Svetlomir Kasabov <sk...@smail.inf.fh-brs.de>.
Hello Dhruv, and thanks for your cooperation,


The short description of my problem is in this article (which is also my 
main source): http://www.ncbi.nlm.nih.gov/pubmed/19163540. I am trying to 
predict the administration of a drug Y (indicating hemodynamic instability), 
based on a patient's vital signs like blood pressure and heart rate.

The long description of the problem is here: 
www.multi-science.co.uk/acce-free.pdf. I will summarize the information 
in these two points:

1. In order to extract a training example for predicting instable 
patients, the authors checked when a drug Y was given (for example, at 
time t), went back four hours and used the data from t-4 to t-2. The 
authors used logistic regression for training. We know that the equation 
Y = β0 + x1*β1 + ... + xn*βn is one-dimensional in character, but the 
problem is two-dimensional: for each patient, you have time as one 
dimension and multiple parameters (Systolic Arterial Pressure (SAP), 
Heart Rate (HR), etc.) in the second dimension. So they used percentiles 
over the above-mentioned time segment in order to map, for example, HR's 
changes to X1 (see the sketch after these two points).

2. On page 15, figure 2 of the PDF, you can see how well the chosen 
parameters (as percentiles) differentiate the stable from the unstable patients.
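
As a rough sketch of the percentile idea in point 1 (not taken from the paper or
from Mahout), the following condenses a window of raw HR readings into a few
percentiles that could serve as the x1, x2, ... of the logistic regression; the
chosen percentiles and the class name are illustrative assumptions.

    import java.util.Arrays;

    public class PercentileFeatures {

      // p-th percentile (0..100) of the given window, nearest-rank method
      static double percentile(double[] window, double p) {
        double[] sorted = window.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
      }

      public static void main(String[] args) {
        // HR readings from one time segment (here: first minutes of the posted data).
        double[] hr = {114, 113, 110, 109, 111, 109, 112, 111, 111, 111};
        double x1 = percentile(hr, 10);   // low end of the HR distribution
        double x2 = percentile(hr, 50);   // median
        double x3 = percentile(hr, 90);   // high end
        System.out.printf("x1=%.1f x2=%.1f x3=%.1f%n", x1, x2, x3);
      }
    }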


@Ted: many thanks to you too. Your analysis is great. I will post more 
data in a few minutes...


Svetlomir.


Re: HMM investigations

Posted by Dhruv <dh...@gmail.com>.
Hi Svetlomir,

Thanks for the clarification.

Since in your case HR, MAP, SAP, etc. are the hidden variables but they
are completely observable, you can just count the transitions, emissions, etc.
to arrive at the required probability distributions.

Is there a previous thread on Mahout where you have discussed this problem?
I need to understand your requirements: what exactly you are trying to
predict, the cause-and-effect relationships, etc. That information can help
model the problem in a way that is more amenable to Mahout's HMM trainers.
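
A minimal sketch of what "just count the transitions and emissions" looks like
with Mahout's HMM classes, assuming the
org.apache.mahout.classifier.sequencelearning.hmm API of the 0.5/0.6 era
(HmmTrainer.trainSupervised, HmmModel.getEmissionMatrix); the 0/1 encodings of
vitals and drug below are invented for illustration.

    import org.apache.mahout.classifier.sequencelearning.hmm.HmmModel;
    import org.apache.mahout.classifier.sequencelearning.hmm.HmmTrainer;

    public class SupervisedHmmSketch {
      public static void main(String[] args) {
        // Hypothetical encoding: hidden state = discretized vitals
        // (0 = normal, 1 = abnormal), observation = drug given
        // (0 = no, 1 = yes), one symbol per minute.
        int[] hiddenSeq   = {0, 0, 1, 1, 1, 0, 1, 1, 0, 0};
        int[] observedSeq = {0, 0, 0, 1, 1, 0, 0, 1, 0, 0};

        // Supervised training simply counts transitions and emissions;
        // the pseudo count smooths zero-frequency entries.
        HmmModel model = HmmTrainer.trainSupervised(2, 2, observedSeq, hiddenSeq, 0.1);

        // Output probability of "drug given" (symbol 1) from hidden state 1,
        // i.e. the getEmissionMatrix().get(hiddenState, emittedState) call
        // mentioned earlier in the thread.
        double p = model.getEmissionMatrix().get(1, 1);
        System.out.println("P(drug | abnormal vitals) = " + p);
      }
    }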



Re: HMM investigations

Posted by Svetlomir Kasabov <sk...@smail.inf.fh-brs.de>.
So, here is my sample data. The column "instable" is the outcome 
variable; HR, SAP, MAP, etc. are the minute-by-minute raw data. From these 
I extracted derived features (using percentiles) and created a training 
example with the data from i=1 to i=25 with instable = yes/no, and so on...


Thank you.




i 	instable 	HR 	SAP 	MAP 	ShockIndex 	tStamp
1 	yes 	114,0 	87,0 	74,0 	1,5405405405405406 	14.Mrz.10,_11:29:00
2 	yes 	113,0 	89,0 	70,0 	1,6142857142857143 	14.Mrz.10,_11:30:00
3 	yes 	110,0 	145,0 	116,0 	0,9482758620689655 	14.Mrz.10,_11:31:00
4 	yes 	109,0 	202,0 	201,0 	0,5422885572139303 	14.Mrz.10,_11:32:00
5 	yes 	111,0 	207,0 	205,0 	0,5414634146341464 	14.Mrz.10,_11:33:00
6 	yes 	109,0 	209,0 	208,0 	0,5240384615384616 	14.Mrz.10,_11:34:00
7 	yes 	112,0 	144,0 	116,0 	0,9655172413793104 	14.Mrz.10,_11:35:00
8 	yes 	111,0 	112,0 	87,0 	1,2758620689655173 	14.Mrz.10,_11:36:00
9 	yes 	111,0 	105,0 	84,0 	1,3214285714285714 	14.Mrz.10,_11:37:00
10 	yes 	111,0 	102,0 	73,0 	1,5205479452054795 	14.Mrz.10,_11:38:00
11 	yes 	111,0 	103,0 	72,0 	1,5416666666666667 	14.Mrz.10,_11:39:00
12 	yes 	115,0 	94,0 	74,0 	1,554054054054054 	14.Mrz.10,_11:40:00
13 	yes 	113,0 	91,0 	67,0 	1,6865671641791045 	14.Mrz.10,_11:41:00
14 	yes 	109,0 	124,0 	101,0 	1,0792079207920793 	14.Mrz.10,_11:42:00
15 	yes 	109,0 	147,0 	123,0 	0,8861788617886179 	14.Mrz.10,_11:43:00
16 	yes 	110,0 	93,0 	69,0 	1,5942028985507246 	14.Mrz.10,_11:44:00
17 	yes 	108,0 	91,0 	74,0 	1,4594594594594594 	14.Mrz.10,_11:45:00
18 	yes 	109,0 	83,0 	69,0 	1,5797101449275361 	14.Mrz.10,_11:46:00
19 	yes 	110,0 	94,0 	70,0 	1,5714285714285714 	14.Mrz.10,_11:47:00
20 	yes 	109,0 	104,0 	73,0 	1,4931506849315068 	14.Mrz.10,_11:48:00
21 	yes 	107,0 	103,0 	68,0 	1,5735294117647058 	14.Mrz.10,_11:49:00
22 	yes 	109,0 	94,0 	69,0 	1,5797101449275361 	14.Mrz.10,_11:50:00
23 	yes 	108,0 	90,0 	66,0 	1,6363636363636365 	14.Mrz.10,_11:51:00
24 	yes 	109,0 	97,0 	73,0 	1,4931506849315068 	14.Mrz.10,_11:52:00
25 	yes 	110,0 	105,0 	73,0 	1,5068493150684932 	14.Mrz.10,_11:53:00


1 	no 	84,0 	138,0 	87,0 	0,9655172413793104 	22.Dez.10,_04:10:00
2 	no 	83,0 	139,0 	87,0 	0,9540229885057471 	22.Dez.10,_04:11:00
3 	no 	80,0 	142,0 	89,0 	0,898876404494382 	22.Dez.10,_04:12:00
4 	no 	82,0 	142,0 	87,0 	0,9425287356321839 	22.Dez.10,_04:13:00
5 	no 	81,0 	140,0 	87,0 	0,9310344827586207 	22.Dez.10,_04:14:00
6 	no 	77,0 	138,0 	85,0 	0,9058823529411765 	22.Dez.10,_04:15:00
7 	no 	80,0 	143,0 	89,0 	0,898876404494382 	22.Dez.10,_04:16:00
8 	no 	75,0 	139,0 	87,0 	0,8620689655172413 	22.Dez.10,_04:17:00
9 	no 	79,0 	137,0 	84,0 	0,9404761904761905 	22.Dez.10,_04:18:00
10 	no 	81,0 	143,0 	89,0 	0,9101123595505618 	22.Dez.10,_04:19:00
11 	no 	82,0 	142,0 	91,0 	0,9010989010989011 	22.Dez.10,_04:20:00
12 	no 	80,0 	142,0 	88,0 	0,9090909090909091 	22.Dez.10,_04:21:00
13 	no 	79,0 	146,0 	90,0 	0,8777777777777778 	22.Dez.10,_04:22:00
14 	no 	83,0 	151,0 	94,0 	0,8829787234042553 	22.Dez.10,_04:23:00
15 	no 	78,0 	146,0 	90,0 	0,8666666666666667 	22.Dez.10,_04:24:00
16 	no 	80,0 	143,0 	89,0 	0,898876404494382 	22.Dez.10,_04:25:00
17 	no 	81,0 	143,0 	88,0 	0,9204545454545454 	22.Dez.10,_04:26:00
18 	no 	79,0 	143,0 	88,0 	0,8977272727272727 	22.Dez.10,_04:27:00
19 	no 	85,0 	145,0 	90,0 	0,9444444444444444 	22.Dez.10,_04:28:00
20 	no 	82,0 	138,0 	88,0 	0,9318181818181818 	22.Dez.10,_04:29:00
21 	no 	81,0 	146,0 	91,0 	0,8901098901098901 	22.Dez.10,_04:30:00
22 	no 	83,0 	135,0 	86,0 	0,9651162790697675 	22.Dez.10,_04:31:00
23 	no 	80,0 	143,0 	89,0 	0,898876404494382 	22.Dez.10,_04:32:00
24 	no 	85,0 	141,0 	88,0 	0,9659090909090909 	22.Dez.10,_04:33:00
25 	no 	88,0 	135,0 	88,0 	1,0 	22.Dez.10,_04:34:00
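

For illustration, the percentile step I mentioned above looks roughly 
like this (a simplified sketch in plain Java; the heart rate window is 
taken from the first table, but the 10th/50th/90th percentiles are just 
example choices, not necessarily the ones I use):

import java.util.Arrays;

public class PercentileFeatures {

  // p-th percentile (0 < p <= 100) of a window, using the nearest-rank
  // method on a sorted copy of the values.
  static double percentile(double[] window, double p) {
    double[] sorted = window.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(0, rank - 1)];
  }

  public static void main(String[] args) {
    // One 25-minute window of heart rate values (column HR above).
    double[] hr = {114, 113, 110, 109, 111, 109, 112, 111, 111, 111, 111,
                   115, 113, 109, 109, 110, 108, 109, 110, 109, 107, 109,
                   108, 109, 110};
    // Derived features for one training example.
    System.out.printf("HR p10=%.1f p50=%.1f p90=%.1f%n",
        percentile(hr, 10), percentile(hr, 50), percentile(hr, 90));
  }
}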




Am 24.07.2011 21:15, schrieb Ted Dunning:
> I remember this problem.
>
> Is it possible for you to post some sample data?
>
> On Sun, Jul 24, 2011 at 12:08 PM, Svetlomir Kasabov<
> skasab2s@smail.inf.fh-brs.de>  wrote:
>
>> Hello again and thanks for the replies of both of you, I really apreciate
>> them. The most important think is, that you try helping and how you do this
>> is irrelevant :). I didn't feel angry/insulted.
>>
>>
>> Yes, X1 and X2 are two independent hidden sequences, like
>>
>> BP -- BP -- BP (Blood Pressure)
>> HR -- HR -- HR (Heart Rate)
>> And I want to train the model to predict the probability of giving a drug Y
>> to a patient (for example, with this sequence)
>> Y=0 -- Y=0 -- Y=1
>>
>> I already tried this with logistic regression, but ended with poor results
>> (probably because of my small example set). Logistic regression has also no
>> built-in time series and that's why Imust analyze the X's changes using
>> percentiles and then train the logistic model with these percentiles. In
>> this way I reduce the dimensions to only one. That's why I thought that the
>> HMM can do this for me 'out of the box', staying in the dimension of 2, if
>> they allow to have two hidden chains, like this:
>>
>> http://t3.gstatic.com/images?q=tbn:ANd9GcR8pu4bSm-MSyg3Pj0-aTyi8FaqUOy4U2bcKJBTBYKKvgAhyw6P
>>
>> or 'coupled' HMMs.
>>
>> I am not very experienced with the HMMs, but will read further the
>> literature and Mahout's API :).
>>
>> Maybe reducing the dimensions is not that bad idea? I've read that we can
>> do it with PCA (Principle Components Analysis). Is there a Ḿahout code for
>> this somewhere?
>>
>> Thanks a lot once again,
>>
>> Svetlomir.
>>
>>
>>
>> Am 24.07.2011 20:46, schrieb Ted Dunning:
>>
>>   My impression (and Svetlomir should correct me) is that the intent was to
>>> use two HMM's on separate inputs and then use the decoded state sequences
>>> from those as inputs to a third HMM.
>>>
>>> If that is the question, then I think that Mahout's HMM's are sufficiently
>>> object-ish that this should work.  Obviously, it will take multiple
>>> training
>>> passes to train each separate model.
>>>
>>> On Sun, Jul 24, 2011 at 11:25 AM, Dhruv<dh...@gmail.com>   wrote:
>>>
>>>   Svetlomir and Ted -- I was not trying to be rude, sorry if I came across
>>>> that way because of my exuberance. I apologize.
>>>>
>>>> I was eager to help and may have acted too fast and misunderstood the
>>>> question, so I turn to both of you for a little clarification.
>>>>
>>>> I'm confused whether the X's refer to the hidden states, or training
>>>> instances. Since the hidden sequence is always a Markov Chain in HMMs, I
>>>> assumed that Svetlomir meant that X1 and X2 were two separate hidden
>>>> state
>>>> sequences because Markov Chain was explicitly mentioned in his original
>>>> question. To quote:
>>>>
>>>> -----------
>>>> X1----X1----X1----...X1  (Markov Chain for input parameter 1 =>
>>>>   monitoring
>>>> X1's changes over time)
>>>>
>>>> X2----X2----X2----...X2  (Markov Chain for intput parameter 2 =>
>>>>   monitoring
>>>> X2's changes over time)
>>>> -----------
>>>>
>>>> Further, since X1 and X2 were not slated to have any relationship with
>>>> each
>>>> other and since they were the observations of two different parameters, I
>>>> construed that X1 and X2 represented two separate hidden state sequences.
>>>> I
>>>> gathered that the hidden state sequences X1 and X2 are drawn from two
>>>> disjoint hidden vocabulary sets. The user wants to discover the model on
>>>> some training set and then, to the trained model, feed Y for decoding to
>>>> arrive at the most likely sequence of states, X1 and X2 which emitted Y.
>>>>
>>>> In my answer, I continued with this line saying that in one training, you
>>>> can't arrive at two separate models for X1 and X2 which contain the
>>>> requisite distributions which can be used for decoding, say sequences of
>>>> X1
>>>> to have produced Y or sequence of X2 to have produced Y. Hence, I
>>>> suggested
>>>> having only one set for the hidden states, combining X1s and X2s and then
>>>> train the model on it. Given the domain of application, this may or may
>>>> not
>>>> make sense, hence I was doubtful of formulating the problem as HMM and
>>>> suggested alternatives.
>>>>
>>>> However:
>>>>
>>>> If X's are two separate input sequences for training, then yes, the
>>>> current
>>>> implementation is capable of training the HMM. If Y is the output, then
>>>> one
>>>> can decode, after training, the sequence of hidden states which most
>>>> likely
>>>> produced Y.
>>>>
>>>> For the output probability question, my answer was to use the trained
>>>> model's HmmModel.getEmissionMatrix.get(hiddenState, emittedState)
>>>> method to
>>>> compute the output probability for a particular hidden state. I believe
>>>> this
>>>> is not what the user wanted?
>>>>
>>>>
>>>> Dhruv
>>>>
>>>> On Sun, Jul 24, 2011 at 12:56 PM, Ted Dunning<te...@gmail.com>
>>>> wrote:
>>>>
>>>>   On Sun, Jul 24, 2011 at 7:52 AM, Dhruv<dh...@gmail.com>   wrote:
>>>>>   ... If you look into the *definition* of HMM,  the hidden sequence is
>>>>> drawn
>>>>>
>>>>>> from
>>>>>> only one set. The hidden sequence's transitions can be expressed as a
>>>>>>
>>>>> joint
>>>>>
>>>>>> probability p(s0, s1). Similarly the observed sequence has a joint
>>>>>> distribution with the hidden sequence such as p(y0, s1) and so on.
>>>>>>
>>>>>>   I think gentler language might be a good idea here.  The question was
>>>>> not
>>>>> at
>>>>> all unreasonable.
>>>>>
>>>>>
>>>>>   The hidden state transitions follow the Markov memorylessness property
>>>>> and
>>>>>
>>>>>> hence form a Markov Chain.
>>>>>>
>>>>>> In your case, you are trying to model your problem assuming that there
>>>>>>
>>>>> are
>>>>>
>>>>>> two underlying state sequences affecting the observed output. This
>>>>>>
>>>>> doesn't
>>>>>
>>>>>> fit into the HMM's definition and you probably want something else.
>>>>>>
>>>>>>   Actually, what the original poster wanted is quite sensible.  While
>>>>> the
>>>>> output sequence is due to a single input sequence, that input sequence
>>>>> is
>>>>> not observable.  As such, we have a noisy channel problem where we want
>>>>>
>>>> to
>>>>
>>>>> estimate something about that original sequence.  The point of the
>>>>> Markov
>>>>> model is that it defines a distribution of output sequence given an
>>>>> input
>>>>> sequence (and model).  This distribution can be inverted so that given a
>>>>> particular output sequence, we can estimate the probability distribution
>>>>>
>>>> of
>>>>
>>>>> input sequences conditional on the output.
>>>>>
>>>>> The typical decoding algorithm for HMM's estimates only the maximum
>>>>> likelihood input sequence but this does not negate the fact that we have
>>>>>
>>>> a
>>>>
>>>>> distribution.  There are alternative decoding algorithms that allow a
>>>>> set
>>>>> of
>>>>> high probability sequences to be estimated or allow a partial
>>>>> probability
>>>>> lattice to be output that allows alternative sequences to be probed.
>>>>>
>>>>> If you do want to fit your problem into the HMM framework, you need to
>>>>>
>>>>>> condense the X1 and X2 sequences into a single set and then condition
>>>>>>
>>>>> the
>>>>> Ys
>>>>>> on it.
>>>>>>
>>>>>>   Not at all.
>>>>>
>>>>>   3. Can we get output probabilities from the HMM for a concrete state?
>>>>>>>   Yes, after training, you can retrieve any of the trained model's
>>>>>> distributions as a Mahout Matrix type and use get(row, col).
>>>>>>
>>>>>>   This is not quite what the question was.
>>>>>


Re: HMM investigations

Posted by Ted Dunning <te...@gmail.com>.
I remember this problem.

Is it possible for you to post some sample data?

On Sun, Jul 24, 2011 at 12:08 PM, Svetlomir Kasabov <
skasab2s@smail.inf.fh-brs.de> wrote:

>
> Hello again and thanks for the replies of both of you, I really apreciate
> them. The most important think is, that you try helping and how you do this
> is irrelevant :). I didn't feel angry/insulted.
>
>
> Yes, X1 and X2 are two independent hidden sequences, like
>
> BP -- BP -- BP (Blood Pressure)
> HR -- HR -- HR (Heart Rate)
> And I want to train the model to predict the probability of giving a drug Y
> to a patient (for example, with this sequence)
> Y=0 -- Y=0 -- Y=1
>
> I already tried this with logistic regression, but ended with poor results
> (probably because of my small example set). Logistic regression has also no
> built-in time series and that's why Imust analyze the X's changes using
> percentiles and then train the logistic model with these percentiles. In
> this way I reduce the dimensions to only one. That's why I thought that the
> HMM can do this for me 'out of the box', staying in the dimension of 2, if
> they allow to have two hidden chains, like this:
>
> http://t3.gstatic.com/images?q=tbn:ANd9GcR8pu4bSm-MSyg3Pj0-aTyi8FaqUOy4U2bcKJBTBYKKvgAhyw6P
>
> or 'coupled' HMMs.
>
> I am not very experienced with the HMMs, but will read further the
> literature and Mahout's API :).
>
> Maybe reducing the dimensions is not that bad idea? I've read that we can
> do it with PCA (Principle Components Analysis). Is there a Ḿahout code for
> this somewhere?
>
> Thanks a lot once again,
>
> Svetlomir.
>
>
>
> Am 24.07.2011 20:46, schrieb Ted Dunning:
>
>  My impression (and Svetlomir should correct me) is that the intent was to
>> use two HMM's on separate inputs and then use the decoded state sequences
>> from those as inputs to a third HMM.
>>
>> If that is the question, then I think that Mahout's HMM's are sufficiently
>> object-ish that this should work.  Obviously, it will take multiple
>> training
>> passes to train each separate model.
>>
>> On Sun, Jul 24, 2011 at 11:25 AM, Dhruv<dh...@gmail.com>  wrote:
>>
>>  Svetlomir and Ted -- I was not trying to be rude, sorry if I came across
>>> that way because of my exuberance. I apologize.
>>>
>>> I was eager to help and may have acted too fast and misunderstood the
>>> question, so I turn to both of you for a little clarification.
>>>
>>> I'm confused whether the X's refer to the hidden states, or training
>>> instances. Since the hidden sequence is always a Markov Chain in HMMs, I
>>> assumed that Svetlomir meant that X1 and X2 were two separate hidden
>>> state
>>> sequences because Markov Chain was explicitly mentioned in his original
>>> question. To quote:
>>>
>>> -----------
>>> X1----X1----X1----...X1  (Markov Chain for input parameter 1 =>
>>>  monitoring
>>> X1's changes over time)
>>>
>>> X2----X2----X2----...X2  (Markov Chain for intput parameter 2 =>
>>>  monitoring
>>> X2's changes over time)
>>> -----------
>>>
>>> Further, since X1 and X2 were not slated to have any relationship with
>>> each
>>> other and since they were the observations of two different parameters, I
>>> construed that X1 and X2 represented two separate hidden state sequences.
>>> I
>>> gathered that the hidden state sequences X1 and X2 are drawn from two
>>> disjoint hidden vocabulary sets. The user wants to discover the model on
>>> some training set and then, to the trained model, feed Y for decoding to
>>> arrive at the most likely sequence of states, X1 and X2 which emitted Y.
>>>
>>> In my answer, I continued with this line saying that in one training, you
>>> can't arrive at two separate models for X1 and X2 which contain the
>>> requisite distributions which can be used for decoding, say sequences of
>>> X1
>>> to have produced Y or sequence of X2 to have produced Y. Hence, I
>>> suggested
>>> having only one set for the hidden states, combining X1s and X2s and then
>>> train the model on it. Given the domain of application, this may or may
>>> not
>>> make sense, hence I was doubtful of formulating the problem as HMM and
>>> suggested alternatives.
>>>
>>> However:
>>>
>>> If X's are two separate input sequences for training, then yes, the
>>> current
>>> implementation is capable of training the HMM. If Y is the output, then
>>> one
>>> can decode, after training, the sequence of hidden states which most
>>> likely
>>> produced Y.
>>>
>>> For the output probability question, my answer was to use the trained
>>> model's HmmModel.getEmissionMatrix.get(hiddenState, emittedState)
>>> method to
>>> compute the output probability for a particular hidden state. I believe
>>> this
>>> is not what the user wanted?
>>>
>>>
>>> Dhruv
>>>
>>> On Sun, Jul 24, 2011 at 12:56 PM, Ted Dunning<te...@gmail.com>
>>> wrote:
>>>
>>>  On Sun, Jul 24, 2011 at 7:52 AM, Dhruv<dh...@gmail.com>  wrote:
>>>>
>>>>  ... If you look into the *definition* of HMM,  the hidden sequence is
>>>>>
>>>> drawn
>>>>
>>>>> from
>>>>> only one set. The hidden sequence's transitions can be expressed as a
>>>>>
>>>> joint
>>>>
>>>>> probability p(s0, s1). Similarly the observed sequence has a joint
>>>>> distribution with the hidden sequence such as p(y0, s1) and so on.
>>>>>
>>>>>  I think gentler language might be a good idea here.  The question was
>>>> not
>>>> at
>>>> all unreasonable.
>>>>
>>>>
>>>>  The hidden state transitions follow the Markov memorylessness property
>>>>>
>>>> and
>>>>
>>>>> hence form a Markov Chain.
>>>>>
>>>>> In your case, you are trying to model your problem assuming that there
>>>>>
>>>> are
>>>>
>>>>> two underlying state sequences affecting the observed output. This
>>>>>
>>>> doesn't
>>>>
>>>>> fit into the HMM's definition and you probably want something else.
>>>>>
>>>>>  Actually, what the original poster wanted is quite sensible.  While
>>>> the
>>>> output sequence is due to a single input sequence, that input sequence
>>>> is
>>>> not observable.  As such, we have a noisy channel problem where we want
>>>>
>>> to
>>>
>>>> estimate something about that original sequence.  The point of the
>>>> Markov
>>>> model is that it defines a distribution of output sequence given an
>>>> input
>>>> sequence (and model).  This distribution can be inverted so that given a
>>>> particular output sequence, we can estimate the probability distribution
>>>>
>>> of
>>>
>>>> input sequences conditional on the output.
>>>>
>>>> The typical decoding algorithm for HMM's estimates only the maximum
>>>> likelihood input sequence but this does not negate the fact that we have
>>>>
>>> a
>>>
>>>> distribution.  There are alternative decoding algorithms that allow a
>>>> set
>>>> of
>>>> high probability sequences to be estimated or allow a partial
>>>> probability
>>>> lattice to be output that allows alternative sequences to be probed.
>>>>
>>>> If you do want to fit your problem into the HMM framework, you need to
>>>>
>>>>> condense the X1 and X2 sequences into a single set and then condition
>>>>>
>>>> the
>>>
>>>> Ys
>>>>> on it.
>>>>>
>>>>>  Not at all.
>>>>
>>>>
>>>>  3. Can we get output probabilities from the HMM for a concrete state?
>>>>>>
>>>>>>  Yes, after training, you can retrieve any of the trained model's
>>>>> distributions as a Mahout Matrix type and use get(row, col).
>>>>>
>>>>>  This is not quite what the question was.
>>>>
>>>>
>

Re: HMM investigations

Posted by Svetlomir Kasabov <sk...@smail.inf.fh-brs.de>.
Hello again, and thanks to both of you for your replies; I really 
appreciate them. The most important thing is that you are trying to 
help, and how you do it doesn't matter :). I didn't feel angry or 
insulted.


Yes, X1 and X2 are two independent hidden sequences, like

BP -- BP -- BP (Blood Pressure)
HR -- HR -- HR (Heart Rate)

and I want to train the model to predict the probability of giving a 
drug Y to a patient, for example with this sequence:

Y=0 -- Y=0 -- Y=1

I already tried this with logistic regression, but ended up with poor 
results (probably because of my small example set). Logistic regression 
also has no built-in notion of time series, which is why I must analyze 
the changes in the X's using percentiles and then train the logistic 
model on those percentiles. In this way I reduce the dimensions to only 
one. That's why I thought that an HMM could do this for me 'out of the 
box', staying in two dimensions, if it allows two hidden chains, like 
this:

http://t3.gstatic.com/images?q=tbn:ANd9GcR8pu4bSm-MSyg3Pj0-aTyi8FaqUOy4U2bcKJBTBYKKvgAhyw6P

or 'coupled' HMMs.

I am not very experienced with HMMs, but I will read further into the 
literature and Mahout's API :).

Maybe reducing the dimensions is not such a bad idea? I've read that we 
can do it with PCA (Principal Component Analysis). Is there Mahout code 
for this somewhere?

Thanks a lot once again,

Svetlomir.



Am 24.07.2011 20:46, schrieb Ted Dunning:
> My impression (and Svetlomir should correct me) is that the intent was to
> use two HMM's on separate inputs and then use the decoded state sequences
> from those as inputs to a third HMM.
>
> If that is the question, then I think that Mahout's HMM's are sufficiently
> object-ish that this should work.  Obviously, it will take multiple training
> passes to train each separate model.
>
> On Sun, Jul 24, 2011 at 11:25 AM, Dhruv<dh...@gmail.com>  wrote:
>
>> Svetlomir and Ted -- I was not trying to be rude, sorry if I came across
>> that way because of my exuberance. I apologize.
>>
>> I was eager to help and may have acted too fast and misunderstood the
>> question, so I turn to both of you for a little clarification.
>>
>> I'm confused whether the X's refer to the hidden states, or training
>> instances. Since the hidden sequence is always a Markov Chain in HMMs, I
>> assumed that Svetlomir meant that X1 and X2 were two separate hidden state
>> sequences because Markov Chain was explicitly mentioned in his original
>> question. To quote:
>>
>> -----------
>> X1----X1----X1----...X1  (Markov Chain for input parameter 1 =>  monitoring
>> X1's changes over time)
>>
>> X2----X2----X2----...X2  (Markov Chain for intput parameter 2 =>  monitoring
>> X2's changes over time)
>> -----------
>>
>> Further, since X1 and X2 were not slated to have any relationship with each
>> other and since they were the observations of two different parameters, I
>> construed that X1 and X2 represented two separate hidden state sequences. I
>> gathered that the hidden state sequences X1 and X2 are drawn from two
>> disjoint hidden vocabulary sets. The user wants to discover the model on
>> some training set and then, to the trained model, feed Y for decoding to
>> arrive at the most likely sequence of states, X1 and X2 which emitted Y.
>>
>> In my answer, I continued with this line saying that in one training, you
>> can't arrive at two separate models for X1 and X2 which contain the
>> requisite distributions which can be used for decoding, say sequences of X1
>> to have produced Y or sequence of X2 to have produced Y. Hence, I suggested
>> having only one set for the hidden states, combining X1s and X2s and then
>> train the model on it. Given the domain of application, this may or may not
>> make sense, hence I was doubtful of formulating the problem as HMM and
>> suggested alternatives.
>>
>> However:
>>
>> If X's are two separate input sequences for training, then yes, the current
>> implementation is capable of training the HMM. If Y is the output, then one
>> can decode, after training, the sequence of hidden states which most likely
>> produced Y.
>>
>> For the output probability question, my answer was to use the trained
>> model's HmmModel.getEmissionMatrix.get(hiddenState, emittedState) method to
>> compute the output probability for a particular hidden state. I believe
>> this
>> is not what the user wanted?
>>
>>
>> Dhruv
>>
>> On Sun, Jul 24, 2011 at 12:56 PM, Ted Dunning<te...@gmail.com>
>> wrote:
>>
>>> On Sun, Jul 24, 2011 at 7:52 AM, Dhruv<dh...@gmail.com>  wrote:
>>>
>>>> ... If you look into the *definition* of HMM,  the hidden sequence is
>>> drawn
>>>> from
>>>> only one set. The hidden sequence's transitions can be expressed as a
>>> joint
>>>> probability p(s0, s1). Similarly the observed sequence has a joint
>>>> distribution with the hidden sequence such as p(y0, s1) and so on.
>>>>
>>> I think gentler language might be a good idea here.  The question was not
>>> at
>>> all unreasonable.
>>>
>>>
>>>> The hidden state transitions follow the Markov memorylessness property
>>> and
>>>> hence form a Markov Chain.
>>>>
>>>> In your case, you are trying to model your problem assuming that there
>>> are
>>>> two underlying state sequences affecting the observed output. This
>>> doesn't
>>>> fit into the HMM's definition and you probably want something else.
>>>>
>>> Actually, what the original poster wanted is quite sensible.  While the
>>> output sequence is due to a single input sequence, that input sequence is
>>> not observable.  As such, we have a noisy channel problem where we want
>> to
>>> estimate something about that original sequence.  The point of the Markov
>>> model is that it defines a distribution of output sequence given an input
>>> sequence (and model).  This distribution can be inverted so that given a
>>> particular output sequence, we can estimate the probability distribution
>> of
>>> input sequences conditional on the output.
>>>
>>> The typical decoding algorithm for HMM's estimates only the maximum
>>> likelihood input sequence but this does not negate the fact that we have
>> a
>>> distribution.  There are alternative decoding algorithms that allow a set
>>> of
>>> high probability sequences to be estimated or allow a partial probability
>>> lattice to be output that allows alternative sequences to be probed.
>>>
>>> If you do want to fit your problem into the HMM framework, you need to
>>>> condense the X1 and X2 sequences into a single set and then condition
>> the
>>>> Ys
>>>> on it.
>>>>
>>> Not at all.
>>>
>>>
>>>>> 3. Can we get output probabilities from the HMM for a concrete state?
>>>>>
>>>> Yes, after training, you can retrieve any of the trained model's
>>>> distributions as a Mahout Matrix type and use get(row, col).
>>>>
>>> This is not quite what the question was.
>>>


Re: HMM investigations

Posted by Ted Dunning <te...@gmail.com>.
My impression (and Svetlomir should correct me) is that the intent was to
use two HMM's on separate inputs and then use the decoded state sequences
from those as inputs to a third HMM.

If that is the question, then I think that Mahout's HMM's are sufficiently
object-ish that this should work.  Obviously, it will take multiple training
passes to train each separate model.
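
Very roughly, and hedging heavily (the data and model sizes below are
made up, and I am quoting the HmmModel/HmmTrainer/HmmEvaluator signatures
from memory, so check the Javadoc), the stacked setup could look like:

import org.apache.mahout.classifier.sequencelearning.hmm.HmmEvaluator;
import org.apache.mahout.classifier.sequencelearning.hmm.HmmModel;
import org.apache.mahout.classifier.sequencelearning.hmm.HmmTrainer;

public class StackedHmmSketch {
  public static void main(String[] args) {
    // Hypothetical discretized observation sequences (binned BP and HR).
    int[] bpObs = {0, 0, 1, 2, 2, 1, 0, 0};
    int[] hrObs = {1, 1, 1, 2, 2, 2, 1, 1};

    // One unsupervised HMM per input sequence (3 hidden states, 3 symbols).
    HmmModel bpModel = HmmTrainer.trainBaumWelch(
        new HmmModel(3, 3), bpObs, 1e-4, 100, true);
    HmmModel hrModel = HmmTrainer.trainBaumWelch(
        new HmmModel(3, 3), hrObs, 1e-4, 100, true);

    // Decode each sequence to its most likely hidden-state sequence.
    int[] bpStates = HmmEvaluator.decode(bpModel, bpObs, true);
    int[] hrStates = HmmEvaluator.decode(hrModel, hrObs, true);

    // Combine the two decoded state streams into one symbol per time step
    // and feed that to a third HMM, which can then be related to Y.
    int[] combined = new int[bpObs.length];
    for (int t = 0; t < combined.length; t++) {
      combined[t] = bpStates[t] * 3 + hrStates[t];   // 3 = HR hidden states
    }
    HmmModel topModel = HmmTrainer.trainBaumWelch(
        new HmmModel(2, 9), combined, 1e-4, 100, true);  // 9 = 3 * 3 symbols
    System.out.println(java.util.Arrays.toString(combined));
  }
}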

On Sun, Jul 24, 2011 at 11:25 AM, Dhruv <dh...@gmail.com> wrote:

> Svetlomir and Ted -- I was not trying to be rude, sorry if I came across
> that way because of my exuberance. I apologize.
>
> I was eager to help and may have acted too fast and misunderstood the
> question, so I turn to both of you for a little clarification.
>
> I'm confused whether the X's refer to the hidden states, or training
> instances. Since the hidden sequence is always a Markov Chain in HMMs, I
> assumed that Svetlomir meant that X1 and X2 were two separate hidden state
> sequences because Markov Chain was explicitly mentioned in his original
> question. To quote:
>
> -----------
> X1----X1----X1----...X1  (Markov Chain for input parameter 1 => monitoring
> X1's changes over time)
>
> X2----X2----X2----...X2  (Markov Chain for intput parameter 2 => monitoring
> X2's changes over time)
> -----------
>
> Further, since X1 and X2 were not slated to have any relationship with each
> other and since they were the observations of two different parameters, I
> construed that X1 and X2 represented two separate hidden state sequences. I
> gathered that the hidden state sequences X1 and X2 are drawn from two
> disjoint hidden vocabulary sets. The user wants to discover the model on
> some training set and then, to the trained model, feed Y for decoding to
> arrive at the most likely sequence of states, X1 and X2 which emitted Y.
>
> In my answer, I continued with this line saying that in one training, you
> can't arrive at two separate models for X1 and X2 which contain the
> requisite distributions which can be used for decoding, say sequences of X1
> to have produced Y or sequence of X2 to have produced Y. Hence, I suggested
> having only one set for the hidden states, combining X1s and X2s and then
> train the model on it. Given the domain of application, this may or may not
> make sense, hence I was doubtful of formulating the problem as HMM and
> suggested alternatives.
>
> However:
>
> If X's are two separate input sequences for training, then yes, the current
> implementation is capable of training the HMM. If Y is the output, then one
> can decode, after training, the sequence of hidden states which most likely
> produced Y.
>
> For the output probability question, my answer was to use the trained
> model's HmmModel.getEmissionMatrix.get(hiddenState, emittedState) method to
> compute the output probability for a particular hidden state. I believe
> this
> is not what the user wanted?
>
>
> Dhruv
>
> On Sun, Jul 24, 2011 at 12:56 PM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > On Sun, Jul 24, 2011 at 7:52 AM, Dhruv <dh...@gmail.com> wrote:
> >
> > > ... If you look into the *definition* of HMM,  the hidden sequence is
> > drawn
> > > from
> > > only one set. The hidden sequence's transitions can be expressed as a
> > joint
> > > probability p(s0, s1). Similarly the observed sequence has a joint
> > > distribution with the hidden sequence such as p(y0, s1) and so on.
> > >
> >
> > I think gentler language might be a good idea here.  The question was not
> > at
> > all unreasonable.
> >
> >
> > >
> > > The hidden state transitions follow the Markov memorylessness property
> > and
> > > hence form a Markov Chain.
> > >
> > > In your case, you are trying to model your problem assuming that there
> > are
> > > two underlying state sequences affecting the observed output. This
> > doesn't
> > > fit into the HMM's definition and you probably want something else.
> > >
> >
> > Actually, what the original poster wanted is quite sensible.  While the
> > output sequence is due to a single input sequence, that input sequence is
> > not observable.  As such, we have a noisy channel problem where we want
> to
> > estimate something about that original sequence.  The point of the Markov
> > model is that it defines a distribution of output sequence given an input
> > sequence (and model).  This distribution can be inverted so that given a
> > particular output sequence, we can estimate the probability distribution
> of
> > input sequences conditional on the output.
> >
> > The typical decoding algorithm for HMM's estimates only the maximum
> > likelihood input sequence but this does not negate the fact that we have
> a
> > distribution.  There are alternative decoding algorithms that allow a set
> > of
> > high probability sequences to be estimated or allow a partial probability
> > lattice to be output that allows alternative sequences to be probed.
> >
> > If you do want to fit your problem into the HMM framework, you need to
> > > condense the X1 and X2 sequences into a single set and then condition
> the
> > > Ys
> > > on it.
> > >
> >
> > Not at all.
> >
> >
> > > > 3. Can we get output probabilities from the HMM for a concrete state?
> > > >
> > >
> > > Yes, after training, you can retrieve any of the trained model's
> > > distributions as a Mahout Matrix type and use get(row, col).
> > >
> >
> > This is not quite what the question was.
> >
>

Re: HMM investigations

Posted by Dhruv <dh...@gmail.com>.
Svetlomir and Ted -- I was not trying to be rude, sorry if I came across
that way because of my exuberance. I apologize.

I was eager to help and may have acted too fast and misunderstood the
question, so I turn to both of you for a little clarification.

I'm confused whether the X's refer to the hidden states, or training
instances. Since the hidden sequence is always a Markov Chain in HMMs, I
assumed that Svetlomir meant that X1 and X2 were two separate hidden state
sequences because Markov Chain was explicitly mentioned in his original
question. To quote:

-----------
X1----X1----X1----...X1  (Markov Chain for input parameter 1 => monitoring
X1's changes over time)

X2----X2----X2----...X2  (Markov Chain for intput parameter 2 => monitoring
X2's changes over time)
-----------

Further, since X1 and X2 were not slated to have any relationship with each
other and since they were the observations of two different parameters, I
construed that X1 and X2 represented two separate hidden state sequences. I
gathered that the hidden state sequences X1 and X2 are drawn from two
disjoint hidden vocabulary sets. The user wants to learn the model on
some training set and then feed Y to the trained model for decoding, to
arrive at the most likely sequences of states X1 and X2 which emitted Y.

In my answer, I continued with this line, saying that from one training
run you can't arrive at two separate models for X1 and X2 containing the
requisite distributions for decoding, say, sequences of X1 having
produced Y or sequences of X2 having produced Y. Hence, I suggested
having only one set for the hidden states, combining the X1s and X2s,
and then training the model on it. Given the domain of application, this
may or may not make sense, hence I was doubtful about formulating the
problem as an HMM and suggested alternatives.

However:

If X's are two separate input sequences for training, then yes, the current
implementation is capable of training the HMM. If Y is the output, then one
can decode, after training, the sequence of hidden states which most likely
produced Y.

For the output probability question, my answer was to use the trained
model's HmmModel.getEmissionMatrix().get(hiddenState, emittedState) call
to compute the output probability for a particular hidden state. I
believe this is not what the user wanted?


Dhruv

On Sun, Jul 24, 2011 at 12:56 PM, Ted Dunning <te...@gmail.com> wrote:

> On Sun, Jul 24, 2011 at 7:52 AM, Dhruv <dh...@gmail.com> wrote:
>
> > ... If you look into the *definition* of HMM,  the hidden sequence is
> drawn
> > from
> > only one set. The hidden sequence's transitions can be expressed as a
> joint
> > probability p(s0, s1). Similarly the observed sequence has a joint
> > distribution with the hidden sequence such as p(y0, s1) and so on.
> >
>
> I think gentler language might be a good idea here.  The question was not
> at
> all unreasonable.
>
>
> >
> > The hidden state transitions follow the Markov memorylessness property
> and
> > hence form a Markov Chain.
> >
> > In your case, you are trying to model your problem assuming that there
> are
> > two underlying state sequences affecting the observed output. This
> doesn't
> > fit into the HMM's definition and you probably want something else.
> >
>
> Actually, what the original poster wanted is quite sensible.  While the
> output sequence is due to a single input sequence, that input sequence is
> not observable.  As such, we have a noisy channel problem where we want to
> estimate something about that original sequence.  The point of the Markov
> model is that it defines a distribution of output sequence given an input
> sequence (and model).  This distribution can be inverted so that given a
> particular output sequence, we can estimate the probability distribution of
> input sequences conditional on the output.
>
> The typical decoding algorithm for HMM's estimates only the maximum
> likelihood input sequence but this does not negate the fact that we have a
> distribution.  There are alternative decoding algorithms that allow a set
> of
> high probability sequences to be estimated or allow a partial probability
> lattice to be output that allows alternative sequences to be probed.
>
> If you do want to fit your problem into the HMM framework, you need to
> > condense the X1 and X2 sequences into a single set and then condition the
> > Ys
> > on it.
> >
>
> Not at all.
>
>
> > > 3. Can we get output probabilities from the HMM for a concrete state?
> > >
> >
> > Yes, after training, you can retrieve any of the trained model's
> > distributions as a Mahout Matrix type and use get(row, col).
> >
>
> This is not quite what the question was.
>

Re: HMM investigations

Posted by Ted Dunning <te...@gmail.com>.
On Sun, Jul 24, 2011 at 7:52 AM, Dhruv <dh...@gmail.com> wrote:

> ... If you look into the *definition* of HMM,  the hidden sequence is drawn
> from
> only one set. The hidden sequence's transitions can be expressed as a joint
> probability p(s0, s1). Similarly the observed sequence has a joint
> distribution with the hidden sequence such as p(y0, s1) and so on.
>

I think gentler language might be a good idea here.  The question was not at
all unreasonable.


>
> The hidden state transitions follow the Markov memorylessness property and
> hence form a Markov Chain.
>
> In your case, you are trying to model your problem assuming that there are
> two underlying state sequences affecting the observed output. This doesn't
> fit into the HMM's definition and you probably want something else.
>

Actually, what the original poster wanted is quite sensible.  While the
output sequence is due to a single input sequence, that input sequence is
not observable.  As such, we have a noisy channel problem where we want to
estimate something about that original sequence.  The point of the Markov
model is that it defines a distribution of output sequence given an input
sequence (and model).  This distribution can be inverted so that given a
particular output sequence, we can estimate the probability distribution of
input sequences conditional on the output.

The typical decoding algorithm for HMM's estimates only the maximum
likelihood input sequence but this does not negate the fact that we have a
distribution.  There are alternative decoding algorithms that allow a set of
high probability sequences to be estimated or allow a partial probability
lattice to be output that allows alternative sequences to be probed.
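
For concreteness, the standard maximum-likelihood decoding and the
overall sequence likelihood would look something like this with Mahout's
classes (a sketch only; the observation sequence is invented and I am
writing the signatures from memory, so verify them against the Javadoc):

import java.util.Arrays;
import org.apache.mahout.classifier.sequencelearning.hmm.HmmEvaluator;
import org.apache.mahout.classifier.sequencelearning.hmm.HmmModel;
import org.apache.mahout.classifier.sequencelearning.hmm.HmmTrainer;

public class DecodeSketch {
  public static void main(String[] args) {
    int[] observations = {0, 1, 1, 2, 0, 2};   // made-up discrete symbols

    // Unsupervised training from a random 2-state, 3-symbol starting model.
    HmmModel model = HmmTrainer.trainBaumWelch(
        new HmmModel(2, 3), observations, 1e-4, 100, true);

    // Viterbi: the single most likely hidden sequence for the observations.
    int[] mlStates = HmmEvaluator.decode(model, observations, true);

    // Total likelihood of the observations under the model (forward pass);
    // this is the distribution view, as opposed to the single Viterbi path.
    double likelihood = HmmEvaluator.modelLikelihood(model, observations, true);

    System.out.println(Arrays.toString(mlStates) + "  p(obs) = " + likelihood);
  }
}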

If you do want to fit your problem into the HMM framework, you need to
> condense the X1 and X2 sequences into a single set and then condition the
> Ys
> on it.
>

Not at all.


> > 3. Can we get output probabilities from the HMM for a concrete state?
> >
>
> Yes, after training, you can retrieve any of the trained model's
> distributions as a Mahout Matrix type and use get(row, col).
>

This is not quite what the question was.

Re: HMM investigations

Posted by Dhruv <dh...@gmail.com>.
On Sun, Jul 24, 2011 at 9:25 AM, Svetlomir Kasabov <
skasab2s@smail.inf.fh-brs.de> wrote:

> Hello,
>
> I consider using Mahout's implementation for Hidden Markov model (HMM) for
> prediction, but I want to clarify some important questions before using it:
>
>
> 1. I've read some literature about HMMs and in some sorces is written, that
> HMMs can also handle continiuous values as input (and not only discrete
> values). Can Mahout's implementation also handle such values ? My input data
> is only continious.
>

Mahout's HMM implementation can only handle discrete values. You could
quantize (bin) your input to make it discrete.
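
For example (just a sketch, with the cut points picked arbitrarily), you
could map each continuous reading to a bin index and feed the resulting
int[] sequence to the trainer:

public class Discretize {

  // Map a continuous value to a bin index given ascending bin edges.
  // With 3 edges this yields 4 output symbols: 0..3.
  static int bin(double value, double[] edges) {
    int b = 0;
    while (b < edges.length && value > edges[b]) {
      b++;
    }
    return b;
  }

  public static void main(String[] args) {
    double[] heartRate = {84.0, 83.0, 110.0, 113.0, 75.0};  // continuous input
    double[] edges = {80.0, 95.0, 110.0};                   // arbitrary cuts
    int[] symbols = new int[heartRate.length];
    for (int i = 0; i < heartRate.length; i++) {
      symbols[i] = bin(heartRate[i], edges);
    }
    // symbols is now a discrete sequence usable as HMM training input.
    System.out.println(java.util.Arrays.toString(symbols));
  }
}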



> 2. Can Mahouts HMM have many hidden markov chains? I don't know, if I use
> the right terminology, but what I need is this HMM "architecture":
>
> X1----X1----X1----...X1  (Markov Chain for input parameter 1 => monitoring
> X1's changes over time)
>
> X2----X2----X2----...X2  (Markov Chain for intput parameter 2 => monitoring
> X2's changes over time)
>
> Y-----Y-----Y-----...Y   (Output value's changes over time)
>
> I think this architecture would allow me to train and predict output Y
> based on inputs X1 and X2.
>


If you look into the *definition* of HMM,  the hidden sequence is drawn from
only one set. The hidden sequence's transitions can be expressed as a joint
probability p(s0, s1). Similarly the observed sequence has a joint
distribution with the hidden sequence such as p(y0, s1) and so on.

The hidden state transitions follow the Markov memorylessness property and
hence form a Markov Chain.

In your case, you are trying to model your problem assuming that there are
two underlying state sequences affecting the observed output. This doesn't
fit into the HMM's definition and you probably want something else.

If you do want to fit your problem into the HMM framework, you need to
condense the X1 and X2 sequences into a single set and then condition the Ys
on it.
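
Concretely (a tiny sketch, with made-up alphabet sizes and sequences),
condensing two discrete streams into one symbol per time step could be
done like this:

public class CombineSymbols {
  public static void main(String[] args) {
    // If X1 takes values 0..(n1-1) and X2 takes values 0..(n2-1), each
    // time step can be encoded as one symbol in 0..(n1*n2 - 1).
    int n2 = 4;                       // hypothetical size of X2's alphabet
    int[] x1 = {0, 1, 1, 2};          // made-up X1 sequence
    int[] x2 = {3, 0, 2, 1};          // made-up X2 sequence
    int[] combined = new int[x1.length];
    for (int t = 0; t < x1.length; t++) {
      combined[t] = x1[t] * n2 + x2[t];   // e.g. (1, 2) -> 1 * 4 + 2 = 6
    }
    System.out.println(java.util.Arrays.toString(combined));
  }
}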


> 3. Can we get output probabilities from the HMM for a concrete state?
>

Yes, after training, you can retrieve any of the trained model's
distributions as a Mahout Matrix type and use get(row, col).

Recent refactorings have also made it possible to print the trained model on
screen.
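
Roughly like this (a sketch; the sequences are made up and I am writing
the trainer call from memory, so please double-check the signature):

import org.apache.mahout.classifier.sequencelearning.hmm.HmmModel;
import org.apache.mahout.classifier.sequencelearning.hmm.HmmTrainer;
import org.apache.mahout.math.Matrix;

public class EmissionProbSketch {
  public static void main(String[] args) {
    int[] observed = {1, 0, 2, 2, 0, 1};   // made-up observation sequence

    // Unsupervised training from a random 2-state, 3-symbol initial model.
    HmmModel model = HmmTrainer.trainBaumWelch(
        new HmmModel(2, 3), observed, 1e-4, 100, true);

    // Emission distribution: rows are hidden states, columns output symbols.
    Matrix emission = model.getEmissionMatrix();
    double p = emission.get(1, 2);   // P(output symbol 2 | hidden state 1)
    System.out.println("P(obs = 2 | state = 1) = " + p);
  }
}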


>
> Many thanks in advance!
>
> Best regards,
>
> Svetlomir.
>
>