Posted to users@opennlp.apache.org by Robert <bo...@centrum.cz> on 2011/09/21 19:03:25 UTC

Input for OpenNLP maximum entropy and TADM

Hello,
    I am quite new to maximum entropy and, unfortunately, I am struggling with the
input to TADM (tadm.sf.net) and to OpenNLP. Maybe my question is off-topic, but
I would like to know the answer: I would like to have equivalent
inputs for OpenNLP GIS and TADM.

For example, for OpenNLP the input is a set of observations in the format:
(outcome_i, events_i), where events_i is a set of events.

For example, take a database of binary vectors:

1 0 0 1 1 1 1 1 1 1
0 1 0 0 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1
1 0 1 1 1 0 1 1 1 1
0 0 0 0 0 0 0 1 1 1
1 0 0 0 1 0 1 1 1 1
0 0 0 1 0 1 0 1 1 0
1 0 1 1 1 1 1 1 0 0
0 1 0 1 1 0 1 1 1 0

Let us model column zero using all the other columns. The input can be:

outcome | strings representing the observations/events
class=1 | 1=0 2=0 3=1 4=1 5=1 6=1 7=1 8=1 9=1
class=0 | 1=1 2=0 3=0 4=1 5=1 6=1 7=0 8=0 9=0
class=0 | 1=0 2=0 3=1 4=1 5=1 6=1 7=1 8=1 9=1
class=1 | 1=0 2=1 3=1 4=1 5=0 6=1 7=1 8=1 9=1
class=0 | 1=0 2=0 3=0 4=0 5=0 6=0 7=1 8=1 9=1
class=1 | 1=0 2=0 3=0 4=1 5=0 6=1 7=1 8=1 9=1
class=0 | 1=0 2=0 3=1 4=0 5=1 6=0 7=1 8=1 9=0
class=1 | 1=0 2=1 3=1 4=1 5=1 6=1 7=1 8=0 9=0
class=0 | 1=1 2=0 3=1 4=1 5=0 6=1 7=1 8=1 9=0
class=0 | 1=0 2=0 3=1 4=1 5=1 6=1 7=1 8=0 9=0
class=1 | 1=0 2=0 3=1 4=0 5=1 6=1 7=0 8=0 9=1
class=0 | 1=1 2=0 3=1 4=1 5=1 6=1 7=1 8=0 9=0
class=1 | 1=1 2=0 3=1 4=1 5=0 6=1 7=0 8=0 9=0
class=0 | 1=1 2=0 3=1 4=0 5=1 6=1 7=1 8=0 9=0

The string n=m means that bit n is set to the value m, counting n from 0.
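
The mapping from a binary vector to one line of the listing above can be sketched in Python as follows. This is only an illustration of the notation used in this message: the function name to_event_line and the " | " separator are assumptions taken from the listing, not an OpenNLP file format or API.

```python
def to_event_line(row, target=0):
    """Format one binary vector as 'class=<outcome> | <index>=<bit> ...'.

    `target` is the column used as the outcome; every other column
    becomes one 'index=bit' string, with indices counted from 0.
    """
    outcome = row[target]
    predicates = [f"{i}={bit}" for i, bit in enumerate(row) if i != target]
    return f"class={outcome} | " + " ".join(predicates)


# The first two vectors from the database above.
rows = [
    [1, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 1, 1, 0, 0, 0],
]
lines = [to_event_line(r) for r in rows]
for line in lines:
    print(line)
```

Applied to the first two vectors, this reproduces the first two lines of the listing.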

Now, what is the equivalent input for TADM? I suspect that the input for TADM
consists of all observations. Is this correct? However, as I have only a very limited
number of observations, I get a probability distribution that gives a probability of
1/n for n bits.
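
For reference, a conditional maximum entropy model scores each outcome by summing the weights of its active (event string, outcome) features and then normalizes the exponentiated scores. The Python sketch below uses hypothetical names and a made-up weight; it is not the TADM or OpenNLP implementation. It only shows that a model in which every weight is zero assigns the same probability to every outcome, which is one situation in which a uniform 1/n result appears. Whether that is what happens here cannot be told from this message alone.

```python
import math


def outcome_probabilities(active, weights, outcomes):
    """Return p(outcome | active event strings) for a log-linear model.

    `weights` maps (event_string, outcome) pairs to real-valued weights;
    pairs that are missing count as weight 0.
    """
    scores = {y: sum(weights.get((p, y), 0.0) for p in active) for y in outcomes}
    z = sum(math.exp(s) for s in scores.values())  # normalization constant
    return {y: math.exp(s) / z for y, s in scores.items()}


active = ["1=0", "2=0", "3=1"]
outcomes = ["class=0", "class=1"]

# No trained weights at all: every outcome gets the same probability.
uniform = outcome_probabilities(active, {}, outcomes)

# One illustrative (made-up) weight shifts mass toward class=1.
skewed = outcome_probabilities(active, {("3=1", "class=1"): 1.0}, outcomes)

print(uniform)
print(skewed)
```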

I understand that this question may be a bit off-topic, but maybe someone here knows
the answer.

Thanks,
     Robert

Re: Input for OpenNLP maximum entropy and TADM

Posted by Jason Baldridge <ja...@gmail.com>.
The discussion here will perhaps help:

https://sourceforge.net/projects/tadm/forums/forum/473054/topic/1675097

Note that TADM isn't being actively supported as far as I am aware. I tried
to update it a couple of years ago, ran into trouble with the latest versions
of PETSc and TAO, and ran out of time to do it. Having said that, it still has
better training for maxent models since it does LMVM and OpenNLP Maxent
still (sadly) just does GIS. It's been on my todo list to change that situation for
some time, but I never seem to find the time. :(

Jason

-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge