Posted to dev@opennlp.apache.org by Adriano Santos <ad...@gmail.com> on 2012/03/27 22:04:55 UTC

Document Categorizer - Classifying: Help

Hi, everyone.

I'm trying to use Document Categorizer - Classifying, but I could not run
the example.
I looked at the documentation:
http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#opennlp
Can anyone help me with this?

Thanks, everyone.

-- 

Adriano Araújo Santos
***********************************************

Professor at the Escola Superior de Aviação Civil - ESAC
Professor in the Information Systems program - FACISA
Professor in the Department of Computing at UEPB
PMI Membership
Master's student in Computer Science at UFCG
Postgraduate student in Business Project Management - MBA
MSP Lead - Microsoft Student Partner
Leader of the Grupo de Usuários.NUG
Twitter: @Adriano_Santos
Site: https://sites.google.com/site/adrianosantospb

Re: Document Categorizer - Classifying: Help

Posted by Jörn Kottmann <ko...@gmail.com>.
On 03/30/2012 04:24 PM, Adriano Santos wrote:
> The first time, I used this file:
>
> GMDecrease Major acquisitions that have a lower gross margin than the existing network also
> GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments
>
> The second time, I used this one, as in the documentation sample:
>
> GMDecrease Major acquisitions that have a lower gross margin than the existing network also \
>            had a negative impact on the overall gross margin, but it should improve following \
>            the implementation of its integration strategies .
> GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments \
>            to obligations towards dealers .
>
> Here, GMDecrease and GMIncrease are the classes. OK?
>
> I saw that I must use more documents in training, correct? So, how can I
> represent many documents in one class? Like this?
>
> GMDecrease Major acquisitions that have a lower gross margin than the existing network also
> GMDecrease To perform classification you will need a maxent model - these are encapsulated in the DoccatModel class of OpenNLP tool
> GMDecrease First you need to grab the bytes from the serialized model on an InputStream - we'll leave it you to do that, since you were the one who serialized it to begin with. Now for the easy part
> GMDecrease The Document Categorizer can be trained on annotated training material. The data must be in OpenNLP Document Categorizer training format.
> ...
>
> GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments
> GMIncrease The tags array contains one part-of-speech tag for each token in the input array
> GMIncrease Looks like the mailing list sever removed your attachment. Anyway, the output indicates
> ...
>

Yes, that looks good. The format is one document per line: the class label
first, followed by the document text. The document is whitespace-tokenized.

Jörn
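
The format described above can be checked before training with a few lines of plain Java. This is only a minimal sketch: the class name DoccatLineFormat and its methods are hypothetical helpers, not part of OpenNLP, and it assumes that the first whitespace-separated token on each line is the class label and that everything after it is the document.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP: it only illustrates the
// "class label + document on one line" layout of the training file.
final class DoccatLineFormat {

    // Splits one training line on whitespace; token 0 is the class label,
    // the remaining tokens are the document.
    static String[] tokens(String line) {
        return line.trim().split("\\s+");
    }

    // Counts how many documents each label has; blank lines are skipped.
    static Map<String, Integer> documentsPerLabel(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] t = tokens(line);
            if (t.length < 2) {
                throw new IllegalArgumentException(
                        "Line has a label but no document text: " + line);
            }
            counts.merge(t[0], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "GMDecrease Major acquisitions that have a lower gross margin than the existing network also",
                "GMDecrease The Document Categorizer can be trained on annotated training material.",
                "GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments");
        // Prints the number of documents found for each class label.
        System.out.println(documentsPerLabel(lines));
    }
}
```

A count per label makes it easy to spot a file in which wrapped lines were read as separate documents, or in which a class has far fewer documents than intended.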

Re: Document Categorizer - Classifying: Help

Posted by Adriano Santos <ad...@gmail.com>.
The first time, I used this file:

GMDecrease Major acquisitions that have a lower gross margin than the existing network also
GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments

The second time, I used this one, as in the documentation sample:

GMDecrease Major acquisitions that have a lower gross margin than the existing network also \
           had a negative impact on the overall gross margin, but it should improve following \
           the implementation of its integration strategies .
GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments \
           to obligations towards dealers .

Here, GMDecrease and GMIncrease are the classes. OK?

I saw that I must use more documents in training, correct? So, how can I
represent many documents in one class? Like this?

GMDecrease Major acquisitions that have a lower gross margin than the existing network also
GMDecrease To perform classification you will need a maxent model - these are encapsulated in the DoccatModel class of OpenNLP tool
GMDecrease First you need to grab the bytes from the serialized model on an InputStream - we'll leave it you to do that, since you were the one who serialized it to begin with. Now for the easy part
GMDecrease The Document Categorizer can be trained on annotated training material. The data must be in OpenNLP Document Categorizer training format.
...

GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments
GMIncrease The tags array contains one part-of-speech tag for each token in the input array
GMIncrease Looks like the mailing list sever removed your attachment. Anyway, the output indicates
...








Re: Document Categorizer - Classifying: Help

Posted by Jörn Kottmann <ko...@gmail.com>.
On 03/30/2012 03:47 PM, Adriano Santos wrote:
> Ok...
> I was trying to use the documentation sample, because I need to know
> whether I can use OpenNLP in my master's thesis as an experiment.
> I attached the file that I used. I need to know whether the format I
> used is correct.

It looks like the mailing list server removed your attachment. Anyway, the
output indicates that you used just two training documents. Was this intended?
Otherwise there might be an issue with the format.

Jörn

Re: Document Categorizer - Classifying: Help

Posted by Adriano Santos <ad...@gmail.com>.
Ok...
I was trying to use the documentation sample, because I need to know whether
I can use OpenNLP in my master's thesis as an experiment.
I attached the file that I used. I need to know whether the format I used
is correct.



Re: Document Categorizer - Classifying: Help

Posted by Jörn Kottmann <ko...@gmail.com>.
Sorry for this bug. We have a Jira issue for it, but no one ever took the
time to fix it.
Instead of the stack trace, you should see an error message which
tells you that you don't have enough training data.

You should try with at least a few hundred examples; otherwise,
the model you produce will not really work.

Jörn
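
To see why two documents are not enough, here is a simplified sketch of what a feature cutoff does. It is not OpenNLP's indexer; the class CutoffSketch is hypothetical. It assumes that a token which occurs fewer than "cutoff" times in the whole training set is discarded, and that a document left with no tokens is dropped, which is consistent with the "cutoff of 5" and "Dropped event" lines in the training output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration only; this is not OpenNLP's indexer.
final class CutoffSketch {

    // Returns the training lines that still contain at least one token seen
    // at least `cutoff` times across all lines. The first token of each line
    // is the class label and is not counted as a feature.
    static List<String> survivingDocuments(List<String> lines, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] t = line.trim().split("\\s+");
            for (int i = 1; i < t.length; i++) {
                counts.merge(t[i], 1, Integer::sum);
            }
        }
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String[] t = line.trim().split("\\s+");
            for (int i = 1; i < t.length; i++) {
                if (counts.get(t[i]) >= cutoff) {
                    kept.add(line);
                    break; // one surviving token is enough to keep the document
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "GMDecrease Major acquisitions that have a lower gross margin than the existing network also",
                "GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments");
        // No token occurs five times in these two lines, so nothing survives.
        System.out.println(survivingDocuments(lines, 5).size()); // prints 0
    }
}
```

With only the two sample lines, the most frequent tokens ("gross" and "margin") occur twice, so under a cutoff of 5 both documents lose all of their tokens and nothing is left to train on.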



Re: Document Categorizer - Classifying: Help

Posted by Adriano Santos <ad...@gmail.com>.
Hi Jörn, thanks for helping me.

I changed the classpath and the OpenNLP version. I ran the sample again and
it returned this error:

C:\apache-opennlp-1.5.2\bin>opennlp DoccatTrainer -encoding UTF-8 -lang en -data en-doccat.train -model en-doccat.bin
Indexing events using cutoff of 5

        Computing event counts...  done. 2 events
        Indexing...  Dropped event GMDecrease:[bow=Major, bow=acquisitions, bow=that, bow=have, bow=a, bow=lower, bow=gross, bow=margin, bow=than, bow=the, bow=existing, bow=network, bow=also]
Dropped event GMIncrease:[bow=The, bow=upward, bow=movement, bow=of, bow=gross, bow=margin, bow=resulted, bow=from, bow=amounts, bow=pursuant, bow=to, bow=adjustments]
done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
        at opennlp.maxent.GIS.trainModel(GIS.java:256)
        at opennlp.model.TrainUtil.train(TrainUtil.java:182)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:154)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:176)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:192)
        at opennlp.tools.cmdline.doccat.DoccatTrainerTool.run(DoccatTrainerTool.java:91)
        at opennlp.tools.cmdline.CLI.main(CLI.java:191)




Re: Document Categorizer - Classifying: Help

Posted by Jörn Kottmann <ko...@gmail.com>.
It looks like the maxent jar is not on your classpath.
Maybe it is just an issue with our script (does it work with head?).

Anyway, try changing to this directory:
C:\Program Files\Apache Software Foundation\opennlp-tools-1.5.0

and typing: bin/opennlp

Or does it fail because of the whitespace in "Program Files"?

I suggest that you try 1.5.2; if I remember correctly, we spent some
time fixing this script.

Jörn
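
One way to confirm the classpath diagnosis independently of the launcher script is to ask the JVM whether it can load the class named in the exception. The sketch below is a hypothetical diagnostic, not an OpenNLP tool; the class name ClasspathCheck is made up, and opennlp.model.EventStream is simply the class reported as missing in the stack trace. It has to be run with the same classpath that the script would use.

```java
// Hypothetical diagnostic, not an OpenNLP tool.
final class ClasspathCheck {

    // Returns true if the named class can be loaded from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: locate and load the class without running
            // its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class that the NoClassDefFoundError complained about.
        String name = args.length > 0 ? args[0] : "opennlp.model.EventStream";
        System.out.println(name
                + (isOnClasspath(name) ? " found" : " NOT found")
                + " on the classpath");
    }
}
```

If the class is reported as not found, the jar that contains it is missing from the classpath, which matches the NoClassDefFoundError.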



Re: Document Categorizer - Classifying: Help

Posted by Adriano Santos <ad...@gmail.com>.
Hi, people.

So... I ran the example and it returned this error:

C:\Program Files\Apache Software Foundation\opennlp-tools-1.5.0\bin>opennlp DoccatTrainer -encoding UTF-8 -lang en -data en-doccat.train -model en-doccat.bin
Exception in thread "main" java.lang.NoClassDefFoundError: opennlp/model/EventStream
        at opennlp.tools.cmdline.CLI.<clinit>(CLI.java:107)
Caused by: java.lang.ClassNotFoundException: opennlp.model.EventStream
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        ... 1 more

I'm using version opennlp-tools-1.5.0.

Thanks, everyone.



Re: Document Categorizer - Classifying: Help

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
Hi, Adriano,

We don't have a ready-to-use model for the Document Categorizer yet. You
should try training your own by following the instructions.

Regards,
William



Re: Document Categorizer - Classifying: Help

Posted by Adriano Santos <ad...@gmail.com>.
To perform classification I need a maxent model, but I don’t have an
example of one. The other tasks (Name Finder, Tokenizer, Sentence
Detector...) have examples. I’m a beginner with OpenNLP and I’d like to run
all the existing examples.

Can you help me?


Re: Document Categorizer - Classifying: Help

Posted by Jörn Kottmann <ko...@gmail.com>.
On 03/27/2012 10:04 PM, Adriano Santos wrote:
> I'm trying to use Document Categorizer - Classifying, but I could not run
> the example.

What problem are you having? Do you get an exception?

Jörn