
Posted to users@opennlp.apache.org by Jean-Claude Dauphin <jc...@gmail.com> on 2011/05/13 11:33:04 UTC

maxent model is not compatible with Tokenizer training

Hi,

I tried to train models for French from a set of French human-resources job
postings that are split into sentences, using them as the sample training
data stream.
It works fine for the sentence detector model using
*SentenceDetectorME.train*.

However, when I use the same sample as training data for the tokenizer with
*opennlp.tools.tokenize.TokenizerME.train*, I get the following error:

The maxent model is not compatible!

Here is an excerpt from the training data:
---------------------------------------------------------------------------------------------------------------
L<SKIP>'<SKIP>anglais est indispensable<SKIP>, l<SKIP>'<SKIP>allemand est un
plus<SKIP>.
Votre rigueur naturelle associée à de bonnes capacités relationnelles vous
permettront de réussir dans ce poste<SKIP>.
L<SKIP>'<SKIP>agence Manpower ST CHAMOND recherche pour l<SKIP>'<SKIP>un de
ses clients un Responsable technique H/F<SKIP>.
Notre client<SKIP>, société spécialisée dans la commercialisation de système
de chauffage et sanitaire recherche son responsable technique<SKIP>.
De niveaux Bac +2 technique type Génie Climatique ou équivalent<SKIP>, vous
avez une expérience technico-commerciale dans le secteur du bâtiment<SKIP>.
Une connaissance du chauffage<SKIP>, du sanitaire et/ou de
l<SKIP>'<SKIP>électricité serait un plus<SKIP>.
Vous avez l<SKIP>'<SKIP>esprit d<SKIP>'<SKIP>équipe<SKIP>, le sens de
l<SKIP>'<SKIP>accueil et du contact téléphonique<SKIP>.
Vous maîtrisez les logiciels Word et Excel<SKIP>.
Une connaissance de l<SKIP>'<SKIP>anglais ou l<SKIP>'<SKIP>italien est
souhaitée<SKIP>.
Votre mission s<SKIP>'<SKIP>articulera autour de quatre activités
principales <SKIP>:

Diagnostic et renseignement technique par téléphone<SKIP>,
-------------------------------------------------------------------------------------

I would appreciate any advice on this issue.

Thank you for your time,

JCD
-- 
Jean-Claude Dauphin

jc.dauphin@gmail.com
jc.dauphin@afus.unesco.org

http://kenai.com/projects/j-isis/
http://www.unesco.org/isis/
http://www.unesco.org/idams/
http://www.greenstone.org

Re: maxent model is not compatible with Tokenizer training

Posted by Jörn Kottmann <ko...@gmail.com>.

Nice that it works now.

I forgot to mention that you should remove the <SPLIT> tags from the data
before using it to train a sentence detector.
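
A minimal sketch of that clean-up step, assuming the sentence detector
training data is the same text with the markers deleted. The class and
method names here are hypothetical and are not part of OpenNLP:

```java
// Hypothetical helper, not an OpenNLP class: removes tokenizer <SPLIT>
// markers so the same line can be reused as sentence detector training data.
class StripSplitTags {

    // Deletes every literal <SPLIT> marker; all other characters are kept.
    static String strip(String line) {
        return line.replace("<SPLIT>", "");
    }

    public static void main(String[] args) {
        String tagged = "Vous maîtrisez les logiciels Word et Excel<SPLIT>.";
        System.out.println(strip(tagged));
    }
}
```

Lines without any marker pass through unchanged, so running this over data
that is already clean should be harmless.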

Jörn


Re: maxent model is not compatible with Tokenizer training

Posted by Jean-Claude Dauphin <jc...@gmail.com>.

Thanks a lot, Jörn, it works now. I don't know why I typed SKIP instead of
SPLIT; I was too focused on the error message.

Sorry for taking your time.

Best wishes,

Jean-Claude






Re: maxent model is not compatible with Tokenizer training

Posted by Jörn Kottmann <ko...@gmail.com>.

On 5/13/11 11:33 AM, Jean-Claude Dauphin wrote:
> Hi,
>
> I tried to train models for French from a set of French human-resources job
> postings that are split into sentences, using them as the sample training
> data stream.
> It works fine for the sentence detector model using
> *SentenceDetectorME.train*.
>
> However, when I use the same sample as training data for the tokenizer with
> *opennlp.tools.tokenize.TokenizerME.train*, I get the following error:
>
> The maxent model is not compatible!

The error message is a bit misleading. What it means, I guess, is that you
trained with only NO_SPLIT events, so the resulting model cannot split any
tokens.
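
One way to catch this before training is to count the <SPLIT> markers in
the data. The sketch below is only a pre-flight check under the assumption
(Jörn's guess above) that the error comes from data with no <SPLIT> markers
at all. The class name is hypothetical and nothing here calls OpenNLP:

```java
// Hypothetical pre-flight check, not an OpenNLP class: counts <SPLIT>
// markers so that data with none can be rejected before training starts.
class SplitTagCheck {

    // Returns the number of literal <SPLIT> markers in the given text.
    static int countSplitTags(String text) {
        final String tag = "<SPLIT>";
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(tag, from)) != -1) {
            count++;
            from += tag.length();
        }
        return count;
    }

    public static void main(String[] args) {
        String wrong = "Vous maîtrisez les logiciels Word et Excel<SKIP>.";
        String right = "Vous maîtrisez les logiciels Word et Excel<SPLIT>.";
        System.out.println("wrong tag: " + countSplitTags(wrong) + " marker(s)");
        System.out.println("right tag: " + countSplitTags(right) + " marker(s)");
    }
}
```

A count of zero over the whole training file would suggest the markers are
missing or misspelled, as they were in the excerpt from the original post.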

We should fix the model validation code, or at least emit a more meaningful
error message.

Anyway, to solve your problem, rename your <SKIP> tags to <SPLIT>.
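
The rename is a plain text substitution. A minimal sketch, using a
hypothetical helper class that is not part of OpenNLP and a sample line
taken from the excerpt in the original post:

```java
// Hypothetical helper, not an OpenNLP class: rewrites the mistaken <SKIP>
// marker to the <SPLIT> marker used in tokenizer training data.
class SkipToSplit {

    // Replaces every literal <SKIP> with <SPLIT>; other text is untouched.
    static String fixLine(String line) {
        return line.replace("<SKIP>", "<SPLIT>");
    }

    public static void main(String[] args) {
        String line = "Vous maîtrisez les logiciels Word et Excel<SKIP>.";
        System.out.println(fixLine(line));
    }
}
```

Applying fixLine to each line of the training file before handing it to the
trainer should be enough, assuming <SKIP> never occurs as real text in the
data.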

Have a look at our documentation here:
http://incubator.apache.org/opennlp/documentation/manual/opennlp.html#tools.tokenizer.cmdline.training

Hope that helps,
Jörn