Posted to dev@stanbol.apache.org by Sonia Gomez <go...@hotmail.com> on 2012/01/20 10:33:55 UTC

setting extraction sensitivity

Hello
I would like to adjust the sensitivity of entity extraction. My text contains the French words ".....avec les chantiers ....." and Stanbol extracts the entity "Person: Les Paul" from it.
Can I add stop words or adjust the extraction sensitivity?
Thanks for your help.

Re: setting extraction sensitivity

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Sonia

Since revision 1228163 the NER engine uses the language as extracted by the langid engine to check if a NER model for that language is available. If no model is available it does not process the text.

Have a look at

    https://issues.apache.org/jira/browse/STANBOL-102

for details
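
As a rough illustration of the behaviour described above (not Stanbol's actual code), the language check can be sketched as a lookup that skips the text when no model is registered for the detected language. The names MODELS, toy_english_model and extract_entities are hypothetical, and the "model" here is a toy stand-in rather than a real OpenNLP model.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a per-language NER model:
# a function that maps a text to a list of entity mentions.
NerModel = Callable[[str], List[str]]


def toy_english_model(text: str) -> List[str]:
    """Toy 'model' that returns capitalized tokens. Not a real NER model."""
    return [tok for tok in text.split() if tok[:1].isupper()]


# Assumed registry of available models, keyed by language code.
# Only English has a model in this sketch.
MODELS: Dict[str, NerModel] = {"en": toy_english_model}


def extract_entities(text: str, detected_language: Optional[str]) -> List[str]:
    """Run NER only if a model exists for the detected language."""
    model = MODELS.get(detected_language) if detected_language else None
    if model is None:
        # No model for this language: leave the text unprocessed.
        return []
    return model(text)
```

With this gating, French text such as "avec les chantiers" yields no entities at all, instead of being run through an English model that could produce a false match like "Les Paul".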

best
Rupert

On 20.01.2012, at 11:34, Olivier Grisel wrote:

> 2012/1/20 Sonia Gomez <go...@hotmail.com>:
>> 
>> Hello
>> I would like to adjust the sensitivity of entity extraction. My text contains the French words ".....avec les chantiers ....." and Stanbol extracts the entity "Person: Les Paul" from it.
>> Can I add stop words or adjust the extraction sensitivity?
> 
> The default NER engine in Stanbol
> (NamedEntityExtractionEnhancementEngine) does not work correctly on
> non-English content. You should probably disable that engine on French
> content.
> 
> Building a statistical OpenNLP model for French is not an easy task
> (although it is possible and deserves some time invested in it).
> 
> -- 
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel


Re: setting extraction sensitivity

Posted by Olivier Grisel <ol...@ensta.org>.
2012/1/20 Sonia Gomez <go...@hotmail.com>:
>
> Hello
> I would like to adjust the sensitivity of entity extraction. My text contains the French words ".....avec les chantiers ....." and Stanbol extracts the entity "Person: Les Paul" from it.
> Can I add stop words or adjust the extraction sensitivity?

The default NER engine in Stanbol
(NamedEntityExtractionEnhancementEngine) does not work correctly on
non-English content. You should probably disable that engine on French
content.

Building a statistical OpenNLP model for French is not an easy task
(although it is possible and deserves some time invested in it).

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel