Posted to dev@stanbol.apache.org by Allel Benbrahim <ab...@object-ive.com> on 2012/03/25 16:46:18 UTC

Relevance of results extracted with Stanbol

Hello
The results we get from Stanbol are quite often imprecise.
For instance, one of our texts contains an occurrence of "Jean-Luc Mélenchon",
who is a candidate in the French elections, and the result Stanbol returns for
it is "People -> Jean-Luc Godard", who is a famous French film-maker.

It seems that this issue is similar to the one reported by Mathieu d'Aquin,
for which a Jira issue was opened in September 2011 and updated on the
6th of March 2012.

Could you confirm that this issue is still open?
Would the results be more relevant if we extracted them in English rather
than French?
Is French support planned on the roadmap?
Thanks

Re: Relevance of results extracted with Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.

On 25.03.2012, at 16:46, Allel Benbrahim wrote:

> Hello
> The results we get from Stanbol are quite often imprecise.
> For instance, one of our texts contains an occurrence of "Jean-Luc Mélenchon",
> who is a candidate in the French elections, and the result Stanbol returns for
> it is "People -> Jean-Luc Godard", who is a famous French film-maker.
> 

Do you get this result by using the "NER engine -> NamedEntity linking engine" or the "KeywordLinkingEngine"?

> It seems that this issue is similar to the one reported by Mathieu d'Aquin,
> for which a Jira issue was opened in September 2011 and updated on the
> 6th of March 2012.
> 
Are you referring to the following?

    http://markmail.org/message/jifnvswo7rlq2epv and
    https://issues.apache.org/jira/browse/STANBOL-320

> Could you confirm that this issue is still open?
> Would the results be more relevant if we extracted them in English rather
> than French?

English is definitely better supported than French, because OpenNLP has both NER models and POS (part-of-speech) models for English and none for French.

> Is French support planned on the roadmap?

I am unsure whether we should invest much time in filtering and post-processing Enhancement results, as such "optimizations" are rather application-specific. However, some common sources of false suggestions (such as the one referenced by STANBOL-320) might be exceptions to that.
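
To illustrate what such application-specific post-processing could look like, here is a minimal sketch in Python. It assumes the application has already pulled the suggestions out of the enhancement results into plain (label, confidence) pairs. The function name, the data shape, the confidence values and the 0.8 threshold are all hypothetical and not part of Stanbol's API.

```python
from difflib import SequenceMatcher


def filter_suggestions(mention, suggestions, min_similarity=0.8):
    """Keep only suggestions whose label is textually close to the mention.

    `suggestions` is a list of (label, confidence) pairs. This shape is an
    assumption made for illustration, not Stanbol's actual output format.
    """
    kept = []
    for label, confidence in suggestions:
        # ratio() is in [0, 1]; 1.0 means the two strings are identical.
        similarity = SequenceMatcher(None, mention.lower(), label.lower()).ratio()
        if similarity >= min_similarity:
            kept.append((label, confidence, similarity))
    # Best textual match first, confidence as the tie-breaker.
    return sorted(kept, key=lambda s: (s[2], s[1]), reverse=True)


# Hypothetical suggestions for the mention from the original report.
suggestions = [("Jean-Luc Godard", 0.7), ("Jean-Luc Mélenchon", 0.4)]
result = filter_suggestions("Jean-Luc Mélenchon", suggestions)
```

With these made-up inputs, the "Jean-Luc Godard" suggestion is dropped because only the first name matches, while an exact label match is kept. A filter like this only helps when the right entity is among the suggestions, and the threshold has to be tuned per application.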

In the long run, I think investing in good entity disambiguation algorithms is the better way to go, and yes, there are plans in that direction.

best
Rupert

> Thanks