Posted to dev@stanbol.apache.org by Fabian Christ <ch...@googlemail.com> on 2012/11/27 11:14:10 UTC

Confused by engines names

Hi,

Enhancement engines in Stanbol can have several names, and this confuses
me and very likely our users. Here are some examples that I came across
when trying to identify the running engines. I started by looking at the
Web-UI and then clicked through the OSGi console.

dbpediaLinking (NamedEntityTaggingEngine) ->
Named Entity Tagging -> Entity Tagging ->
/engines/entitytagging

entityhubExtraction (EntityLinkingEngine) ->
Entityhub Linking -> Entityhub Linking ->
/engines/entityhublinking

Could we simplify this a bit to make it more obvious, especially for new
users, what is going on?

Best,
 - Fabian

-- 
Fabian
http://twitter.com/fctwitt

Re: Confused by engines names

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi

There are also inconsistencies in the names of the OSGi parameters,
default names of Engines ... That's why I would like to make another
0.* release and then change/fix all those things while working towards
the 1.0 release.

best
Rupert

On Tue, Nov 27, 2012 at 8:38 PM, Reto Bachmann-Gmür <re...@apache.org> wrote:
> Which reminded me that we already discussed once that the artifact names
> are unnecessarily long; created STANBOL-820. Maybe some other renaming could
> be done with that?
>
> Cheers,
> Reto

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Confused by engines names

Posted by Reto Bachmann-Gmür <re...@apache.org>.
Which reminded me that we already discussed once that the artifact names
are unnecessarily long; created STANBOL-820. Maybe some other renaming could
be done with that?

Cheers,
Reto

On Tue, Nov 27, 2012 at 12:17 PM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi Fabian
>
> Short version:
>
> I totally agree. Our vocabulary has changed over time, but the Engines
> still use the names from when they were introduced. Changing them
> (artifactIds and class names) is dangerous as it breaks backwards
> compatibility. So I would suggest changing names only if we can also
> come up with a better implementation/design.

Re: Confused by engines names

Posted by Fabian Christ <ch...@googlemail.com>.
Hi Rupert,

thanks for the detailed explanations (as always). I see that it is already
on the radar. IMO it is a great design to decouple engines and entity
lookup.

Best,
 - Fabian



-- 
Fabian
http://twitter.com/fctwitt

Re: Confused by engines names

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Fabian

Short version:

I totally agree. Our vocabulary has changed over time, but the Engines
still use the names from when they were introduced. Changing them
(artifactIds and class names) is dangerous as it breaks backwards
compatibility. So I would suggest changing names only if we can also
come up with a better implementation/design.

Regarding vocabulary, I think we should prefer the terms
"EntityLinking" and "NamedEntityLinking" and deprecate all others, such
as "keyword" instead of "entity", or "extraction" or "tagging" instead
of "linking".

The 'engines/entitylinking' and 'engines/entityhublinking' introduced
by STANBOL-733 already use this new terminology. They also
deprecate the 'engines/keywordextraction'.

- - -

Long version with more background information

Regarding the linking of Entities there are currently two different principles:

* "NamedEntityLinking": A "NamedEntity" has a 'selected text' AND a
'type'. So the selected text AND the type can be used for linking.
* "EntityLinking": An "Entity" has only a 'selected text'. Here
linking is only possible based on the selected text.
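
To make the difference between the two principles concrete, here is a
minimal Java sketch. It is only an illustration: Candidate,
LinkingModels and the sample vocabulary are made-up names for this
example and are not part of the Stanbol API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical vocabulary entry: a label plus a type (not a Stanbol class).
final class Candidate {
    final String label;
    final String type;

    Candidate(String label, String type) {
        this.label = label;
        this.type = type;
    }
}

final class LinkingModels {

    // "EntityLinking": only the selected text is available for matching.
    static List<Candidate> linkByText(String selectedText, List<Candidate> vocabulary) {
        List<Candidate> matches = new ArrayList<>();
        for (Candidate c : vocabulary) {
            if (c.label.equalsIgnoreCase(selectedText)) {
                matches.add(c);
            }
        }
        return matches;
    }

    // "NamedEntityLinking": the selected text AND the type narrow the match.
    static List<Candidate> linkByTextAndType(String selectedText, String type,
                                             List<Candidate> vocabulary) {
        List<Candidate> matches = new ArrayList<>();
        for (Candidate c : linkByText(selectedText, vocabulary)) {
            if (c.type.equals(type)) {
                matches.add(c);
            }
        }
        return matches;
    }

    // Small made-up vocabulary with an ambiguous label.
    static List<Candidate> sampleVocabulary() {
        return Arrays.asList(
                new Candidate("Paris", "Place"),
                new Candidate("Paris", "Person"),
                new Candidate("Berlin", "Place"));
    }
}
```

With this sample vocabulary, text-only linking of "Paris" returns both
entries, while adding the type "Place" leaves a single candidate.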

The plan would be to also have two Engine implementations that support
those linking models.

* 'NamedEntityLinkingEngine' (currently /engines/entitytagging)
* 'EntityLinkingEngine' (formerly /engines/keywordextraction, now
deprecated; since yesterday /engines/entitylinking)

Those should not have external dependencies (meaning no dependencies on
Stanbol components other than Stanbol Commons and the Enhancer module,
no other major frameworks such as Solr or OpenNLP, and no calls to
external services). That would allow us to keep those Engines within the
enhancer module, but it also means that those implementations cannot be
used directly by the user (as the Service used for linking will be
defined only by an Interface, without an actual implementation).

Because of that there will be "Engines" that are based on the above,
but come with adapters to Services that support the EntityLookup.
The default implementations will be based on the Stanbol Entityhub, but
Stanbol users could also implement versions for their own
infrastructure needs.

The "EntityhubLinking" module [1] is the first example. When you look
at the module you will notice that it does not contain a single
EnhancementEngine implementation. It only provides Entityhub-specific
implementations of the EntitySearcher interface defined by the
"EntityLinkingEngine" and an OSGi component that allows users to
configure an EntityLinkingEngine instance that uses the Entityhub to
look up Entities.
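
The decoupling described above can be sketched roughly as follows. This
is a simplified illustration under my own assumptions, not the real
code: EntitySearcherSketch, EntityLinkingEngineSketch and
InMemorySearcher are hypothetical names, and the actual EntitySearcher
interface [3] has different signatures. The in-memory map only stands
in for an Entityhub-backed (or user-provided) lookup service.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup interface: the only thing the engine knows about.
interface EntitySearcherSketch {
    List<String> lookup(String selectedText);
}

// Engine without a dependency on any concrete lookup service.
final class EntityLinkingEngineSketch {
    private final EntitySearcherSketch searcher;

    EntityLinkingEngineSketch(EntitySearcherSketch searcher) {
        this.searcher = searcher;
    }

    // Maps each selected text to the entity ids found for it;
    // texts without a match are left out.
    Map<String, List<String>> link(List<String> selectedTexts) {
        Map<String, List<String>> links = new LinkedHashMap<>();
        for (String text : selectedTexts) {
            List<String> ids = searcher.lookup(text);
            if (!ids.isEmpty()) {
                links.put(text, ids);
            }
        }
        return links;
    }
}

// Adapter: a stand-in for an implementation backed by a real service.
final class InMemorySearcher implements EntitySearcherSketch {
    private final Map<String, List<String>> index = new HashMap<>();

    InMemorySearcher add(String label, String entityId) {
        String key = label.toLowerCase(Locale.ROOT);
        List<String> ids = index.get(key);
        if (ids == null) {
            ids = new ArrayList<>();
            index.put(key, ids);
        }
        ids.add(entityId);
        return this;
    }

    @Override
    public List<String> lookup(String selectedText) {
        List<String> ids = index.get(selectedText.toLowerCase(Locale.ROOT));
        return ids == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(ids);
    }
}
```

Swapping the lookup backend then only means passing a different
EntitySearcherSketch implementation to the engine's constructor; the
engine itself stays unchanged.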

Current state:

We are not there yet. The '/engines/entitytagging' still
implements both NamedEntityLinking AND lookup via the Entityhub. This
engine could be replaced by an 'engines/namedentitylinking' that
follows the design described above. The new
'/engines/entitylinking' already implements that design. However,
it still depends on the Entityhub, because the EntitySearcher
interface [3] still uses the Entityhub Model classes.

'engines/entityhublinking' currently provides the ability to do
'entitylinking' with the Entityhub. As soon as the
'engines/namedentitylinking' is available I would add named entity
linking functionality to that module. As a last step this module will
also move out of the /enhancer component (as already suggested by
STANBOL-805 [4]).


BTW this design was the result of this [2] discussion on the Stanbol
dev mailing list.

best
Rupert



[1] http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/engines/entityhublinking/
[2] http://markmail.org/message/nptkntyuthv7wwqh
[3] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/entitylinking#entitysearcher
[4] https://issues.apache.org/jira/browse/STANBOL-805



--
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen