Posted to dev@stanbol.apache.org by Rafa Haro <rh...@zaizi.com> on 2012/12/18 15:46:13 UTC

Using Disambiguation-mlt with the new EntityHub Linking Engine

Hi all,

I have been trying to use the disambiguation-mlt engine with the new
EntityHub Linking Engine for Spanish. My goal is to link and
disambiguate against any kind of entity within the EntityHub, not only
Named Entities. So I have configured a new Enhancement Chain that
includes only language detection, OpenNlpSentenceDetectionEngine,
OpenNlpTokenizerEngine, EntityLinkingEngine and Disambiguation-mlt
(installing bundle version 0.10). After a few tests, the
disambiguation engine runs but does not disambiguate anything. After
removing the disambiguation engine from the Enhancement Chain, we found
that only one candidate is given for each detected entity. So I think
the disambiguation engine may be working fine but has nothing to
disambiguate, because the EntityHub linking engine passes it only one
candidate.

What could be happening? Our suggestions parameter is set to 5.

Thanks. Regards

This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately. Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory.

Zaizi Ltd is registered in England and Wales with the registration number 6440931. The Registered Office is 222 Westbourne Studios, 242 Acklam Road, London W10 5JJ, UK.

Re: Using Disambiguation-mlt with the new EntityHub Linking Engine

Posted by Rupert Westenthaler <ru...@gmail.com>.
On Wed, Dec 19, 2012 at 1:08 PM, Rafa Haro <rh...@zaizi.com> wrote:
> Hi Rupert,
>
> Thanks. It is now working perfectly.
>
> By the way, is the POS tagger model for Spanish installed in Stanbol? I want
> to know whether it is possible to restrict the disambiguation to nouns only.
>

Yes, it is installed. As the Spanish POS model does not provide
ProperNouns, the configuration enables all Nouns by default. So yes, the
EntityhubLinkingEngine will provide suggestions for all Nouns.

best
Rupert




--
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Using Disambiguation-mlt with the new EntityHub Linking Engine

Posted by Rafa Haro <rh...@zaizi.com>.
Hi Rupert,

Thanks. It is now working perfectly.

By the way, is the POS tagger model for Spanish installed in Stanbol? I
want to know whether it is possible to restrict the disambiguation to
nouns only.

Thanks

On 19/12/12 12:27, Rupert Westenthaler wrote:
> enhancer.engines.linking.suggestions="20"
>      enhancer.engines.linking.minFoundTokens="1"
>      enhancer.engines.linking.minLabelScore="0.33"
>      enhancer.engines.linking.minTextScore="0.33"
>      enhancer.engines.linking.minMatchScore="0.2"



Re: Using Disambiguation-mlt with the new EntityHub Linking Engine

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi

My fault. You are probably getting the following error:

org.apache.sling.installer.core.impl.OsgiInstallerImpl Cannot create
InternalResource (resource will be ignored):InstallableResource,
priority=100, id={path} java.io.IOException: Unable to read dictionary
from input stream: {path}
[..]
Caused by: java.io.IOException: Unexpected token 78; expected: 61
(line=19, pos=48)
    at org.apache.felix.cm.file.ConfigurationHandler.readFailure(ConfigurationHandler.java:650)
    at org.apache.felix.cm.file.ConfigurationHandler.readInternal(ConfigurationHandler.java:274)
    at org.apache.felix.cm.file.ConfigurationHandler.read(ConfigurationHandler.java:237)
    at org.apache.sling.installer.core.impl.InternalResource.readDictionary(InternalResource.java:243)
    at org.apache.sling.installer.core.impl.InternalResource.create(InternalResource.java:98)
    ... 6 more

The reason is that the config file format requires

    {key}=[{data-type}]"{value}"

That means my example was an illegally formatted config file: the
values were not quoted.

Changing it to

    enhancer.engines.linking.suggestions="20"
    enhancer.engines.linking.minFoundTokens="1"
    enhancer.engines.linking.minLabelScore="0.33"
    enhancer.engines.linking.minTextScore="0.33"
    enhancer.engines.linking.minMatchScore="0.2"

solved the problem for me
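
A quick way to catch this before dropping a file into fileinstall is to check every property line against the pattern above. The following is only a minimal sketch under stated assumptions: the regular expression is a simplification of the real parser (it ignores arrays and escaped quotes), lines starting with "#" are assumed to be comments, and the function name find_malformed_lines is made up for illustration.

```python
import re

# Simplified pattern for one property line: key, '=', an optional
# one-letter type code, then a double-quoted value.
# Assumption: arrays and escaped quotes are NOT handled here.
LINE_RE = re.compile(r'^\s*[\w.\-]+\s*=\s*[A-Za-z]?"[^"]*"\s*$')


def find_malformed_lines(text):
    """Return (line_number, line) pairs that do not match the simplified pattern."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and lines assumed to be comments.
        if not stripped or stripped.startswith("#"):
            continue
        if not LINE_RE.match(line):
            bad.append((number, line))
    return bad


if __name__ == "__main__":
    unquoted = 'enhancer.engines.linking.minFoundTokens=1\n'
    quoted = 'enhancer.engines.linking.minFoundTokens="1"\n'
    print(find_malformed_lines(unquoted))  # the unquoted line is reported
    print(find_malformed_lines(quoted))    # nothing is reported
```

The check is deliberately strict about the quotes only, since that is what broke the example in this thread.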

best
Rupert


Re: Using Disambiguation-mlt with the new EntityHub Linking Engine

Posted by Rafa Haro <rh...@zaizi.com>.
Hi Rupert,

Thanks for the instructions. I have tried to change the configuration
manually and I'm experiencing some weird behaviour. Creating a config
file with a custom name in the fileinstall directory doesn't have any
effect. After doing that, I can't see a new EntityHub Linking engine
instance in the Felix Console. Maybe it isn't supposed to appear at all,
I don't know.

So I then tried to create a new instance with the Felix Console and
change its configuration manually. When I create a new instance, the
Felix Console assigns it a concrete name, for instance:

org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine.2449f404-1cce-4655-84ce-ae4235d42009

Looking at the fileinstall directory in my Stanbol working directory, a
new file with the configuration of this instance appears:

/var/zaizi/workspace/stanbol/launchers/full/target/stanbol/fileinstall/org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine.org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine.2449f404-1cce-4655-84ce-ae4235d42009.config

Changing this file manually doesn't have any effect either. If you set,
for instance, "Suggestions" to 50 (a parameter that you can actually
see in the Felix Console), the change doesn't appear in the Felix
Console form for our instance, which always keeps the last value that
was set directly in the console. We have tried restarting the engine,
the bundle and even Stanbol itself, but the configuration doesn't
change.

So we thought that Felix must be loading the configuration from another
file in the filesystem. We have found more files with the same
configuration:

config/org/apache/stanbol/enhancer/engines/entityhublinking/EntityhubLinkingEngine/org/apache/stanbol/enhancer/engines/entityhublinking/EntityhubLinkingEngine/2449f404/ee7315a4-3139-4b0d-ad41-fbbf4deb7fe3.config

AND

config/org/apache/stanbol/enhancer/engines/entityhublinking/EntityhubLinkingEngine/2449f404-1cce-4655-84ce-ae4235d42009.config

AND

fileinstall/org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine.org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine.2449f404-1cce-4655-84ce-ae4235d42009.config


We have tried changing all these files manually, but without success.


The configuration of the engine is always the last one set directly in
the console. These files even change their values when you restart
Stanbol.

Any ideas?

Thanks again Rupert

Re: Using Disambiguation-mlt with the new EntityHub Linking Engine

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi

Those properties are not available in the Felix Webconsole. You can
only configure them by using OSGi config files. The
EntityLinkingEngine simply has too many configuration parameters to
include them all in the form of the Felix Webconsole.

The best approach is to use the default configuration for the dbpedia
EntityhubLinkingEngine [1] as a template and adapt it to your needs,
e.g. by adding

enhancer.engines.linking.minFoundTokens=1
enhancer.engines.linking.minLabelScore=0.33
enhancer.engines.linking.minTextScore=0.33
enhancer.engines.linking.minMatchScore=0.2

You will also need to increase the value of
"enhancer.engines.linking.suggestions".

Note that you do NOT need to use the datatypes (e.g. {key}=I"1" for
Integer). The Engine is implemented in a way that it also supports
string values, as long as it can parse the expected numeric values from
the provided values.

The file name must follow the pattern
"org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine-{instance_name}.config".

You can use the Sling Fileinstaller to activate your configuration
file. Simply create the {stanbol-working-dir}/stanbol/fileinstall
directory and copy the config file into this directory.
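
The naming rule and the copy step can be scripted. This is only a rough sketch: the function names, the instance name "spanish" and the paths in the usage note are hypothetical, and it assumes the source file is an already adapted copy of the dbpedia template.

```python
import shutil
from pathlib import Path

# Prefix of the config file name, as given in the naming rule above.
ENGINE_PID = (
    "org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine"
)


def fileinstall_target(working_dir, instance_name):
    """Build {working_dir}/stanbol/fileinstall/<ENGINE_PID>-<instance_name>.config."""
    file_name = f"{ENGINE_PID}-{instance_name}.config"
    return Path(working_dir) / "stanbol" / "fileinstall" / file_name


def install_config(source_file, working_dir, instance_name):
    """Create the fileinstall directory if needed and copy the config file into it."""
    target = fileinstall_target(working_dir, instance_name)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source_file, target)
    return target
```

Calling install_config("my-engine.config", "/path/to/working-dir", "spanish") would then place the file where the Fileinstaller looks for it; whether the running instance picks it up still has to be checked in the Felix Webconsole.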

best
Rupert

P.S. In my last mail I used outdated keys. The documentation on the
Stanbol website also listed the wrong keys; I have corrected this in
the meantime.


[1] http://svn.apache.org/repos/asf/stanbol/trunk/data/defaultconfig/src/main/resources/config/org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine-dbpedia.config



Re: Using Disambiguation-mlt with the new EntityHub Linking Engine

Posted by Rafa Haro <rh...@zaizi.com>.
Hi Rupert,

In which revision is it possible to configure such parameters? We are 
working with revision 1421282 and I can't see these options in the 
Engine Configuration Dialogue.

Regards



Re: Using Disambiguation-mlt with the new EntityHub Linking Engine

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Rafa

To use the disambiguation engine you will need to tweak the parameters
for the EntityhubLinkingEngine. The relevant parameters are

* Min Label Match Score
"org.apache.stanbol.enhancer.engines.keywordextraction.minLabelMatchFactor"
* Min Matched Tokens
"org.apache.stanbol.enhancer.engines.keywordextraction.minFoundTokens"

See [1] for the documentation.

From the documentation:

If used in combination with a disambiguation Engine, one might want to
consider suggesting Entities where only a single token of a multi-token
label matches. In such cases a configuration like Min Matched
Tokens=1 and Min Label Match Score <= 0.5 (e.g. 0.4) might be
considered. In such scenarios users will also want to considerably
increase the value for Max Suggestions (typically values > 10).

I would suggest that you start off with "minLabelMatchFactor=0.33" and
"minFoundTokens=1". In addition I would set the number of suggestions
to ~20.
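
To see why lower thresholds give the disambiguation engine more candidates, here is a toy illustration. It is NOT the scoring the EntityhubLinkingEngine actually uses: the score is simply assumed to be the fraction of label tokens found in the text, the labels are made up, and the "strict" value 0.75 is hypothetical.

```python
def label_match_score(label, text_tokens):
    """Toy score: (number of matched label tokens, fraction of label tokens matched)."""
    label_tokens = label.lower().split()
    matched = sum(1 for token in label_tokens if token in text_tokens)
    return matched, matched / len(label_tokens)


def is_suggested(label, text_tokens, min_found_tokens, min_label_score):
    """A label is kept only if it passes both thresholds."""
    matched, score = label_match_score(label, text_tokens)
    return matched >= min_found_tokens and score >= min_label_score


def suggestions(labels, text_tokens, min_found_tokens, min_label_score):
    """Return the labels that pass both thresholds, in input order."""
    return [
        label for label in labels
        if is_suggested(label, text_tokens, min_found_tokens, min_label_score)
    ]


if __name__ == "__main__":
    labels = ["Paris", "Paris Hilton", "Paris Saint Germain"]
    text_tokens = {"paris"}
    # Hypothetical strict setting: only the full match survives.
    print(suggestions(labels, text_tokens, 1, 0.75))
    # Relaxed setting from this mail: partial matches survive as well.
    print(suggestions(labels, text_tokens, 1, 0.33))
```

With the strict setting only one candidate is left per mention, so there is nothing to disambiguate; with the relaxed setting all three labels are passed on.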

best
Rupert


[1] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking#entity-linker-configuration
