Posted to issues@opennlp.apache.org by "Mark Giaconia (JIRA)" <ji...@apache.org> on 2013/10/22 13:37:00 UTC

[jira] [Commented] (OPENNLP-608) EntityLinker framework should provide the means to return multiple types of scores for linkedspans

    [ https://issues.apache.org/jira/browse/OPENNLP-608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801737#comment-13801737 ] 

Mark Giaconia commented on OPENNLP-608:
---------------------------------------

Implemented a HashMap within BaseLink. Below are some sample results; notice how the scores handle different types of ambiguity.
The scores are the Dice coefficient (dice), MySQL full-text ranking (mysqlfulltext), country-context analysis (countrycontext), and geohash binning, i.e. linear clustering (geohashbin):

linked span 37 to 38, from sentence 0, using search/name India, to the following gaz entries:
India : india in in : {dice=1.0, mysqlfulltext=10.826136589050293, countrycontext=1.0, geohashbin=0.09375000000000011}
linked span 57 to 58, from sentence 0, using search/name China, to the following gaz entries:
China : china in ch : {dice=1.0, mysqlfulltext=8.958383560180664, countrycontext=1.0, geohashbin=0.09375000000000011}
linked span 94 to 95, from sentence 0, using search/name India, to the following gaz entries:
India : india in in : {dice=1.0, mysqlfulltext=10.826136589050293, countrycontext=1.0, geohashbin=0.09375000000000011}
linked span 98 to 100, from sentence 0, using search/name Sri Lanka, to the following gaz entries:
Sri Lanka : sri lanka in ce : {dice=1.0, mysqlfulltext=13.77316665649414, countrycontext=0.9090909090909091, geohashbin=1.0}
Sri Lanka : sri lanka in ce : {dice=1.0, mysqlfulltext=13.77316665649414, countrycontext=0.9090909090909091, geohashbin=1.0}
Sri Lanka : lanka mess in ce : {dice=0.5333333333333333, mysqlfulltext=12.89000129699707, countrycontext=0.1590909090909091, geohashbin=1.0}
Sri Lanka : chula lanka in ce : {dice=0.5, mysqlfulltext=12.89000129699707, countrycontext=0.1590909090909091, geohashbin=1.0}
Sri Lanka : lanka in ce : {dice=0.7272727272727273, mysqlfulltext=11.906668663024902, countrycontext=0.1590909090909091, geohashbin=1.0}
Sri Lanka : lanka in ce : {dice=0.7272727272727273, mysqlfulltext=11.906668663024902, countrycontext=0.1590909090909091, geohashbin=1.0}
Sri Lanka : lanka in in : {dice=0.7272727272727273, mysqlfulltext=11.906668663024902, countrycontext=0.48860606060606065, geohashbin=0.09375000000000011}
Sri Lanka : lanka in in : {dice=0.7272727272727273, mysqlfulltext=11.906668663024902, countrycontext=0.48860606060606065, geohashbin=0.09375000000000011}
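The dice values above look like a character-bigram Dice coefficient. The sketch below is an assumption, not the GeoEntityLinker source. It lowercases both names, strips whitespace, takes overlapping two-character pairs (keeping duplicates), and computes 2 * shared pairs / total pairs. Under those assumptions it gives 1.0, 0.5333333333333333, 0.5 and 0.7272727272727273 for "Sri Lanka" against "sri lanka", "lanka mess", "chula lanka" and "lanka", which matches the printed values. That agreement is suggestive but does not confirm the actual implementation. The class name DiceCoefficient is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of a character-bigram Dice coefficient (not the OpenNLP implementation). */
public class DiceCoefficient {

    /** Lowercases, removes whitespace, and returns adjacent character pairs, keeping duplicates. */
    private static List<String> bigrams(String s) {
        String norm = s.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < norm.length(); i++) {
            pairs.add(norm.substring(i, i + 2));
        }
        return pairs;
    }

    /** Dice = 2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|). */
    public static double dice(String a, String b) {
        List<String> x = bigrams(a);
        List<String> y = bigrams(b);
        if (x.size() + y.size() == 0) {
            return 0.0; // no bigrams to compare (empty or single-character input)
        }
        List<String> remaining = new ArrayList<>(y);
        int shared = 0;
        for (String pair : x) {
            // Remove one matching occurrence so repeated pairs are not double-counted.
            if (remaining.remove(pair)) {
                shared++;
            }
        }
        return 2.0 * shared / (x.size() + y.size());
    }

    public static void main(String[] args) {
        System.out.println(dice("Sri Lanka", "sri lanka"));   // 1.0
        System.out.println(dice("Sri Lanka", "lanka mess"));  // 0.5333333333333333
        System.out.println(dice("Sri Lanka", "chula lanka")); // 0.5
        System.out.println(dice("Sri Lanka", "lanka"));       // 0.7272727272727273
    }
}
```

Counting the bigrams as a list rather than a set matters here. "chula lanka" contains "la" twice, and only the list-based denominator (7 + 9) yields the printed 0.5.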

> EntityLinker framework should provide the means to return multiple types of scores for linkedspans
> --------------------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-608
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-608
>             Project: OpenNLP
>          Issue Type: Sub-task
>          Components: Entity Linker
>    Affects Versions: 1.6.0
>            Reporter: Mark Giaconia
>            Assignee: Mark Giaconia
>            Priority: Minor
>             Fix For: 1.6.0
>
>
> Currently the entity linker's LinkedSpan<BaseLink> only has fields for two scores. The BaseLink class should expose a HashMap<String, Double> so that users can score in multiple ways. The GeoEntityLinker will return a fuzzy string matching score based on the Dice coefficient, a CountryContext score, a GeoHashBinning score, and a database rank score. With these scores, users can define downstream logic to fit their business needs.
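As a rough illustration of the "downstream logic" idea, here is a minimal sketch. It assumes each candidate link carries a String-to-Double score map keyed like the sample output (dice, countrycontext). ScoredLinkSketch, Candidate, filterAndRank and the minDice threshold are hypothetical names, not OpenNLP's BaseLink API. The business rule itself (keep near-exact name matches, then rank by country context) is only an example. The score values in main are copied from the Sri Lanka results above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a link carrying named scores; not the actual BaseLink class. */
public class ScoredLinkSketch {

    static final class Candidate {
        final String name;
        final String countryCode;
        // LinkedHashMap keeps insertion order, so printed scores appear in a stable order.
        final Map<String, Double> scores = new LinkedHashMap<>();

        Candidate(String name, String countryCode) {
            this.name = name;
            this.countryCode = countryCode;
        }

        /** Returns the named score, or 0.0 if this candidate was not scored that way. */
        double score(String key) {
            return scores.getOrDefault(key, 0.0);
        }
    }

    /** Example rule: drop weak name matches, then rank survivors by country context, highest first. */
    static List<Candidate> filterAndRank(List<Candidate> candidates, double minDice) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.score("dice") >= minDice) {
                kept.add(c);
            }
        }
        kept.sort(Comparator.comparingDouble((Candidate c) -> c.score("countrycontext")).reversed());
        return kept;
    }

    public static void main(String[] args) {
        Candidate exact = new Candidate("sri lanka", "ce");
        exact.scores.put("dice", 1.0);
        exact.scores.put("countrycontext", 0.9090909090909091);

        Candidate partial = new Candidate("lanka", "in");
        partial.scores.put("dice", 0.7272727272727273);
        partial.scores.put("countrycontext", 0.48860606060606065);

        Candidate weak = new Candidate("chula lanka", "ce");
        weak.scores.put("dice", 0.5);
        weak.scores.put("countrycontext", 0.1590909090909091);

        // With minDice = 0.7, "chula lanka" is dropped and "sri lanka" ranks first.
        for (Candidate c : filterAndRank(List.of(weak, partial, exact), 0.7)) {
            System.out.println(c.name + " " + c.countryCode + " " + c.scores);
        }
    }
}
```

Keeping the scores in a map instead of fixed fields means a new scorer only adds a key. Callers that do not know about that key are unaffected, because score() falls back to 0.0.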



--
This message was sent by Atlassian JIRA
(v6.1#6144)