Posted to commits@ctakes.apache.org by "jlpainter (via GitHub)" <gi...@apache.org> on 2024/01/30 15:18:29 UTC

[PR] Adding new kernel metrics to the ctakes-ytex concept similarity service. [ctakes]

jlpainter opened a new pull request, #14:
URL: https://github.com/apache/ctakes/pull/14

   This patch adds additional kernel metrics to the ctakes-ytex concept similarity service.
   
   These include:
   
   -   Intrinsic Resnik
   -   Resnik
   -   Intrinsic Faith
   -   Faith
   -   Dice
   -   Simpson
   -   Braun-Blanquet
   -   Ochiai
   
   The algorithms for most of these can be found either in the original Perl UMLS::Similarity package or as described by Sanchez and Batet in: https://www.sciencedirect.com/science/article/pii/S1532046411000645
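
For orientation, the four set-overlap coefficients in the list above can be sketched with their textbook definitions. This is a minimal illustration, not the code in this patch: it assumes each concept is represented by a set of features (for example, its ancestor CUIs), and the function names and example sets are hypothetical.

```python
import math


def _overlap(a, b):
    """Return the size of the intersection of two feature sets."""
    return len(a & b)


def dice(a, b):
    # Dice: 2|A ∩ B| / (|A| + |B|)
    total = len(a) + len(b)
    return 2 * _overlap(a, b) / total if total else 0.0


def simpson(a, b):
    # Simpson: |A ∩ B| / min(|A|, |B|)
    smallest = min(len(a), len(b))
    return _overlap(a, b) / smallest if smallest else 0.0


def braun_blanquet(a, b):
    # Braun-Blanquet: |A ∩ B| / max(|A|, |B|)
    largest = max(len(a), len(b))
    return _overlap(a, b) / largest if largest else 0.0


def ochiai(a, b):
    # Ochiai: |A ∩ B| / sqrt(|A| * |B|)
    product = len(a) * len(b)
    return _overlap(a, b) / math.sqrt(product) if product else 0.0


if __name__ == "__main__":
    # Hypothetical feature sets standing in for two concepts' ancestors.
    concept_a = {"C1", "C2", "C3", "C4"}
    concept_b = {"C3", "C4", "C5"}
    for metric in (dice, simpson, braun_blanquet, ochiai):
        print(metric.__name__, round(metric(concept_a, concept_b), 4))
```

All four share the same numerator and differ only in how they normalize by the set sizes, so Simpson is always the largest and Braun-Blanquet the smallest for a given pair.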
   
   Examples were computed, compared with the output of the Perl UMLS::Similarity package, and verified to be the same. However, when testing against the Perl package you must pass --intrinsic sanchez, because the cTAKES YTEX implementation of the information content (IC) uses only the Sanchez formulation. If you do not specify this option when calling the Perl scripts, they default to the corpus-based IC, which produces different numbers. Once you force the Sanchez IC, the distance metrics correspond exactly when run against the same installed UMLS database.
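
To make the role of the IC concrete, here is a minimal sketch of the intrinsic IC as it is commonly cited from Sanchez et al., together with the usual Resnik and Faith formulas built on it. It is an illustration under stated assumptions, not the code in this patch: the function names are hypothetical, `subsumers` is assumed to count the concept itself, and a leaf concept is assumed to have zero leaves beneath it.

```python
import math


def sanchez_ic(leaves, subsumers, max_leaves):
    """Intrinsic IC: -log((leaves / subsumers + 1) / (max_leaves + 1)).

    leaves     -- number of leaf descendants of the concept (0 for a leaf)
    subsumers  -- number of ancestors, counting the concept itself (>= 1)
    max_leaves -- number of leaves in the whole taxonomy
    """
    return -math.log((leaves / subsumers + 1) / (max_leaves + 1))


def resnik(ic_lcs):
    # Resnik similarity is the IC of the least common subsumer (LCS).
    return ic_lcs


def faith(ic_c1, ic_c2, ic_lcs):
    # Faith: IC(lcs) / (IC(c1) + IC(c2) - IC(lcs))
    denominator = ic_c1 + ic_c2 - ic_lcs
    return ic_lcs / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical taxonomy with 8 leaves in total.
    ic_c1 = sanchez_ic(leaves=0, subsumers=4, max_leaves=8)   # a leaf concept
    ic_c2 = sanchez_ic(leaves=0, subsumers=5, max_leaves=8)   # another leaf
    ic_lcs = sanchez_ic(leaves=3, subsumers=2, max_leaves=8)  # their LCS
    print("resnik:", resnik(ic_lcs))
    print("faith:", faith(ic_c1, ic_c2, ic_lcs))
```

Because the intrinsic IC depends only on the taxonomy structure while a corpus-based IC depends on term frequencies, the two give different values for the same concept, which is why the comparison only matches when both sides use the same IC.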

