Posted to commits@stanbol.apache.org by "ELOUMBAT ASSOUA ALBERT (JIRA)" <ji...@apache.org> on 2014/03/06 19:44:46 UTC

[jira] [Issue Comment Deleted] (STANBOL-1291) Phonetic Linking

     [ https://issues.apache.org/jira/browse/STANBOL-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ELOUMBAT ASSOUA ALBERT updated STANBOL-1291:
--------------------------------------------

    Comment: was deleted

(was: hi,
can you kindly forward the changes to my email address.)

> Phonetic Linking
> ----------------
>
>                 Key: STANBOL-1291
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1291
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancement Engines
>            Reporter: Rupert Westenthaler
>              Labels: gsoc2014, mentoring
>
> Add Phonetic based EntityLinking support to Apache Stanbol
> The idea is to
> 1. start off with a sound file
> 2. use a speech to text engine like STANBOL-1007 to get the transcript
> 3. use NLP processing
> 4. use the FST Linking Engine (STANBOL-1128) to link against a SolrIndex configured for phonetic linking [1].
> 5. correct the text transcript based on labels of linked entities.
> The main question to be answered is whether the phonetic matching (step 4) can correctly link entities even if the spellings in the text transcript are incorrect.
> Additional things to validate are
> * is the quality of the text transcript good enough
> * does NLP processing still work sufficiently well on text transcripts
> This will definitely also require adaptations to the FST Linking Engine, as the score is currently calculated based on the Levenshtein distance of the mention to the best matching label of an entity, which does not make sense for this specific use case.
> [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PhoneticFilterFactory
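
To illustrate the point about step 4 and the scoring, here is a minimal sketch in Python. It is not Stanbol or Solr code. It uses classic Soundex only as a simple stand-in for whichever phonetic encoder the Solr index would be configured with, and the names soundex, phonetic_key, link_phonetic and levenshtein are hypothetical helpers made up for this example. The sketch shows how a misspelled mention from a transcript can still share a phonetic key with an entity label, while its Levenshtein distance to that label is nonzero.

```python
def soundex(word: str) -> str:
    """Classic American Soundex code (letter + 3 digits) for one word."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so repeated codes count again
    return (result + "000")[:4]


def phonetic_key(text: str) -> tuple:
    """Hypothetical helper: one Soundex code per whitespace-separated token."""
    return tuple(soundex(tok) for tok in text.split())


def link_phonetic(mention: str, labels: list) -> list:
    """Hypothetical helper: labels whose phonetic key equals the mention's."""
    key = phonetic_key(mention)
    return [label for label in labels if phonetic_key(label) == key]


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    labels = ["John Smith", "Mary Smith", "John Smithson"]  # made-up labels
    mention = "Jon Smyth"  # made-up transcript spelling
    print(link_phonetic(mention, labels))      # ['John Smith']
    print(levenshtein(mention, "John Smith"))  # 2
```

In this sketch the mention links to the intended label even though the spelling differs, and the linked label could then be used to correct the transcript (step 5). A score based only on edit distance would penalise exactly the kind of match the phonetic index is meant to find, which is the concern raised about the current scoring.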



--
This message was sent by Atlassian JIRA
(v6.2#6252)