Posted to oak-issues@jackrabbit.apache.org by "Tommaso Teofili (JIRA)" <ji...@apache.org> on 2016/10/11 13:47:21 UTC

[jira] [Commented] (OAK-4348) Cross language search via SMT

    [ https://issues.apache.org/jira/browse/OAK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15565445#comment-15565445 ] 

Tommaso Teofili commented on OAK-4348:
--------------------------------------

I've reworked the previous implementation so that it enhances the query through a Lucene index specific API ( {{FulltextTermQueryProvider}} ). This makes more sense than working at the oak-core (query engine) level, which would also mean adding dependencies there, because this workflow is meant mainly, if not only, for full-text queries.
The up-to-date version of the Apache Joshua integration with Oak (Lucene) is at: https://github.com/tteofili/jackrabbit-oak/tree/OAK-4348
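
A rough, self-contained sketch of the index-level idea: as the full-text query is built, each term is offered to pluggable providers, and any clause a provider returns is OR-ed with the original term clause. The interface `TermClauseProvider`, the class `DictionaryTranslationProvider`, the field name `text` and the string form of the query are all hypothetical. They do not reproduce the actual signature of {{FulltextTermQueryProvider}}, and a fixed dictionary stands in for Apache Joshua.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class FulltextExpansionSketch {

    /** Hypothetical hook: may contribute an extra clause for a single full-text term. */
    interface TermClauseProvider {
        Optional<String> extraClause(String field, String term);
    }

    /** Stand-in for an SMT-backed provider: a fixed dictionary replaces Apache Joshua here. */
    static final class DictionaryTranslationProvider implements TermClauseProvider {
        private final Map<String, String> dictionary;

        DictionaryTranslationProvider(Map<String, String> dictionary) {
            this.dictionary = dictionary;
        }

        @Override
        public Optional<String> extraClause(String field, String term) {
            // Look up the lower-cased term; no translation means no extra clause.
            String translated = dictionary.get(term.toLowerCase());
            return Optional.ofNullable(translated).map(t -> field + ":" + t);
        }
    }

    /** Combines the original term clause with any provider clauses using OR. */
    static String buildTermQuery(String field, String term, List<TermClauseProvider> providers) {
        List<String> clauses = new ArrayList<>();
        clauses.add(field + ":" + term);
        for (TermClauseProvider provider : providers) {
            provider.extraClause(field, term).ifPresent(clauses::add);
        }
        // A single clause needs no grouping; several are OR-ed inside parentheses.
        return clauses.size() == 1 ? clauses.get(0) : "(" + String.join(" OR ", clauses) + ")";
    }

    public static void main(String[] args) {
        List<TermClauseProvider> providers =
                List.of(new DictionaryTranslationProvider(Map.of("hola", "hello")));
        System.out.println(buildTermQuery("text", "hola", providers));  // (text:hola OR text:hello)
        System.out.println(buildTermQuery("text", "adios", providers)); // text:adios
    }
}
```

Because the hook is applied per term inside the index, the query engine in oak-core does not need to know about translation at all. That is the reason given above for moving the logic out of oak-core.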

> Cross language search via SMT
> -----------------------------
>
>                 Key: OAK-4348
>                 URL: https://issues.apache.org/jira/browse/OAK-4348
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: query
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>             Fix For: 1.6
>
>
> It would be interesting to investigate using statistical machine translation (SMT) toolkits (such as Apache Joshua) to enable cross-language search, so that a query can also be expanded to search over translated terms.
> Example:
> - enable Spanish to English translation
> - perform a full-text search for "hola"
> - the query engine looks up translations for "hola"
> - the SMT toolkit returns "hello"
> - the query engine adds an additional (UNION) clause for the translated term
> - the query performed by Oak becomes "hello OR hola"
> - results for both the English and the Spanish term are returned
> This should, of course, be configurable.
> Note that the integration may also happen via Apache Tika, which provides a Translator API.
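
The expansion steps in the example can be sketched as a single function that is switched on by configuration. This is a minimal illustration under stated assumptions. `Translator` here is a local, hypothetical interface whose shape loosely follows the three-argument `translate(text, sourceLanguage, targetLanguage)` method of Tika's `Translator`. `ExpansionConfig` is a hypothetical stand-in for whatever configuration Oak would expose. A dictionary lookup replaces a real SMT engine.

```java
import java.util.Map;

public class CrossLanguageExpansion {

    /** Hypothetical translator abstraction; an SMT toolkit or a Tika translator could sit behind it. */
    interface Translator {
        /** Returns a translation of {@code text}, or null when none is available. */
        String translate(String text, String sourceLanguage, String targetLanguage);
    }

    /** Hypothetical configuration: expansion only runs when explicitly enabled for a language pair. */
    static final class ExpansionConfig {
        final boolean enabled;
        final String sourceLanguage;
        final String targetLanguage;

        ExpansionConfig(boolean enabled, String sourceLanguage, String targetLanguage) {
            this.enabled = enabled;
            this.sourceLanguage = sourceLanguage;
            this.targetLanguage = targetLanguage;
        }
    }

    /** Returns the full-text expression to run: the translated term OR-ed with the original. */
    static String expand(String fulltext, ExpansionConfig config, Translator translator) {
        if (!config.enabled) {
            return fulltext; // expansion switched off: the query runs unchanged
        }
        String translated = translator.translate(fulltext, config.sourceLanguage, config.targetLanguage);
        if (translated == null || translated.trim().isEmpty() || translated.equalsIgnoreCase(fulltext)) {
            return fulltext; // no usable translation, nothing new to add
        }
        return translated + " OR " + fulltext;
    }

    public static void main(String[] args) {
        Map<String, String> esToEn = Map.of("hola", "hello");
        // Dictionary lookup standing in for a real SMT engine such as Apache Joshua.
        Translator dictionary = (text, src, tgt) ->
                "es".equals(src) && "en".equals(tgt) ? esToEn.get(text.toLowerCase()) : null;

        System.out.println(expand("hola", new ExpansionConfig(true, "es", "en"), dictionary));  // hello OR hola
        System.out.println(expand("hola", new ExpansionConfig(false, "es", "en"), dictionary)); // hola
    }
}
```

Keeping the translator behind a small interface is what would let either Joshua or a Tika-based translator be plugged in without changing the expansion logic.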



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)