Posted to oak-issues@jackrabbit.apache.org by "Tommaso Teofili (JIRA)" <ji...@apache.org> on 2016/06/10 15:27:20 UTC

[jira] [Commented] (OAK-4348) Cross language search via SMT

    [ https://issues.apache.org/jira/browse/OAK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324627#comment-15324627 ] 

Tommaso Teofili commented on OAK-4348:
--------------------------------------

I noticed this was mentioned at ApacheCon; see the [slide deck|http://schd.ws/hosted_files/apachecon2016/a9/Introducing%20Apache%20Joshua%20%28Incubating%29.pdf].

> Cross language search via SMT
> -----------------------------
>
>                 Key: OAK-4348
>                 URL: https://issues.apache.org/jira/browse/OAK-4348
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: query
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>             Fix For: 1.6
>
>
> It would be interesting to investigate the use of statistical machine translation (SMT) toolkits (like Apache Joshua) to enable cross-language search, so that queries can eventually be expanded to also search over translated terms.
> Example: 
> - enable Spanish-to-English translation
> - perform a full-text search for "hola"
> - the query engine looks up translations of "hola"
> - the SMT toolkit returns "hello"
> - the query engine adds an additional (UNION) clause for the translated term
> - the query performed by Oak becomes "hello OR hola"
> - results for both the English and the Spanish term are returned
> This should, of course, be configurable.
> Note that the integration could also happen via Apache Tika, which provides a Translator API.
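The expansion steps in the example above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not Oak's implementation. `CrossLanguageQueryExpansion`, the `TermTranslator` interface, the `expand` method and its `enabled` flag are all hypothetical names. A toy dictionary stands in for a real SMT toolkit such as Apache Joshua or Tika's Translator API. The expanded query is built as a plain "OR" string rather than as an Oak query tree.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of SMT-based query expansion; not an Oak API.
final class CrossLanguageQueryExpansion {

    // Hypothetical translation hook. A real deployment might delegate to an
    // SMT toolkit (e.g. Apache Joshua) or to Tika's Translator API.
    @FunctionalInterface
    interface TermTranslator {
        List<String> translate(String term, String sourceLang, String targetLang);
    }

    // Returns the full-text expression to run: translated terms OR'ed with the
    // original term. When expansion is disabled (the "configurable" switch),
    // the original term is returned unchanged.
    static String expand(String term, String sourceLang, String targetLang,
                         TermTranslator translator, boolean enabled) {
        // LinkedHashSet keeps insertion order and drops duplicates, e.g. when
        // a term translates to itself.
        Set<String> terms = new LinkedHashSet<>();
        if (enabled) {
            terms.addAll(translator.translate(term, sourceLang, targetLang));
        }
        terms.add(term);
        return String.join(" OR ", terms);
    }

    public static void main(String[] args) {
        // Toy Spanish-to-English dictionary standing in for an SMT system.
        Map<String, List<String>> esToEn = Map.of("hola", List.of("hello"));
        TermTranslator dictionary =
                (term, src, dst) -> esToEn.getOrDefault(term, List.of());

        System.out.println(expand("hola", "es", "en", dictionary, true));
        System.out.println(expand("hola", "es", "en", dictionary, false));
    }
}
```

Running `main` prints `hello OR hola`, which matches the issue's example, followed by `hola` when expansion is switched off. A real integration would more likely add an OR/UNION clause to the parsed query than concatenate strings. It would also need to escape terms and handle multi-word translations.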



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)