Posted to dev@lucene.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2016/03/26 09:37:25 UTC

[jira] [Comment Edited] (SOLR-8714) Implement translation contrib package for LanguageTranslationUpdateProcessor's

    [ https://issues.apache.org/jira/browse/SOLR-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212885#comment-15212885 ] 

Lewis John McGibbney edited comment on SOLR-8714 at 3/26/16 8:36 AM:
---------------------------------------------------------------------

Hi [~teofili], I started a patch which I thought was sound. The blocker right now is SOLR-8716.
If we can do the Tika upgrade, then this issue (with Joshua, for example, backing statistical machine translation via the [language packs|http://joshua-decoder.org/language-packs/] we've been generating) is IMHO a game changer for the way that Web crawlers harvest data and make it available, useful and ultimately meaningful to us all. Others are already doing statistical machine translation at indexing time, of course, but having it in open source Apache Solr would be excellent.



> Implement translation contrib package for LanguageTranslationUpdateProcessor's
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-8714
>                 URL: https://issues.apache.org/jira/browse/SOLR-8714
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Lewis John McGibbney
>             Fix For: master
>
>
> A while back over in Tika we implemented the [Translator|https://github.com/apache/tika/blob/master/tika-core/src/main/java/org/apache/tika/language/translate/Translator.java] interface, which now has a number of [implementations|https://github.com/apache/tika/tree/master/tika-translate/src/main/java/org/apache/tika/language/translate].
> This issue will provide a translation contrib package offering a LanguageTranslationUpdateProcessor.
> The new processor will probably use the existing [Solr Language Identifier|https://github.com/apache/lucene-solr/tree/master/solr/contrib/langid] and would enable a document to be translated based on a user-defined mapping. LanguageTranslationUpdateProcessors should be pluggable and would be placed in an UpdateChain in the same way as [LanguageIdentifierUpdateProcessor|https://github.com/apache/lucene-solr/blob/master/solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java]s.
> It is my intent to also provide a wiki page which can be referenced and maintained in conjunction with the code.
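
To make the idea in the description above a little more concrete, here is a rough, self-contained sketch of the mapping-driven translation step. It is only an illustration under assumptions, not the patch: SketchTranslator is a local stand-in whose single method mirrors the shape of Tika's three-argument Translator.translate, TranslationStepSketch is a hypothetical class that does not extend any Solr type, and the field names and the "_<lang>" suffix for the translated field are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a translation backend. It is NOT the Tika
// Translator type; it only mirrors the shape of
// translate(text, sourceLanguage, targetLanguage).
interface SketchTranslator {
    String translate(String text, String sourceLanguage, String targetLanguage);
}

// Hypothetical processing step. A real implementation would presumably live
// in a Solr update chain; this sketch works on a plain Map so it stays
// self-contained.
final class TranslationStepSketch {
    private final SketchTranslator translator;
    // User-defined mapping: detected source language -> target language.
    private final Map<String, String> languageMapping;
    private final String textField; // assumed name of the field holding the text
    private final String langField; // assumed name of the field holding the detected language

    TranslationStepSketch(SketchTranslator translator,
                          Map<String, String> languageMapping,
                          String textField,
                          String langField) {
        this.translator = translator;
        this.languageMapping = languageMapping;
        this.textField = textField;
        this.langField = langField;
    }

    // Returns a copy of the document. A translated field is added only when
    // the mapping has a target for the document's detected language.
    Map<String, Object> process(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        Object lang = doc.get(langField);
        Object text = doc.get(textField);
        if (!(lang instanceof String) || !(text instanceof String)) {
            return out; // nothing to translate, or no language was detected
        }
        String target = languageMapping.get(lang);
        if (target == null || target.equals(lang)) {
            return out; // no mapping for this language, or already in the target language
        }
        String translated = translator.translate((String) text, (String) lang, target);
        out.put(textField + "_" + target, translated);
        return out;
    }
}
```

In this sketch the language field would be filled in by an earlier step such as language identification, and the translator passed to the constructor is where a pluggable backend would go.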



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
