Posted to dev@tika.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/10/19 16:21:59 UTC

[jira] [Updated] (TIKA-1425) Automatic batching of Microsoft service calls

     [ https://issues.apache.org/jira/browse/TIKA-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1425:
------------------------------------
    Fix Version/s:     (was: 1.14)
                   1.15

> Automatic batching of Microsoft service calls
> ---------------------------------------------
>
>                 Key: TIKA-1425
>                 URL: https://issues.apache.org/jira/browse/TIKA-1425
>             Project: Tika
>          Issue Type: Improvement
>          Components: translation
>    Affects Versions: 1.6
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.15
>
>
> Right now, when I use the following code, I get the stack trace at the bottom of this description. This seems to be because the Request URI is too large for the service to accept (the server responds with HTTP 414). We need a mechanism within the call to Tika.translate which will, on a service-by-service basis, determine the maximum Request URI that can be sent. I believe this should be on the Tika side, as how else am I meant to know the maximum request size?
> {code:title=translator.java|borderStyle=solid}
> +    Translator translate = new MicrosoftTranslator();
> +    ((MicrosoftTranslator) translate).setId("...");
> +    ((MicrosoftTranslator) translate).setSecret("...");
>      for (java.util.Map.Entry<Text, Parse> entry : parseResult) {
>        Parse parse = entry.getValue();
>        LOG.info("---------\nUrl\n---------------\n");
> @@ -201,7 +207,7 @@
>        System.out.print(parse.getData().toString());
>        if (dumpText) {
>          LOG.info("---------\nParseText\n---------\n");
> -        System.out.print(parse.getText());
> +        System.out.print(translate.translate(parse.getText(), "fr"));
>        }
> {code}
> {code:title=stacktrace.log|borderStyle=solid}
> Exception in thread "main" java.lang.Exception: [microsoft-translator-api] Error retrieving translation : Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0...
> ...
> 	at com.memetix.mst.MicrosoftTranslatorAPI.retrieveString(MicrosoftTranslatorAPI.java:202)
> 	at com.memetix.mst.translate.Translate.execute(Translate.java:61)
> 	at com.memetix.mst.translate.Translate.execute(Translate.java:76)
> 	at org.apache.tika.language.translate.MicrosoftTranslator.translate(MicrosoftTranslator.java:104)
> 	at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:210)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> 	at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:228)
> Caused by: java.io.IOException: Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0%BE%D1%80%D1%83%D0%B...
> ...
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1675)
> 	at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1673)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1671)
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1244)
> 	at com.memetix.mst.MicrosoftTranslatorAPI.retrieveResponse(MicrosoftTranslatorAPI.java:178)
> 	at com.memetix.mst.MicrosoftTranslatorAPI.retrieveString(MicrosoftTranslatorAPI.java:199)
> 	... 6 more
> Caused by: java.io.IOException: Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0%BE...
> ...
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626)
> 	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
> 	at com.memetix.mst.MicrosoftTranslatorAPI.retrieveResponse(MicrosoftTranslatorAPI.java:177)
> 	... 7 more
> {code}
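
The batching requested in the description could be sketched roughly as below. This is only an illustration under assumptions, not Tika code: the class UriBatcher, its methods, and the translateOne callback are hypothetical names, and the real per-service limit is not stated in this issue, so the caller has to supply maxEncodedLength. The sketch measures each chunk by its URL-encoded length (the Cyrillic text in the stack trace grows to six characters per letter once encoded) and prefers to cut at whitespace.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Tika or the memetix client.
final class UriBatcher {

    // Length of the text once URL-encoded, as it would appear in a query string.
    static int encodedLength(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8").length();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Greedily split text into chunks whose encoded length stays within the limit.
    // Concatenating the chunks in order gives back the original text.
    static List<String> split(String text, int maxEncodedLength) {
        List<String> chunks = new ArrayList<>();
        int start = 0;       // index where the current chunk begins
        int budget = 0;      // encoded length used by the current chunk
        int lastBreak = -1;  // index just after the last whitespace in this chunk
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int n = Character.charCount(cp);
            int cost = encodedLength(text.substring(i, i + n));
            if (cost > maxEncodedLength) {
                throw new IllegalArgumentException("limit too small for one code point");
            }
            if (budget + cost > maxEncodedLength) {
                // Cut at the last whitespace if there was one, else right here.
                int cut = (lastBreak > start) ? lastBreak : i;
                chunks.add(text.substring(start, cut));
                start = cut;
                i = cut;       // rescan anything after the cut point
                budget = 0;
                lastBreak = -1;
                continue;
            }
            budget += cost;
            i += n;
            if (Character.isWhitespace(cp)) {
                lastBreak = i;
            }
        }
        if (start < text.length()) {
            chunks.add(text.substring(start));
        }
        return chunks;
    }

    // Translate each chunk separately and join the results in order.
    // translateOne stands in for a single service call (an assumption).
    static String translateInBatches(String text, int maxEncodedLength,
                                     UnaryOperator<String> translateOne) {
        StringBuilder out = new StringBuilder();
        for (String chunk : split(text, maxEncodedLength)) {
            out.append(translateOne.apply(chunk));
        }
        return out.toString();
    }
}
```

In the snippet from the description, translateOne would wrap the existing translate.translate(chunk, "fr") call and handle any checked exceptions it declares. The limit must also leave room for the base URL and the other query parameters. Cutting at whitespace rather than at sentence boundaries may hurt translation quality near the seams, so a real implementation would probably want a smarter break rule.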



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)