Posted to dev@tika.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/10/19 16:21:59 UTC
[jira] [Updated] (TIKA-1425) Automatic batching of Microsoft service calls
[ https://issues.apache.org/jira/browse/TIKA-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated TIKA-1425:
------------------------------------
Fix Version/s: (was: 1.14)
1.15
> Automatic batching of Microsoft service calls
> ---------------------------------------------
>
> Key: TIKA-1425
> URL: https://issues.apache.org/jira/browse/TIKA-1425
> Project: Tika
> Issue Type: Improvement
> Components: translation
> Affects Versions: 1.6
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Fix For: 1.15
>
>
> Right now, when I use the following code, I get the stack trace at the bottom of this description. This seems to be because the Request URI is too long for the service to accept (the server responds with HTTP 414). We need a mechanism within the call to Tika.translate that determines, on a service-by-service basis, the maximum Request URI that can be sent. I believe this belongs on the Tika side; how else am I meant to know the maximum request size?
> {code:title=translator.java|borderStyle=solid}
> + Translator translate = new MicrosoftTranslator();
> + ((MicrosoftTranslator) translate).setId("...");
> + ((MicrosoftTranslator) translate).setSecret("...");
> for (java.util.Map.Entry<Text, Parse> entry : parseResult) {
> Parse parse = entry.getValue();
> LOG.info("---------\nUrl\n---------------\n");
> @@ -201,7 +207,7 @@
> System.out.print(parse.getData().toString());
> if (dumpText) {
> LOG.info("---------\nParseText\n---------\n");
> - System.out.print(parse.getText());
> + System.out.print(translate.translate(parse.getText(), "fr"));
> }
> {code}
> {code:title=stacktrace.log|borderStyle=solid}
> Exception in thread "main" java.lang.Exception: [microsoft-translator-api] Error retrieving translation : Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0...
> ...
> at com.memetix.mst.MicrosoftTranslatorAPI.retrieveString(MicrosoftTranslatorAPI.java:202)
> at com.memetix.mst.translate.Translate.execute(Translate.java:61)
> at com.memetix.mst.translate.Translate.execute(Translate.java:76)
> at org.apache.tika.language.translate.MicrosoftTranslator.translate(MicrosoftTranslator.java:104)
> at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:210)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:228)
> Caused by: java.io.IOException: Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0%BE%D1%80%D1%83%D0%B...
> ...
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1675)
> at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1673)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1671)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1244)
> at com.memetix.mst.MicrosoftTranslatorAPI.retrieveResponse(MicrosoftTranslatorAPI.java:178)
> at com.memetix.mst.MicrosoftTranslatorAPI.retrieveString(MicrosoftTranslatorAPI.java:199)
> ... 6 more
> Caused by: java.io.IOException: Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0%BE...
> ...
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626)
> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
> at com.memetix.mst.MicrosoftTranslatorAPI.retrieveResponse(MicrosoftTranslatorAPI.java:177)
> ... 7 more
> {code}
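
The batching requested above could be sketched roughly as follows. This is only an illustration, not the fix adopted in Tika: the class name TranslationBatcher, the methods split and encodedLength, and the maxEncodedLength parameter are all hypothetical, and the real limit for the Microsoft service is not stated in this issue, so the caller has to supply it. The sketch greedily packs Unicode code points into chunks whose URL-encoded length stays within the budget, since percent-encoding (for example, 6 characters per Cyrillic letter) is what inflates the request URI.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits text so each piece fits a URL-encoded size budget.
final class TranslationBatcher {

    // Length of the text once URL-encoded as UTF-8, i.e. as it appears in a query string.
    static int encodedLength(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").length();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Greedily packs code points into chunks whose encoded length is <= maxEncodedLength.
    // maxEncodedLength is an assumed, service-specific budget for the text parameter only.
    static List<String> split(String text, int maxEncodedLength) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentLength = 0;
        int i = 0;
        while (i < text.length()) {
            // Step by code point so surrogate pairs are never cut in half.
            int codePoint = text.codePointAt(i);
            int length = encodedLength(new String(Character.toChars(codePoint)));
            if (length > maxEncodedLength) {
                throw new IllegalArgumentException(
                        "Budget too small for a single character: " + maxEncodedLength);
            }
            if (currentLength + length > maxEncodedLength) {
                // The current chunk is full; start a new one.
                chunks.add(current.toString());
                current.setLength(0);
                currentLength = 0;
            }
            current.appendCodePoint(codePoint);
            currentLength += length;
            i += Character.charCount(codePoint);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

Each chunk returned by split could then be passed to translate.translate(chunk, "fr") in turn and the results concatenated. Note that this sketch may cut a chunk in the middle of a word or sentence, which is likely to hurt translation quality; splitting at sentence boundaries first (for instance with java.text.BreakIterator) and falling back to this character-level split only for oversized sentences would probably be a better design.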
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)