Posted to dev@tika.apache.org by "Mohsen (JIRA)" <ji...@apache.org> on 2018/09/05 15:31:00 UTC

[jira] [Created] (TIKA-2724) Tika does not recognize http 3xx error codes when passed fileUrl

Mohsen created TIKA-2724:
----------------------------

             Summary: Tika does not recognize http 3xx error codes when passed fileUrl
                 Key: TIKA-2724
                 URL: https://issues.apache.org/jira/browse/TIKA-2724
             Project: Tika
          Issue Type: Bug
          Components: server
    Affects Versions: 1.18
            Reporter: Mohsen


When the {{fileUrl}} passed to the Tika server results in a 3xx HTTP status code, Tika happily returns a 200 response.

*How to reproduce the issue*: Run Tika server with the {{-enableUnsecureFeatures}} and {{-enableFileUrl}} options. Then send a {{fileUrl}} to the server that returns a 3xx status code. Here is a sample curl session:
{code:bash}
$ curl -XPUT -H 'fileUrl:http://google.com' localhost:9998/rmeta/text -v
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9998 (#0)
> PUT /rmeta/text HTTP/1.1
> Host: localhost:9998
> User-Agent: curl/7.54.0
> Accept: */*
> fileUrl:http://google.com
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Wed, 05 Sep 2018 15:25:12 GMT
< Transfer-Encoding: chunked
< Server: Jetty(8.y.z-SNAPSHOT)
<
* Connection #0 to host localhost left intact
[{"Content-Encoding":"UTF-8","Content-Type":"text/html; charset\u003dUTF-8","Content-Type-Hint":"text/html; charset\u003dUTF-8","X-Parsed-By":["org.apache.tika.parser.DefaultParser","org.apache.tika.parser.html.HtmlParser"],"X-TIKA:content":"\n\n\n\n\n\n\n\n\nGoogle\n\n Search Images Maps Play YouTube News Gmail Drive More »\nWeb History | Settings | Sign in\n\n\n \n\n\n\n\n\t \t\n\n\tAdvanced searchLanguage tools\n\n\n\n\nGoogle offered in: Fran�ais \n\n\nAdvertising�ProgramsBusiness Solutions+GoogleAbout GoogleGoogle.ca\n\n© 2018 - Privacy - Terms\n\n\n","X-TIKA:parse_time_millis":"11","dc:title":"Google","title":"Google"}]{code}
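For comparison, here is a minimal sketch of how a caller could see the raw 3xx status of a URL before handing it to Tika as {{fileUrl}}. It is not Tika's code; the class name {{RedirectCheck}} and its methods are hypothetical, and it only assumes the standard {{java.net.HttpURLConnection}} API with redirect following switched off.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

// Hypothetical helper for illustration only; not part of Tika.
public class RedirectCheck {

    /** Returns true for any status code in the 3xx range. */
    public static boolean isRedirect(int status) {
        return status >= 300 && status < 400;
    }

    /** Sends a HEAD request without following redirects and returns the raw status code. */
    public static int fetchStatus(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // By default HttpURLConnection follows same-protocol redirects,
        // which hides the 3xx status from the caller.
        conn.setInstanceFollowRedirects(false);
        conn.setRequestMethod("HEAD");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the URL used in the curl session above.
        URL url = URI.create(args.length > 0 ? args[0] : "http://google.com").toURL();
        int status = fetchStatus(url);
        System.out.println(url + " -> " + status + (isRedirect(status) ? " (redirect)" : ""));
    }
}
```

With redirect following disabled, the status code of the first response is what the caller sees, so a 3xx can be reported rather than silently resolved.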
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)