Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/08/17 13:05:00 UTC

[jira] [Created] (TIKA-3527) Add simple URLFetcher to tika-core

Tim Allison created TIKA-3527:
---------------------------------

             Summary: Add simple URLFetcher to tika-core
                 Key: TIKA-3527
                 URL: https://issues.apache.org/jira/browse/TIKA-3527
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


In 1.x, users could send a URL, including a file URL, to tika-server and have tika-server fetch the bytes.  In 2.x, we created the tika-pipes modules, included a file fetcher in tika-core, and put the http-fetcher in its own module because of its dependency on httpclient.

To smooth the transition to 2.x, it might be useful to add a URLFetcher that uses Java's built-in URL.openConnection() functionality.  I'd want to prohibit the file protocol because of its history as a vulnerability.  Folks who want to fetch files would have to explicitly choose a different fetcher and specify a base path.
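
As a rough illustration of the idea, here is a minimal sketch that uses only the JDK.  The class name SimpleUrlFetcher, the fetch(String) signature, the allowlist of http and https, and the timeout values are all assumptions made for this example; none of it is the actual Tika Fetcher API.  It uses an allowlist rather than blocking only "file", which is one possible way to keep the file protocol (and others such as jar) out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;

// Hypothetical sketch, not the Tika API: fetches bytes via the JDK's
// URL.openConnection() and refuses anything that is not http or https.
final class SimpleUrlFetcher {

    // Assumed policy: allowlist http/https, so "file", "jar", "ftp", etc. are rejected.
    static boolean isAllowedProtocol(String protocol) {
        if (protocol == null) {
            return false;
        }
        String p = protocol.toLowerCase(Locale.ROOT);
        return p.equals("http") || p.equals("https");
    }

    // Opens a stream for the given URL string; the caller must close it.
    // Throws IOException for a disallowed protocol or a failed connection,
    // and IllegalArgumentException if the string is not a valid absolute URI.
    static InputStream fetch(String fetchKey) throws IOException {
        URL url = URI.create(fetchKey).toURL();
        if (!isAllowedProtocol(url.getProtocol())) {
            throw new IOException("Protocol not allowed: " + url.getProtocol());
        }
        URLConnection connection = url.openConnection();
        // Timeouts are illustrative values, in milliseconds.
        connection.setConnectTimeout(10_000);
        connection.setReadTimeout(30_000);
        return connection.getInputStream();
    }
}
```

With this sketch, SimpleUrlFetcher.fetch("file:///etc/passwd") fails with an IOException before any connection is opened, while an https URL goes through openConnection().  The sketch does not add any handling for redirects, so that would need separate consideration.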


