Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/08/17 13:05:00 UTC
[jira] [Created] (TIKA-3527) Add simple URLFetcher to tika-core
Tim Allison created TIKA-3527:
---------------------------------
Summary: Add simple URLFetcher to tika-core
Key: TIKA-3527
URL: https://issues.apache.org/jira/browse/TIKA-3527
Project: Tika
Issue Type: Task
Reporter: Tim Allison
In 1.x, users could send a URL, including a file URL, to tika-server and have tika-server fetch the bytes. In 2.x, we created the tika-pipes modules, included a file fetcher in tika-core, and put the http-fetcher in its own module because of its dependency on httpclient.
To smooth the transition to 2.x, it might be useful to add a URLFetcher that uses Java's built-in URL.openConnection() functionality. I'd want to prohibit the file protocol because of its history as a vulnerability. If folks want to fetch files, they would have to explicitly choose a different fetcher and specify a base path.
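For discussion, here is a rough sketch of what such a fetcher could look like. It is not the proposed implementation: the class name SimpleUrlFetcherSketch, the validate and fetch method names, and the timeout values are all made up for illustration, and the sketch deliberately does not implement the tika-pipes Fetcher interface so that it stays self-contained.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;

// Hypothetical sketch only; the names and timeouts here are assumptions, not Tika API.
class SimpleUrlFetcherSketch {

    /** Parses the fetch key and rejects the file protocol before any connection is opened. */
    static URL validate(String fetchKey) throws IOException {
        // new URL(...) throws MalformedURLException (an IOException) for unparseable input.
        URL url = new URL(fetchKey);
        String protocol = url.getProtocol().toLowerCase(Locale.ROOT);
        if ("file".equals(protocol)) {
            throw new IOException("The file protocol is not supported by this fetcher: " + fetchKey);
        }
        return url;
    }

    /** Opens a connection with the JDK's built-in handler and returns the response stream. */
    static InputStream fetch(String fetchKey) throws IOException {
        URLConnection connection = validate(fetchKey).openConnection();
        connection.setConnectTimeout(10_000); // assumed value, in milliseconds
        connection.setReadTimeout(30_000);    // assumed value, in milliseconds
        return connection.getInputStream();   // the caller is responsible for closing the stream
    }
}
```

A blocklist like this only rejects URLs whose top-level protocol is file; a nested form such as jar:file:... reports its protocol as jar and would pass the check, so an allowlist of http and https may be the safer choice.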
--
This message was sent by Atlassian Jira
(v8.3.4#803005)