Posted to dev@tika.apache.org by "Hudson (Jira)" <ji...@apache.org> on 2021/08/17 17:53:00 UTC
[jira] [Commented] (TIKA-3527) Add simple URLFetcher to tika-core
[ https://issues.apache.org/jira/browse/TIKA-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400529#comment-17400529 ]
Hudson commented on TIKA-3527:
------------------------------
ABORTED: Integrated in Jenkins build Tika » tika-main-jdk8 #310 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/310/])
TIKA-3527 -- add a simple UrlFetcher (tallison: [https://github.com/apache/tika/commit/7077e9b822adb798efa260f587ab0a2babcfb746])
* (edit) tika-server/tika-server-standard/src/test/resources/config/tika-config-url-fetcher.xml
* (edit) CHANGES.txt
* (add) tika-core/src/main/java/org/apache/tika/pipes/fetcher/url/UrlFetcher.java
> Add simple URLFetcher to tika-core
> ----------------------------------
>
> Key: TIKA-3527
> URL: https://issues.apache.org/jira/browse/TIKA-3527
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Minor
> Fix For: 2.1.0
>
>
> In 1.x, users could send a URL, including a file URL, to tika-server and have tika-server fetch the bytes. In 2.x, we created the tika-pipes modules, included a file fetcher in tika-core, and put an http-fetcher in its own module because of its dependency on httpclient.
> To smooth the transition to 2.x, it might be useful to add a URLFetcher that uses the built-in basic Java URL.openConnection() functionality. I'd want to prohibit the file protocol because of its history as a vulnerability. If folks want to fetch files, they have to explicitly choose a different fetcher and specify a base path.
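
For illustration only, here is a minimal sketch of the idea described above: open a stream with the JDK's built-in java.net.URL.openConnection() and refuse the file protocol up front. This is not the committed UrlFetcher.java; the class name UrlFetchSketch, the method names toCheckedUrl and fetch, and the timeout values are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;

// Hypothetical sketch, not Tika's actual UrlFetcher.
class UrlFetchSketch {

    // Parses the fetch key and rejects the file protocol before any I/O happens.
    static URL toCheckedUrl(String fetchKey) throws IOException {
        URL url;
        try {
            url = URI.create(fetchKey).toURL();
        } catch (IllegalArgumentException e) {
            // Bad syntax or a non-absolute URI.
            throw new IOException("Not a usable URL: " + fetchKey, e);
        }
        String protocol = url.getProtocol().toLowerCase(Locale.ROOT);
        if ("file".equals(protocol)) {
            throw new IOException(
                    "The file protocol is not permitted; use a file fetcher with a base path");
        }
        // Assumption: only "file" is blocked here, as in the issue text.
        // An allow-list (for example http and https only) would be stricter.
        return url;
    }

    // Opens the connection with the JDK's URLConnection; the caller closes the stream.
    static InputStream fetch(String fetchKey) throws IOException {
        URLConnection connection = toCheckedUrl(fetchKey).openConnection();
        connection.setConnectTimeout(10_000); // illustrative timeouts, in milliseconds
        connection.setReadTimeout(30_000);
        return connection.getInputStream();
    }

    public static void main(String[] args) {
        // Demonstrates the protocol check only; no network access is made.
        try {
            toCheckedUrl("file:///etc/passwd");
            System.out.println("file URL accepted");
        } catch (IOException e) {
            System.out.println("file URL rejected: " + e.getMessage());
        }
    }
}
```

Doing the protocol check before openConnection() means a file: key never reaches the local filesystem, which is the behaviour the description asks for; users who do want local files would configure a separate file fetcher with an explicit base path.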
--
This message was sent by Atlassian Jira
(v8.3.4#803005)