Posted to dev@tika.apache.org by "Hudson (Jira)" <ji...@apache.org> on 2021/08/17 17:53:00 UTC

[jira] [Commented] (TIKA-3527) Add simple URLFetcher to tika-core

    [ https://issues.apache.org/jira/browse/TIKA-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400529#comment-17400529 ] 

Hudson commented on TIKA-3527:
------------------------------

ABORTED: Integrated in Jenkins build Tika » tika-main-jdk8 #310 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/310/])
TIKA-3527 -- add a simple UrlFetcher (tallison: [https://github.com/apache/tika/commit/7077e9b822adb798efa260f587ab0a2babcfb746])
* (edit) tika-server/tika-server-standard/src/test/resources/config/tika-config-url-fetcher.xml
* (edit) CHANGES.txt
* (add) tika-core/src/main/java/org/apache/tika/pipes/fetcher/url/UrlFetcher.java


> Add simple URLFetcher to tika-core
> ----------------------------------
>
>                 Key: TIKA-3527
>                 URL: https://issues.apache.org/jira/browse/TIKA-3527
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Minor
>             Fix For: 2.1.0
>
>
> In 1.x, users could send a URL, including a file URL, to tika-server and have tika-server fetch the bytes.  In 2.x, we created the tika-pipes modules, included a file fetcher in tika-core, and put an http-fetcher in its own module because of its dependency on httpclient.
> To smooth the transition to 2.x, it might be useful to add a URLFetcher that uses the built-in basic Java URL.openConnection() functionality.  I'd want to prohibit the file protocol because of its history as a vulnerability.  If folks want to fetch files, they have to explicitly choose a different fetcher and specify a base path.
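
The idea in the description above can be sketched roughly as follows. This is only an illustration of the approach (open a stream through the JDK's built-in URL handling, and refuse the file protocol first); it is not the UrlFetcher committed to tika-core. The class name SimpleUrlFetchSketch and the methods checkAllowed and fetch are hypothetical, and the sketch does not implement Tika's Fetcher interface.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;

// Hypothetical sketch, not the actual org.apache.tika.pipes.fetcher.url.UrlFetcher.
final class SimpleUrlFetchSketch {

    // Reject file: URLs so callers cannot read arbitrary local files.
    static void checkAllowed(URL url) throws IOException {
        String protocol = url.getProtocol();
        if (protocol == null
                || protocol.toLowerCase(Locale.ROOT).equals("file")) {
            throw new IOException("The file protocol is not allowed: " + url);
        }
    }

    // Parse the key as a URL, apply the protocol check, then open the stream
    // with the JDK's built-in connection handling. The caller closes the stream.
    static InputStream fetch(String fetchKey) throws IOException {
        URL url = new URL(fetchKey); // throws MalformedURLException on bad input
        checkAllowed(url);
        URLConnection connection = url.openConnection();
        return connection.getInputStream();
    }
}
```

With this sketch, fetch("file:///etc/passwd") fails with an IOException before any connection is opened, while an http or https URL is passed through to URL.openConnection(). The check only looks at the top-level scheme, so a wrapping scheme such as jar: would need separate handling.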


