Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2009/10/16 14:24:31 UTC

[jira] Resolved: (TIKA-287) HtmlParser should resolve relative paths in <a href="xxx"> elements

     [ https://issues.apache.org/jira/browse/TIKA-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-287.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.5

Thanks, your code was quite useful. I've adapted it to Tika in revision 825863.

> HtmlParser should resolve relative paths in <a href="xxx"> elements
> -------------------------------------------------------------------
>
>                 Key: TIKA-287
>                 URL: https://issues.apache.org/jira/browse/TIKA-287
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.4
>            Reporter: Ken Krugler
>            Assignee: Jukka Zitting
>             Fix For: 0.5
>
>         Attachments: UrlUtils.java, UrlUtilsTest.java
>
>
> Currently clients of the HtmlParser need to manually keep track of the appropriate base URL to use when resolving relative URLs in href="xxx" attributes.
> The parser should use the metadata RESOURCE_NAME_KEY value as the base.
> The parser should also watch for a <base> element in the <head> section, and use that to update the base URL.
> Note that special care must be taken to work around a known bug in the java.net.URL class: when the relative URL is only a query string (such as "?pid=1") and the base URL doesn't end with a '/', the last segment of the base path is dropped.
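
The resolution rules described above can be sketched roughly as follows. This is a minimal illustration, not the attached UrlUtils.java and not the code committed to Tika: the class name RelativeUrlSketch and the methods resolve and effectiveBase are hypothetical, and the base URL is passed in as a plain string rather than read from the RESOURCE_NAME_KEY metadata value.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration only; not part of Tika.
final class RelativeUrlSketch {

    // Resolves relativeUrl against baseUrl, special-casing a query-only
    // relative URL so that the last segment of the base path is kept.
    static String resolve(String baseUrl, String relativeUrl) {
        try {
            URL base = new URL(baseUrl);
            String path = base.getPath();
            if (relativeUrl.startsWith("?") && !path.isEmpty() && !path.endsWith("/")) {
                // new URL(base, "?pid=1") would drop the last path segment,
                // so append the query to the full base path by hand instead.
                return new URL(base.getProtocol(), base.getHost(),
                        base.getPort(), path + relativeUrl).toExternalForm();
            }
            return new URL(base, relativeUrl).toExternalForm();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + relativeUrl, e);
        }
    }

    // Assumption: the document's own URL is the default base, and the href
    // of a <base> element, when present, replaces it.
    static String effectiveBase(String resourceName, String baseHref) {
        if (baseHref == null || baseHref.isEmpty()) {
            return resourceName;
        }
        return resolve(resourceName, baseHref);
    }
}
```

With this sketch, resolve("http://example.com/path/file", "?pid=1") keeps the "file" segment, while ordinary relative links such as "other.html" still go through the standard two-argument URL constructor.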

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.