Posted to droids-dev@incubator.apache.org by "Richard Frovarp (Resolved) (JIRA)" <ji...@apache.org> on 2011/12/03 01:41:40 UTC

[jira] [Resolved] (DROIDS-115) LinkExtractor getURI(String target) does not resolve correctly when baseUri is provided

     [ https://issues.apache.org/jira/browse/DROIDS-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Frovarp resolved DROIDS-115.
------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.0.2

LinkExtractor has been marked as deprecated because the Tika-based link extraction is much better. Use the equivalent functionality in the droids-tika module instead.
                
> LinkExtractor getURI(String target) does not resolve correctly when baseUri is provided
> ---------------------------------------------------------------------------------------
>
>                 Key: DROIDS-115
>                 URL: https://issues.apache.org/jira/browse/DROIDS-115
>             Project: Droids
>          Issue Type: Bug
>          Components: core
>            Reporter: Paul Rogalinski
>             Fix For: 0.0.2
>
>         Attachments: LinkExtractor.java
>
>
> The getURI() method does not resolve the URL correctly if a baseUri is provided without a trailing slash *and* the relative URL to be resolved does not start with "/". Under those circumstances it resolves to: http://example.comRelativeUrl.
> Edit: the previous patch solved the problem only partially.
> Modified methods (whole class attached):
>     public LinkExtractor(LinkTask base, Map<String, String> elements) {
>         super();
>         this.base = base;
>         this.elements = elements;
>         this.setBaseUri(base.getURI());
>     }
>     @Override
>     public void startElement(String uri, String loc, String raw, Attributes att) throws SAXException {
>         if (checkBase && BASE_ELEMENT.equalsIgnoreCase(loc) && att.getValue(BASE_ATTRIBUTE) != null) {
>             try {
>                 setBaseUri(new URI(att.getValue(BASE_ATTRIBUTE)));
>                 log.debug("Found base URI: " + baseUri);
>                 checkBase = false;
>             } catch (URISyntaxException e) {
>                 log.debug("Base URI not valid: " + att.getValue(BASE_ATTRIBUTE));
>             }
>         }
>         Iterator<String> it = elements.keySet().iterator();
>         String elem, linkAtt;
>         while (it.hasNext()) {
>             elem = it.next();
>             linkAtt = elements.get(elem);
>             if (elem.equalsIgnoreCase(loc) && att.getValue(linkAtt) != null) {
>                 link = getURI(att.getValue(linkAtt));
>                 log.debug("Found element: " + elem + " with link: " + link);
>                 if (link != null) {
>                     addOutlinkURI(link.toString());
>                     link = null;
>                     anchorText = new StringBuilder();
>                 }
>             }
>         }
>     }
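>     public void setBaseUri(URI baseUri) {
>         // A base such as http://example.com has no path; append "/" so relative links resolve.
>         String host = baseUri.getHost();
>         // getHost() is null for opaque or relative URIs, and endsWith(null) would throw.
>         if (host != null && baseUri.toString().endsWith(host)) {
>             try {
>                 this.baseUri = new URI(baseUri.toString() + "/");
>             } catch (URISyntaxException e) {
>                 log.error("could not fix base URI", e);
>                 // Fall back to the unmodified URI rather than keeping a stale value.
>                 this.baseUri = baseUri;
>             }
>         } else {
>             this.baseUri = baseUri;
>         }
>     }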
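
For reference, the trailing-slash problem described in the report can be illustrated with a small standalone sketch using only java.net.URI. This is not the Droids code: the class name BaseUriDemo and the helpers withRootPath and resolve are hypothetical, and the sketch tests for an empty path instead of comparing the URI string against the host as the quoted patch does.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of Droids.
class BaseUriDemo {

    // Gives a hierarchical base URI with an empty path the root path "/".
    // Any other URI is returned unchanged.
    static URI withRootPath(URI base) throws URISyntaxException {
        String path = base.getPath();
        if (base.getHost() != null && (path == null || path.isEmpty())) {
            return new URI(base.getScheme(), base.getAuthority(), "/",
                    base.getQuery(), base.getFragment());
        }
        return base;
    }

    // Resolves a link target against the base after normalizing the base path.
    static URI resolve(URI base, String target) throws URISyntaxException {
        return withRootPath(base).resolve(target);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Base without trailing slash, relative target without leading "/".
        System.out.println(resolve(URI.create("http://example.com"), "RelativeUrl"));
        // prints http://example.com/RelativeUrl

        // A base that already has a path is left alone.
        System.out.println(resolve(URI.create("http://example.com/dir/page.html"), "other.html"));
        // prints http://example.com/dir/other.html
    }
}
```

Building the normalized base with the multi-argument URI constructor avoids string concatenation, so a query or fragment on the base does not end up in front of the appended slash.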
