Posted to issues@maven.apache.org by "Jesse Glick (JIRA)" <ji...@codehaus.org> on 2010/08/23 21:33:40 UTC

[jira] Commented: (WAGON-218) Link Parsing in http is flawed

    [ http://jira.codehaus.org/browse/WAGON-218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=232855#action_232855 ] 

Jesse Glick commented on WAGON-218:
-----------------------------------

Why not get rid of nekohtml (+ XNI), saving 152 KB and trimming the Maven third-party dependency list, and directly search for {{(?i)<a href="(.+?)">}} or similar? After all, the intended use case is to find links in index listings generated by a small number of distinct pieces of software. These generators are surely not going to use the exotic formatting or attributes produced by humans editing HTML by hand or with WYSIWYG designers.
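
A minimal sketch of the regex approach, assuming the listing has already been read into a String. The class name LinkExtractor, the method extractLinks, and the sample listing are hypothetical and not part of Wagon; only the pattern comes from the suggestion above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an existing Wagon class.
final class LinkExtractor {
    // Case-insensitive match for <a href="...">; the lazy group stops at the first "> after the URL.
    private static final Pattern HREF = Pattern.compile("(?i)<a href=\"(.+?)\">");

    // Returns the href values in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample in the style of a generated directory index.
        String listing = "<html><body>\n"
                + "<A HREF=\"../\">Parent Directory</A>\n"
                + "<a href=\"foo-1.0.jar\">foo-1.0.jar</a>\n"
                + "</body></html>";
        System.out.println(extractLinks(listing)); // prints [../, foo-1.0.jar]
    }
}
```

As written, the pattern only matches anchors whose href is the first and only attribute and is double-quoted, which is the trade-off being proposed: it relies on the generators producing plain, regular markup.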

I would be happy to supply a patch if there is interest.

> Link Parsing in http is flawed
> ------------------------------
>
>                 Key: WAGON-218
>                 URL: http://jira.codehaus.org/browse/WAGON-218
>             Project: Maven Wagon
>          Issue Type: Improvement
>          Components: wagon-http, wagon-http-lightweight
>    Affects Versions: 1.0-beta-2
>            Reporter: Joakim Erdfelt
>            Assignee: Joakim Erdfelt
>
> The link parsing in wagon http has a few issues.
> a) not all links are detected.
> b) the various ways that page content is identified via URL string manipulation don't work in many example cases.
> c) the use of jtidy introduces a large dependency and high memory usage.
