Posted to dev@tika.apache.org by "Chris A. Mattmann (Assigned) (JIRA)" <ji...@apache.org> on 2011/12/21 16:41:32 UTC

[jira] [Assigned] (TIKA-824) Extract rel attr with LinkContentHandler

     [ https://issues.apache.org/jira/browse/TIKA-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann reassigned TIKA-824:
--------------------------------------

    Assignee: Chris A. Mattmann
    
> Extract rel attr with LinkContentHandler
> ----------------------------------------
>
>                 Key: TIKA-824
>                 URL: https://issues.apache.org/jira/browse/TIKA-824
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.0, 1.1
>            Reporter: Markus Jelsma
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: TIKA-824-trunk-1.patch
>
>
> For Nutch we need to extract URLs, but we also need the rel attribute to check for the nofollow value. I've patched the code to return this information in the Link object. It's been tested, and I can now read the rel attribute in Nutch.
> Thoughts?
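
The idea behind the patch is to keep each link's rel attribute next to its URL so a crawler can skip nofollow links. The sketch below shows this under stated assumptions. It uses only the JDK SAX API and assumes the parser emits XHTML SAX events, as Tika does. It is not Tika's LinkContentHandler or Link class. The names RelLinkSketch, Link, isNoFollow and extract are hypothetical. Because rel holds a space-separated list of tokens that are compared case-insensitively, a value such as "NoFollow noopener" still marks the link as nofollow.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, NOT Tika's LinkContentHandler: it only illustrates
// keeping the rel attribute alongside each extracted link.
public class RelLinkSketch {

    /** Hypothetical link holder; Tika's own Link class may differ. */
    public static final class Link {
        final String uri;
        final String rel; // empty string when the <a> element has no rel

        Link(String uri, String rel) {
            this.uri = uri;
            this.rel = rel;
        }

        /** rel is a space-separated token list; match tokens case-insensitively. */
        boolean isNoFollow() {
            for (String token : rel.trim().split("\\s+")) {
                if (token.toLowerCase(Locale.ROOT).equals("nofollow")) {
                    return true;
                }
            }
            return false;
        }
    }

    /** SAX handler that records href and rel for every <a> element with an href. */
    static final class Handler extends DefaultHandler {
        final List<Link> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default factory is not namespace-aware, so qName holds the element name.
            if ("a".equalsIgnoreCase(qName)) {
                String href = atts.getValue("href");
                if (href != null) {
                    String rel = atts.getValue("rel");
                    links.add(new Link(href, rel == null ? "" : rel));
                }
            }
        }
    }

    /** Parses well-formed XHTML and returns the links found, in document order. */
    static List<Link> extract(String xhtml) throws Exception {
        Handler handler = new Handler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<a href=\"http://example.com/a\">A</a>"
                + "<a href=\"http://example.com/b\" rel=\"NoFollow noopener\">B</a>"
                + "</body></html>";
        for (Link link : extract(page)) {
            // A crawler would only follow links where isNoFollow() is false.
            System.out.println(link.uri + " follow=" + !link.isNoFollow());
        }
    }
}
```

Running main prints one line per link. The second link is reported with follow=false because its rel contains the nofollow token.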

--
This message is automatically generated by JIRA.