Posted to dev@tika.apache.org by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2011/12/21 14:49:30 UTC
[jira] [Created] (TIKA-825) Extract rel attr with LinkContentHandler
Extract rel attr with LinkContentHandler
----------------------------------------
Key: TIKA-825
URL: https://issues.apache.org/jira/browse/TIKA-825
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Markus Jelsma
Priority: Minor
For Nutch we need to extract URLs, and we also need each link's rel attribute so we can check for the nofollow value. I've patched the code to return this information in the Link object. It's been tested and I can now read the rel value in Nutch.
Thoughts?
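
To illustrate the idea, here is a minimal sketch using only the JDK's SAX classes. It is not the attached patch and not Tika's LinkContentHandler; the names RelLinkSketch, SimpleLink, Collector and extract are hypothetical, and the input is assumed to be well-formed XHTML so the stock JDK parser can read it. It records href and rel for each anchor and treats rel as a space-separated token list when checking for nofollow.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical names throughout; this is NOT Tika's LinkContentHandler.
class RelLinkSketch {

    /** Minimal stand-in for a link object that also carries the rel value. */
    static final class SimpleLink {
        final String uri;
        final String rel; // empty string when the anchor has no rel attribute

        SimpleLink(String uri, String rel) {
            this.uri = uri;
            this.rel = rel;
        }

        /** rel holds space-separated tokens, so compare token by token. */
        boolean isNoFollow() {
            String[] tokens = rel.trim().toLowerCase(Locale.ROOT).split("\\s+");
            return Arrays.asList(tokens).contains("nofollow");
        }
    }

    /** SAX handler that records href and rel for every <a> element. */
    static final class Collector extends DefaultHandler {
        final List<SimpleLink> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default JDK parser is not namespace-aware, so match on qName.
            if ("a".equalsIgnoreCase(qName)) {
                String href = atts.getValue("href");
                String rel = atts.getValue("rel");
                if (href != null) {
                    links.add(new SimpleLink(href, rel == null ? "" : rel));
                }
            }
        }
    }

    /** Parses well-formed XHTML and returns the anchors found, in document order. */
    static List<SimpleLink> extract(String xhtml) throws Exception {
        Collector collector = new Collector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), collector);
        return collector.links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<a href=\"http://example.org/a\" rel=\"nofollow\">a</a>"
                + "<a href=\"http://example.org/b\">b</a>"
                + "</body></html>";
        for (SimpleLink link : extract(page)) {
            System.out.println(link.uri + " nofollow=" + link.isNoFollow());
        }
    }
}
```

A crawler such as Nutch could then skip any link for which isNoFollow() returns true. Keeping the raw rel string on the link object, rather than a boolean, leaves other rel values available to callers.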
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Closed] (TIKA-825) Extract rel attr with LinkContentHandler
Posted by "Markus Jelsma (Closed) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma closed TIKA-825.
------------------------------
Resolution: Duplicate
For some reason this issue was created twice. Closing this one as a duplicate.
> Extract rel attr with LinkContentHandler
> ----------------------------------------
>
> Key: TIKA-825
> URL: https://issues.apache.org/jira/browse/TIKA-825
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Markus Jelsma
> Priority: Minor
>
> For Nutch we need to extract URLs, and we also need each link's rel attribute so we can check for the nofollow value. I've patched the code to return this information in the Link object. It's been tested and I can now read the rel value in Nutch.
> Thoughts?