Posted to dev@tika.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2010/07/08 15:54:50 UTC
[jira] Created: (TIKA-460) HTMLHandler misses treatment of A elements
HTMLHandler misses treatment of A elements
-------------------------------------------
Key: TIKA-460
URL: https://issues.apache.org/jira/browse/TIKA-460
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 0.7
Reporter: Julien Nioche
Assignee: Julien Nioche
Fix For: 0.8
The A elements should be processed before any other safe element; otherwise they are never processed.
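To illustrate the ordering problem, here is a minimal sketch. It is a toy model, not Tika's actual handler code: the class name, `isSafe`, and both `handle...` methods are hypothetical, and `isSafe` simply assumes a mapper that treats every element as safe.

```java
// Toy model of the dispatch order; this is NOT Tika's handler code.
final class DispatchOrderSketch {

    // Hypothetical stand-in for a mapper that reports every element as safe.
    static boolean isSafe(String name) {
        return true;
    }

    // Ordering described in the report: the generic safe-element branch is
    // tested first, so the A-specific branch is never reached.
    static String handleSafeFirst(String name) {
        if (isSafe(name)) {
            return "generic";
        } else if ("a".equals(name)) {
            return "link";
        }
        return "ignored";
    }

    // Reordered version: A is tested before the generic safe-element branch.
    static String handleLinkFirst(String name) {
        if ("a".equals(name)) {
            return "link";
        } else if (isSafe(name)) {
            return "generic";
        }
        return "ignored";
    }

    public static void main(String[] args) {
        System.out.println(handleSafeFirst("a"));
        System.out.println(handleLinkFirst("a"));
        System.out.println(handleLinkFirst("p"));
    }
}
```

Under these assumptions, `handleSafeFirst("a")` returns "generic" while `handleLinkFirst("a")` returns "link"; other elements are unaffected by the reordering.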
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (TIKA-460) HTMLHandler misses treatment of A elements
Posted by "Ken Krugler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898361#action_12898361 ]
Ken Krugler commented on TIKA-460:
----------------------------------
In that case, I'd say go ahead and commit it. You'll probably need to re-generate it since I've been mucking with the same files - sorry :(
[jira] Commented: (TIKA-460) HTMLHandler misses treatment of A elements
Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898335#action_12898335 ]
Julien Nioche commented on TIKA-460:
------------------------------------
Hi Ken, correct. The A's get bypassed otherwise. TIKA-463 would be a cleaner way of dealing with situations like these, but in the meantime the patch should be OK.
[jira] Commented: (TIKA-460) HTMLHandler misses treatment of A elements
Posted by "Ken Krugler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898315#action_12898315 ]
Ken Krugler commented on TIKA-460:
----------------------------------
Hi Julien - I'm assuming this is required for proper relative link resolution when using something like the IdentityHtmlMapper, right?
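For readers unfamiliar with relative link resolution, a minimal sketch follows. It uses only `java.net.URI` from the JDK and does not reproduce Tika's own resolution code; the class name, the `resolve` helper, and the example URLs are made up for illustration.

```java
import java.net.URI;

// Illustrative only: resolves an href against a base URI with the JDK.
final class RelativeLinkSketch {

    // Returns the absolute form of href relative to base, or href unchanged
    // if either value is not a syntactically valid URI.
    static String resolve(String base, String href) {
        try {
            return URI.create(base).resolve(href.trim()).toString();
        } catch (IllegalArgumentException e) {
            return href;
        }
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html";
        System.out.println(resolve(base, "page2.html"));
        System.out.println(resolve(base, "../img/logo.png"));
        System.out.println(resolve(base, "http://other.example.org/x"));
    }
}
```

If the A branch of the handler is never reached, a step like this never runs, and relative `href` values would be passed through unresolved.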
[jira] Resolved: (TIKA-460) HTMLHandler misses treatment of A elements
Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche resolved TIKA-460.
--------------------------------
Resolution: Fixed
Committed revision 985444
The A elements are now processed correctly when using the IdentityHtmlMapper. I have added <A> to the list of safe elements in the DefaultHTMLMapper.
Ken - the A element still has special treatment, so the safe attributes you added in
{code}
put("a", attrSet("rel", "name"));
{code}
are still not used. Since A was not in the list of safe elements, these attributes were not used anyway.
I still think that we should delegate the logic to the mappers as suggested in TIKA-463, but in the meantime this fix allows us to get to the A's using the IdentityHtmlMapper and simplifies the code a bit.
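The point about unused safe attributes can be sketched as follows. This is an assumption-laden illustration, not the DefaultHTMLMapper source: only the `put("a", attrSet("rel", "name"))` entry comes from the comment above, while the class name, `keptAttributes`, and the two-step lookup are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: an attribute whitelist that is consulted solely for
// elements that have already passed the safe-element check.
final class SafeAttributeSketch {

    private static final Map<String, Set<String>> SAFE_ATTRIBUTES = new HashMap<>();
    static {
        SAFE_ATTRIBUTES.put("a", attrSet("rel", "name"));
    }

    static Set<String> attrSet(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // Returns the attributes that survive filtering for the given element.
    static List<String> keptAttributes(Set<String> safeElements, String element, List<String> attrs) {
        if (!safeElements.contains(element)) {
            // Element is not safe, so its attribute entry is never looked up.
            return Collections.emptyList();
        }
        Set<String> allowed = SAFE_ATTRIBUTES.getOrDefault(element, Collections.<String>emptySet());
        List<String> kept = new ArrayList<>();
        for (String attr : attrs) {
            if (allowed.contains(attr)) {
                kept.add(attr);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> attrs = Arrays.asList("rel", "href", "name");
        System.out.println(keptAttributes(attrSet("p"), "a", attrs));
        System.out.println(keptAttributes(attrSet("p", "a"), "a", attrs));
    }
}
```

In this model, an entry for "a" in the attribute map has no effect unless "a" is also in the set of safe elements, and it is equally skipped if A is routed through a separate special-case branch before the lookup.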
[jira] Commented: (TIKA-460) HTMLHandler misses treatment of A elements
Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887718#action_12887718 ]
Julien Nioche commented on TIKA-460:
------------------------------------
This would work if we had <a> in the list of safe elements in the DefaultHTMLMapper, which is not the case. I will wait for the outcome of the discussions on TIKA-463, which will affect the way link elements are handled.
[jira] Updated: (TIKA-460) HTMLHandler misses treatment of A elements
Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche updated TIKA-460:
-------------------------------
Attachment: TIKA-460.patch