Posted to dev@tika.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2010/07/08 15:54:50 UTC

[jira] Created: (TIKA-460) HTMLHandler misses treatment of A elements

HTMLHandler misses treatment of A elements 
-------------------------------------------

                 Key: TIKA-460
                 URL: https://issues.apache.org/jira/browse/TIKA-460
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.7
            Reporter: Julien Nioche
            Assignee: Julien Nioche
             Fix For: 0.8


The A elements should be processed before any other safe element; otherwise they are never processed.
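
A minimal sketch of the ordering problem described above. It is not the actual HTMLHandler code: the class, the method names, and the use of a plain function as a stand-in for the mapper's safe-element lookup are all assumptions made for illustration. It only shows why a generic safe-element check placed first can make an A-specific branch unreachable when the mapper reports every element as safe.

```java
import java.util.function.UnaryOperator;

// Hypothetical illustration only; none of these names come from Tika.
class ElementOrderSketch {

    // Stand-in for an identity-style mapper: every element name is "safe".
    static final UnaryOperator<String> IDENTITY = name -> name;

    // Generic safe-element check first, A-specific check second.
    static String safeFirst(String name, UnaryOperator<String> mapSafe) {
        if (mapSafe.apply(name) != null) {
            return "generic";
        } else if ("a".equalsIgnoreCase(name)) {
            // Never reached when the mapper reports "a" as safe.
            return "link";
        }
        return "ignored";
    }

    // A-specific check first, generic safe-element check second.
    static String linkFirst(String name, UnaryOperator<String> mapSafe) {
        if ("a".equalsIgnoreCase(name)) {
            return "link";
        } else if (mapSafe.apply(name) != null) {
            return "generic";
        }
        return "ignored";
    }

    public static void main(String[] args) {
        System.out.println(safeFirst("a", IDENTITY)); // generic
        System.out.println(linkFirst("a", IDENTITY)); // link
    }
}
```

With the identity stand-in, `safeFirst` never reaches the link branch, while `linkFirst` does; other elements are handled the same way by both.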

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TIKA-460) HTMLHandler misses treatment of A elements

Posted by "Ken Krugler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898361#action_12898361 ] 

Ken Krugler commented on TIKA-460:
----------------------------------

In that case, I'd say go ahead and commit it. You'll probably need to re-generate it since I've been mucking with the same files - sorry :(


[jira] Commented: (TIKA-460) HTMLHandler misses treatment of A elements

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898335#action_12898335 ] 

Julien Nioche commented on TIKA-460:
------------------------------------

Hi Ken, correct. The A's get bypassed otherwise. TIKA-463 would be a cleaner way of dealing with situations like these, but in the meantime the patch should be OK.


[jira] Commented: (TIKA-460) HTMLHandler misses treatment of A elements

Posted by "Ken Krugler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898315#action_12898315 ] 

Ken Krugler commented on TIKA-460:
----------------------------------

Hi Julien - I'm assuming this is required for proper relative link resolution when using something like the IdentityHtmlMapper, right?
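
For context, relative link resolution means turning an href found on an A element into an absolute URL using the document's base location. The sketch below uses only the JDK's `java.net.URI`; the class name, method name, and example URLs are hypothetical and are not taken from Tika or from the patch.

```java
import java.net.URI;

// Hypothetical illustration only; not Tika code.
class RelativeLinkSketch {

    // Resolves an href against the base URI of the document it was found in.
    static String resolve(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/docs/index.html";
        System.out.println(resolve(base, "page2.html"));    // http://example.org/docs/page2.html
        System.out.println(resolve(base, "../about.html")); // http://example.org/about.html
    }
}
```

If the A elements are never given their special treatment, a step like this is skipped and the hrefs reach the output in their relative form.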


[jira] Resolved: (TIKA-460) HTMLHandler misses treatment of A elements

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche resolved TIKA-460.
--------------------------------

    Resolution: Fixed

Committed revision 985444

The A elements are now processed correctly when using the IdentityMapper. I have added <A> to the list of safe elements in the DefaultHTMLMapper.

Ken - the A element still gets special treatment, so the safe attributes you added in

{code}
put("a", attrSet("rel", "name"));
{code}

are still not used. Since A was not in the list of safe elements, these attributes were not used anyway.

I still think that we should delegate the logic to the mappers as suggested in TIKA-463 but in the meantime this fix allows us to get to the A's using the IdentityMapper and simplifies the code a bit. 


[jira] Commented: (TIKA-460) HTMLHandler misses treatment of A elements

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887718#action_12887718 ] 

Julien Nioche commented on TIKA-460:
------------------------------------

This would work if we had <a> in the list of safe elements in the DefaultHTMLMapper, which is not the case. I will wait for the outcome of the discussions on TIKA-463, which will affect the way link elements are handled.


[jira] Updated: (TIKA-460) HTMLHandler misses treatment of A elements

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated TIKA-460:
-------------------------------

    Attachment: TIKA-460.patch
