Posted to dev@tika.apache.org by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/08/13 18:56:17 UTC

[jira] Assigned: (TIKA-457) HTMLParser gets an early </body> event

     [ https://issues.apache.org/jira/browse/TIKA-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Krugler reassigned TIKA-457:
--------------------------------

    Assignee: Ken Krugler

> HTMLParser gets an early </body> event
> --------------------------------------
>
>                 Key: TIKA-457
>                 URL: https://issues.apache.org/jira/browse/TIKA-457
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Julien Nioche
>            Assignee: Ken Krugler
>
> I am using the IdentityMapper in the HTMLParser with this simple document:
> {code}
> <html><head><title> my title </title>
> </head>
> <body>
> <frameset rows=\"20,*\"> 
> <frame src=\"top.html\">
> </frame>
> <frameset cols=\"20,*\">
> <frame src=\"left.html\">
> </frame>
> <frame src=\"invalid.html\"/>
> </frame>
> <frame src=\"right.html\">
> </frame>
> </frameset>
> </frameset>
> </body></html>
> {code}
> Strangely, the HTMLHandler receives a call to endElement for body *BEFORE* we reach frameset. As a result, the variable bodylevel is decremented back to 0, and the remaining elements are ignored by the logic implemented in HTMLHandler.
> Any idea?
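
To make the described effect concrete, here is a minimal sketch of a counter-based filter, assuming the handler forwards elements only while a body-level counter is above zero. The class BodyLevelSketch, its forwarded list, and the replay() helper are hypothetical names for illustration; this is not Tika's actual HTMLHandler code. The event order in replay() is the one described in the report, not the output of a real parse.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the filtering described in the report;
// NOT Tika's actual HTMLHandler.
class BodyLevelSketch extends DefaultHandler {
    // Assumed counter, named after the "bodylevel" variable in the report.
    private int bodyLevel = 0;
    // Local names of the start events that pass the filter.
    final List<String> forwarded = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if ("body".equals(local)) {
            bodyLevel++;
        }
        if (bodyLevel > 0) {
            forwarded.add(local); // only elements seen inside <body> are passed on
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if ("body".equals(local)) {
            bodyLevel--; // an early </body> drops the counter back to 0
        }
    }

    // Replays the event order from the report: </body> arrives before <frameset>.
    static List<String> replay() {
        BodyLevelSketch handler = new BodyLevelSketch();
        Attributes none = new AttributesImpl();
        handler.startElement("", "body", "body", none);
        handler.endElement("", "body", "body"); // the early event
        handler.startElement("", "frameset", "frameset", none);
        handler.startElement("", "frame", "frame", none);
        handler.endElement("", "frame", "frame");
        handler.endElement("", "frameset", "frameset");
        return handler.forwarded;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [body]
    }
}
```

Under these assumptions, only body passes the filter; frameset and frame arrive while the counter is 0 and are dropped, which matches the behaviour reported above.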
