[ https://issues.apache.org/jira/browse/TIKA-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Krugler updated TIKA-457:
-----------------------------

    Attachment: TIKA-457.patch

This patch also improves handling of <frame> elements for [TIKA-463] by resolving relative URLs in their src=xxx attributes.

> HTMLParser gets an early </body> event
> --------------------------------------
>
>                 Key: TIKA-457
>                 URL: https://issues.apache.org/jira/browse/TIKA-457
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Julien Nioche
>            Assignee: Ken Krugler
>         Attachments: TIKA-457.patch
>
>
> I am using the IdentityMapper in the HTMLParser with this simple document:
> {code}
> <html><head><title> my title </title>
> </head>
> <body>
> <frameset rows="20,*">
> <frame src="top.html">
> </frame>
> <frameset cols="20,*">
> <frame src="left.html">
> </frame>
> <frame src="invalid.html"/>
> </frame>
> <frame src="right.html">
> </frame>
> </frameset>
> </frameset>
> </body></html>
> {code}
> Strangely, the HTMLHandler receives an endElement call for body *BEFORE* the frameset is reached. As a result, the bodylevel variable is decremented back to 0, and the remaining entities are ignored because of the logic implemented in HTMLHandler.
> Any idea?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
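For illustration only, resolving a relative frame src=xxx value amounts to standard URI reference resolution against the page's base URL. The sketch below is not the code in TIKA-457.patch. It assumes the base URL is already known (for example from a <base href> or the document's location), and the class name FrameSrcResolver is hypothetical. It relies only on java.net.URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical helper (not Tika's implementation): turns a <frame src="...">
 * value into an absolute URL by resolving it against the page's base URL.
 */
public final class FrameSrcResolver {

    private FrameSrcResolver() {}

    /** Returns the absolute URL for src, or src unchanged if either value is malformed. */
    public static String resolve(String baseUrl, String src) {
        try {
            URI base = new URI(baseUrl);
            // URI.resolve handles relative paths and ".." segments;
            // an already-absolute src is returned as-is.
            return base.resolve(src.trim()).toString();
        } catch (URISyntaxException | IllegalArgumentException e) {
            // Keep malformed values rather than silently dropping the attribute.
            return src;
        }
    }

    public static void main(String[] args) {
        String base = "http://example.com/site/index.html";
        String[] sources = {"top.html", "left.html", "../right.html", "http://other.org/x.html"};
        for (String src : sources) {
            System.out.println(src + " -> " + resolve(base, src));
        }
    }
}
```

With the assumed base above, top.html resolves to http://example.com/site/top.html and ../right.html to http://example.com/right.html. Falling back to the raw value on a syntax error is a design choice of this sketch, not necessarily what the patch does.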
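To make the reported symptom concrete, here is a simplified, hypothetical SAX handler (not Tika's actual HtmlHandler) that, like the logic described above, forwards elements only while a bodyLevel counter is positive. Replaying the reported event order, with </body> arriving before <frameset>, shows why everything after the early end event is dropped. The early event is likely caused by the underlying TagSoup parser following HTML 4, where <frameset> takes the place of <body> rather than nesting inside it, so the parser closes the body first. Treat that cause as an assumption, not a confirmed diagnosis.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical stand-in for the "bodylevel" logic: only elements that
 * start while bodyLevel > 0 are recorded as forwarded.
 */
public class BodyLevelHandler extends DefaultHandler {

    private int bodyLevel = 0;
    private final List<String> forwarded = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("body".equals(localName)) {
            bodyLevel++;
        } else if (bodyLevel > 0) {
            forwarded.add(localName);
        }
        // Anything outside <body> is ignored, mirroring the reported behavior.
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("body".equals(localName)) {
            bodyLevel--;
        }
    }

    /** Replays a reduced event sequence; earlyEnd mimics the order reported in TIKA-457. */
    public static List<String> replay(boolean earlyEnd) {
        BodyLevelHandler h = new BodyLevelHandler();
        Attributes none = new AttributesImpl();
        h.startElement("", "body", "body", none);
        if (earlyEnd) {
            h.endElement("", "body", "body"); // </body> delivered before <frameset>
        }
        h.startElement("", "frameset", "frameset", none);
        h.startElement("", "frame", "frame", none);
        h.endElement("", "frame", "frame");
        h.endElement("", "frameset", "frameset");
        if (!earlyEnd) {
            h.endElement("", "body", "body");
        }
        return h.forwarded;
    }

    public static void main(String[] args) {
        System.out.println("early </body>:  " + replay(true));  // prints "early </body>:  []"
        System.out.println("normal order:   " + replay(false)); // prints "normal order:   [frameset, frame]"
    }
}
```

Because the counter drops back to 0 on the early </body>, the frameset and frame elements are never forwarded, which matches the behavior described in the report.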