Posted to dev@tika.apache.org by "colin (JIRA)" <ji...@apache.org> on 2015/04/23 14:13:40 UTC
[jira] [Created] (TIKA-1615) HTML fragments with comments before
div elements are not detected as HTML
colin created TIKA-1615:
---------------------------
Summary: HTML fragments with comments before div elements are not detected as HTML
Key: TIKA-1615
URL: https://issues.apache.org/jira/browse/TIKA-1615
Project: Tika
Issue Type: Bug
Components: detector
Affects Versions: 1.7
Reporter: colin
We are trying to import HTML fragments into Solr.
The following fragment is not detected as HTML:
<!-- test -->
<div>
test
</div>
When the comment is removed, the fragment is detected and parsed as HTML; that behavior was added by https://issues.apache.org/jira/browse/TIKA-1102
To work around this, we added
<root-XML localName="div"/>
<root-XML localName="DIV"/>
to the <mime-type type="text/html"> element in tika-mimetypes.xml.
With that change, the fragment is parsed as HTML, as expected.
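
For illustration, the workaround described above might sit in tika-mimetypes.xml roughly as follows. This is only a sketch: the existing children of the text/html definition are omitted, and the position of the two added lines among them is an assumption, not something stated in this report.

```xml
<mime-type type="text/html">
  <!-- Existing child elements of the stock text/html definition are
       left unchanged and omitted from this sketch. -->

  <!-- Lines added by the workaround: treat a document whose root
       element is div as HTML. Both spellings are listed, as in the
       report. Placement within the element is an assumption. -->
  <root-XML localName="div"/>
  <root-XML localName="DIV"/>
</mime-type>
```

Note that this only covers fragments whose first element is a div; fragments that start with a comment followed by some other element would presumably need their own root-XML entries.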