Posted to dev@tika.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/03/19 07:00:47 UTC
[jira] [Resolved] (TIKA-1365) Incorrectly MimeType detection for Apache Lucene web site
[ https://issues.apache.org/jira/browse/TIKA-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann resolved TIKA-1365.
-------------------------------------
Thank you [~mkrio]!
PR #35 merged:
{noformat}
[chipotle:~/tmp/tika] mattmann% svn commit -m "Fix for TIKA-1365 Lower priority for XML starting with comment, allow HTML starting with comment to be detected as text/html contributed by Matthias Krueger <mk...@mkr.io> this closes #35."
Sending CHANGES.txt
Sending tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Sending tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
Transmitting file data ...
Committed revision 1667658.
[chipotle:~/tmp/tika] mattmann%
{noformat}
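The commit message describes the fix in terms of magic-rule priorities: an XML rule that fires on a leading comment is lowered, and HTML that begins with a comment is allowed to win as text/html. The sketch below is a toy model of that priority mechanism only. It is NOT Tika's MimeTypes code or the actual tika-mimetypes.xml contents: the class name, the rule set, the priority numbers (50/40/20) and the 1024-byte prefix are all hypothetical, chosen just to show why lowering the comment rule changes the result for a page that starts with "<!--".

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Toy model of priority-based magic detection (hypothetical, not Tika's code):
// the highest-priority rule that matches the document's prefix wins.
final class MagicPrioritySketch {

    static final class Rule {
        final String mimeType;
        final int priority;
        final Predicate<String> matches;

        Rule(String mimeType, int priority, Predicate<String> matches) {
            this.mimeType = mimeType;
            this.priority = priority;
            this.matches = matches;
        }
    }

    // Lowercases the first 1024 bytes and returns the MIME type of the
    // highest-priority matching rule (first rule wins on ties), or
    // application/octet-stream if nothing matches.
    static String detect(List<Rule> rules, byte[] data) {
        int len = Math.min(data.length, 1024);
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1)
                .toLowerCase(Locale.ROOT);
        Rule best = null;
        for (Rule r : rules) {
            if (r.matches.test(head) && (best == null || r.priority > best.priority)) {
                best = r;
            }
        }
        return best == null ? "application/octet-stream" : best.mimeType;
    }

    static boolean looksLikeHtmlStart(String h) {
        return h.startsWith("<!doctype html") || h.startsWith("<html");
    }

    // Hypothetical rules before the fix: a leading comment counts as strong
    // XML evidence and outranks the HTML rule.
    static List<Rule> before() {
        return List.of(
            new Rule("application/xml", 50, h -> h.startsWith("<?xml") || h.startsWith("<!--")),
            new Rule("text/html", 40, MagicPrioritySketch::looksLikeHtmlStart));
    }

    // Hypothetical rules after the fix: the comment-only XML rule drops below
    // HTML, and a leading comment followed by an <html tag is treated as HTML.
    static List<Rule> after() {
        return List.of(
            new Rule("application/xml", 50, h -> h.startsWith("<?xml")),
            new Rule("application/xml", 20, h -> h.startsWith("<!--")),
            new Rule("text/html", 40, MagicPrioritySketch::looksLikeHtmlStart),
            new Rule("text/html", 40, h -> h.startsWith("<!--") && h.contains("<html")));
    }

    public static void main(String[] args) {
        byte[] page = "<!-- page header -->\n<!DOCTYPE html>\n<html><body>Hi</body></html>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] xml = "<!-- comment -->\n<root/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(before(), page)); // application/xml
        System.out.println(detect(after(), page));  // text/html
        System.out.println(detect(after(), xml));   // application/xml
    }
}
```

In this model a plain XML file that starts with a comment still resolves to application/xml after the change, because only the low-priority XML rule matches it; the HTML comment rule additionally requires an "<html" tag in the inspected prefix.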
> Incorrectly MimeType detection for Apache Lucene web site
> ---------------------------------------------------------
>
> Key: TIKA-1365
> URL: https://issues.apache.org/jira/browse/TIKA-1365
> Project: Tika
> Issue Type: Bug
> Components: detector
> Affects Versions: 1.5
> Reporter: Tien Nguyen Manh
> Assignee: Chris A. Mattmann
> Fix For: 1.8
>
> Attachments: discussion.html
>
>
> Tika 1.5 detects many pages from the Apache Lucene web site as XML, for example this page:
> http://lucene.apache.org/core/discussion.html
> Here is the error log; parsing failed because Tika used the XML parser:
> Apache Tika was unable to parse the document
> at http://lucene.apache.org/core/discussion.html.
> The full exception stack trace is included below:
> org.apache.tika.exception.TikaException: XML parse error
> at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:320)
> at org.apache.tika.gui.TikaGUI.openURL(TikaGUI.java:293)
> at org.apache.tika.gui.TikaGUI.actionPerformed(TikaGUI.java:247)
> at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2018)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)