Posted to dev@nutch.apache.org by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2011/12/21 12:27:30 UTC
[jira] [Created] (NUTCH-1230) MimeType utils broken with Tika 1.1
MimeType utils broken with Tika 1.1
-----------------------------------
Key: NUTCH-1230
URL: https://issues.apache.org/jira/browse/NUTCH-1230
Project: Nutch
Issue Type: Bug
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Fix For: 1.5
We used Tika 1.0-SNAPSHOT in production and just switched to 1.1-SNAPSHOT. The new version triggers the following error:
{code}
2011-12-21 12:29:56,665 ERROR http.Http - java.lang.IllegalAccessError: tried to access method org.apache.tika.mime.MimeTypes.getMimeType([B)Lorg/apache/tika/mime/MimeType; from class org.apache.nutch.util.MimeUtil
2011-12-21 12:29:56,665 ERROR http.Http - at org.apache.nutch.util.MimeUtil.autoResolveContentType(MimeUtil.java:169)
2011-12-21 12:29:56,665 ERROR http.Http - at org.apache.nutch.protocol.Content.getContentType(Content.java:292)
2011-12-21 12:29:56,666 ERROR http.Http - at org.apache.nutch.protocol.Content.<init>(Content.java:88)
2011-12-21 12:29:56,666 ERROR http.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:142)
2011-12-21 12:29:56,666 ERROR http.Http - at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:82)
2011-12-21 12:29:56,666 ERROR http.Http - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
2011-12-21 12:29:56,666 ERROR http.Http - at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:138)
{code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0
Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1230:
---------------------------------
Attachment: NUTCH-1230-1.5-3.patch
I feel like a fool sometimes, but it's sorted now! All tests pass! I will commit and upgrade Tika shortly if there are no objections.
[jira] [Commented] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176441#comment-13176441 ]
Hudson commented on NUTCH-1230:
-------------------------------
Integrated in Nutch-trunk #1706 (See [https://builds.apache.org/job/Nutch-trunk/1706/])
NUTCH-1230 and NUTCH-1231 Upgrade to Tika 1.0 and using new Tika detect API
markus : http://svn.apache.org/viewvc/nutch/trunk/viewvc/?view=rev&root=&revision=1224916
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/ivy/ivy.xml
* /nutch/trunk/src/java/org/apache/nutch/util/MimeUtil.java
* /nutch/trunk/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
* /nutch/trunk/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip/ZipTextExtractor.java
* /nutch/trunk/src/plugin/protocol-file/src/java/org/apache/nutch/protocol/file/FileResponse.java
[jira] [Issue Comment Edited] (NUTCH-1230) MimeType utils broken with Tika 1.1
Posted by "Markus Jelsma (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174023#comment-13174023 ]
Markus Jelsma edited comment on NUTCH-1230 at 12/21/11 11:40 AM:
-----------------------------------------------------------------
Need to use Tika.detect API instead of MimeTypes API.
was (Author: markus17):
We previously used the byte[] as input, but Tika now requires a java.io.File:
http://tika.apache.org/1.0/api/org/apache/tika/Tika.html#detect%28java.io.File%29
In o.a.n.Content we have the byte[] but must pass a File to MimeUtil.autoResolveContentType(), and I have no idea how to convert an in-memory byte[] to a File.
I hate getting stuck again; any advice would be more than helpful!
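For readers following the byte[]-versus-File question: as far as the Tika 1.0 Javadoc shows, the Tika facade also has detect overloads that accept an InputStream or a byte[] prefix, so a temporary File should not be necessary. The sketch below uses only the JDK to show the wrapping step. StreamDetector, TOY and detectFromBytes are hypothetical names invented for illustration; they are not Tika or Nutch types, and the toy magic-byte check stands in for real detection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class InMemoryDetectSketch {

    /** Hypothetical stand-in for a stream-based detector; NOT a Tika type. */
    public interface StreamDetector {
        String detect(InputStream stream) throws IOException;
    }

    /** Wraps the in-memory bytes in a stream, so no temporary File is needed. */
    public static String detectFromBytes(StreamDetector detector, byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return detector.detect(in);
        }
    }

    /** Toy detector for illustration only: looks for the "%PDF" magic prefix. */
    public static final StreamDetector TOY = stream -> {
        byte[] head = new byte[4];
        int n = stream.read(head);
        boolean pdf = n == 4 && head[0] == '%' && head[1] == 'P'
                && head[2] == 'D' && head[3] == 'F';
        // Anything unrecognised (including empty input) gets the generic default.
        return pdf ? "application/pdf" : "application/octet-stream";
    };

    public static void main(String[] args) throws IOException {
        byte[] content = "%PDF-1.4 ...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detectFromBytes(TOY, content));
    }
}
```

With a real Tika instance on the classpath, the same byte[] could presumably be handed to the facade directly instead of the toy detector.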
[jira] [Resolved] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0
Posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1230.
----------------------------------
Resolution: Fixed
Committed for 1.5 in rev. 1224916.
[jira] [Commented] (NUTCH-1230) MimeType utils broken with Tika 1.1
Posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174023#comment-13174023 ]
Markus Jelsma commented on NUTCH-1230:
--------------------------------------
We previously used the byte[] as input, but Tika now requires a java.io.File:
http://tika.apache.org/1.0/api/org/apache/tika/Tika.html#detect%28java.io.File%29
In o.a.n.Content we have the byte[] but must pass a File to MimeUtil.autoResolveContentType(), and I have no idea how to convert an in-memory byte[] to a File.
I hate getting stuck again; any advice would be more than helpful!
[jira] [Updated] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0
Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1230:
---------------------------------
Priority: Blocker (was: Major)
Patch Info: Patch Available
Summary: MimeType API deprecated and breaks with Tika 1.0 (was: MimeType utils broken with Tika 1.1)
[jira] [Commented] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0
Posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174071#comment-13174071 ]
Markus Jelsma commented on NUTCH-1230:
--------------------------------------
Actually, Tika now returns application/octet-stream for that data. Please advise!
[jira] [Updated] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0
Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1230:
---------------------------------
Attachment: NUTCH-1230-1.5-2.patch
Patches for MimeUtil and some other classes. Everything works well with Tika 1.0. I removed the places where MimeType objects are returned and rely on String now.
The ContentTest fails: MimeUtil now returns application/octet-stream instead of text/html for data "".getBytes("UTF8"). Is this a problem?
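One conceivable way to deal with empty or unrecognisable content is to treat application/octet-stream as "no answer" and fall back to the type the server declared. The sketch below illustrates that idea under stated assumptions only. It is not the change committed for this issue, and ContentTypeFallbackSketch and resolve are hypothetical names, not Nutch or Tika APIs. It also does not strip parameters such as "; charset=UTF-8" from the declared value.

```java
public class ContentTypeFallbackSketch {

    static final String OCTET_STREAM = "application/octet-stream";

    /**
     * Prefers the detected type unless it is missing or the generic
     * octet-stream default; in that case the declared type is used
     * when one is available.
     */
    public static String resolve(String detected, String declared) {
        boolean undetected = detected == null || OCTET_STREAM.equals(detected);
        if (undetected && declared != null && !declared.isEmpty()) {
            return declared;
        }
        return detected != null ? detected : OCTET_STREAM;
    }

    public static void main(String[] args) {
        // Empty body: detection gives the generic default, the header wins.
        System.out.println(resolve(OCTET_STREAM, "text/html"));
        // A specific detection result is kept even if the header disagrees.
        System.out.println(resolve("application/pdf", "text/html"));
    }
}
```

Whether trusting the declared type is acceptable depends on how much the crawler trusts server headers, so this is a design choice rather than a given.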
[jira] [Commented] (NUTCH-1230) MimeType utils broken with Tika 1.1
Posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174021#comment-13174021 ]
Markus Jelsma commented on NUTCH-1230:
--------------------------------------
It seems the method became deprecated in 1.0 and is inaccessible in my 1.1-SNAPSHOT. I'll try an upgrade to 1.0.
[jira] [Commented] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176196#comment-13176196 ]
Hudson commented on NUTCH-1230:
-------------------------------
Integrated in nutch-trunk-maven #80 (See [https://builds.apache.org/job/nutch-trunk-maven/80/])
NUTCH-1230 and NUTCH-1231 Upgrade to Tika 1.0 and using new Tika detect API
markus : http://svn.apache.org/viewvc/nutch/trunk/viewvc/?view=rev&root=&revision=1224916
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/ivy/ivy.xml
* /nutch/trunk/src/java/org/apache/nutch/util/MimeUtil.java
* /nutch/trunk/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
* /nutch/trunk/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip/ZipTextExtractor.java
* /nutch/trunk/src/plugin/protocol-file/src/java/org/apache/nutch/protocol/file/FileResponse.java