Posted to dev@nutch.apache.org by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2011/12/21 12:27:30 UTC

[jira] [Created] (NUTCH-1230) MimeType utils broken with Tika 1.1

MimeType utils broken with Tika 1.1
-----------------------------------

                 Key: NUTCH-1230
                 URL: https://issues.apache.org/jira/browse/NUTCH-1230
             Project: Nutch
          Issue Type: Bug
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma
             Fix For: 1.5


We used Tika 1.0-SNAPSHOT in production and just switched to 1.1-SNAPSHOT. The new version triggers the following error:

{code}
2011-12-21 12:29:56,665 ERROR http.Http - java.lang.IllegalAccessError: tried to access method org.apache.tika.mime.MimeTypes.getMimeType([B)Lorg/apache/tika/mime/MimeType; from class org.apache.nutch.util.MimeUtil
2011-12-21 12:29:56,665 ERROR http.Http - at org.apache.nutch.util.MimeUtil.autoResolveContentType(MimeUtil.java:169)
2011-12-21 12:29:56,665 ERROR http.Http - at org.apache.nutch.protocol.Content.getContentType(Content.java:292)
2011-12-21 12:29:56,666 ERROR http.Http - at org.apache.nutch.protocol.Content.<init>(Content.java:88)
2011-12-21 12:29:56,666 ERROR http.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:142)
2011-12-21 12:29:56,666 ERROR http.Http - at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:82)
2011-12-21 12:29:56,666 ERROR http.Http - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
2011-12-21 12:29:56,666 ERROR http.Http - at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:138)
{code}



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0

Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1230:
---------------------------------

    Attachment: NUTCH-1230-1.5-3.patch

I feel like a fool sometimes, but it's sorted now! All tests pass! I will commit and upgrade Tika shortly if there are no objections.
                

[jira] [Commented] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176441#comment-13176441 ] 

Hudson commented on NUTCH-1230:
-------------------------------

Integrated in Nutch-trunk #1706 (See [https://builds.apache.org/job/Nutch-trunk/1706/])
    NUTCH-1230 and NUTCH-1231 Upgrade to Tika 1.0 and using new Tika detect API

markus : http://svn.apache.org/viewvc/nutch/trunk/viewvc/?view=rev&root=&revision=1224916
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/ivy/ivy.xml
* /nutch/trunk/src/java/org/apache/nutch/util/MimeUtil.java
* /nutch/trunk/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
* /nutch/trunk/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip/ZipTextExtractor.java
* /nutch/trunk/src/plugin/protocol-file/src/java/org/apache/nutch/protocol/file/FileResponse.java

                

[jira] [Issue Comment Edited] (NUTCH-1230) MimeType utils broken with Tika 1.1

Posted by "Markus Jelsma (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174023#comment-13174023 ] 

Markus Jelsma edited comment on NUTCH-1230 at 12/21/11 11:40 AM:
-----------------------------------------------------------------

Need to use Tika.detect API instead of MimeTypes API.
                
      was (Author: markus17):
    We previously used the byte[] as input, but Tika now requires a java.io.File:
http://tika.apache.org/1.0/api/org/apache/tika/Tika.html#detect%28java.io.File%29

In o.a.n.Content we have the byte[] but must pass a File to MimeUtil.autoResolveContentType(). But I have no idea how to convert an in-memory byte[] to a File?!

I hate getting stuck again; any advice would be more than helpful!
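For reference, the byte[]-to-File question can be answered with the JDK alone (java.nio.file, Java 8+ for UncheckedIOException). The sketch below is an assumption-laden illustration, not the fix that was committed: the class and method names are hypothetical, and spilling every fetched page to disk adds an extra write per document. Tika's facade also has detect overloads that take an InputStream, so wrapping the array in a java.io.ByteArrayInputStream may avoid the temporary file entirely; check the Tika Javadoc for the version in use.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper (not part of Nutch or Tika): writes an in-memory
 * byte[] to a temporary file so it can be passed to an API that only
 * accepts java.io.File.
 */
final class BytesToFile {

  private BytesToFile() {}

  /**
   * Writes {@code data} to a new temporary file and returns it.
   * The caller should delete the file once detection is done.
   */
  static File toTempFile(byte[] data, String suffix) {
    try {
      // Unique name in the default temp directory; a null suffix means ".tmp"
      Path tmp = Files.createTempFile("content-", suffix);
      // Creates or truncates the file and writes all bytes
      Files.write(tmp, data);
      File file = tmp.toFile();
      // Best-effort cleanup in case the caller never deletes it
      file.deleteOnExit();
      return file;
    } catch (IOException e) {
      // Surface I/O failures without forcing checked exceptions on callers
      throw new UncheckedIOException(e);
    }
  }
}
```

Usage: `File f = BytesToFile.toTempFile(content, ".html");` then hand `f` to the File-based call and delete it afterwards.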
                  

[jira] [Resolved] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0

Posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma resolved NUTCH-1230.
----------------------------------

    Resolution: Fixed

Committed for 1.5 in rev. 1224916.
                

[jira] [Commented] (NUTCH-1230) MimeType utils broken with Tika 1.1

Posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174023#comment-13174023 ] 

Markus Jelsma commented on NUTCH-1230:
--------------------------------------

We previously used the byte[] as input, but Tika now requires a java.io.File:
http://tika.apache.org/1.0/api/org/apache/tika/Tika.html#detect%28java.io.File%29

In o.a.n.Content we have the byte[] but must pass a File to MimeUtil.autoResolveContentType(). But I have no idea how to convert an in-memory byte[] to a File?!

I hate getting stuck again; any advice would be more than helpful!
                

[jira] [Updated] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0

Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1230:
---------------------------------

      Priority: Blocker  (was: Major)
    Patch Info: Patch Available
       Summary: MimeType API deprecated and breaks with Tika 1.0  (was: MimeType utils broken with Tika 1.1)
    

[jira] [Commented] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0

Posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174071#comment-13174071 ] 

Markus Jelsma commented on NUTCH-1230:
--------------------------------------

Actually, Tika now returns application/octet-stream for that data. Please advise!
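One plausible reading of this result: content-based ("magic byte") detection can only name a specific type when the leading bytes match a known signature, so empty or unrecognised input falls back to the generic application/octet-stream type. The toy sniffer below illustrates that fallback with plain Java; its class name and its three signatures are illustrative assumptions, and it is not how Tika is implemented (Tika's detector draws on a much larger MIME-type registry and can also use the resource name).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy content sniffer (NOT Tika's implementation): compares the leading
 * bytes of a document against a few signatures and falls back to the
 * generic binary type when nothing matches.
 */
final class MagicSniffer {

  static final String DEFAULT_TYPE = "application/octet-stream";

  // Insertion-ordered, so the first matching signature wins
  private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();

  static {
    SIGNATURES.put("application/pdf", ascii("%PDF-"));
    SIGNATURES.put("application/zip", new byte[] {0x50, 0x4B, 0x03, 0x04}); // "PK\3\4"
    SIGNATURES.put("text/html", ascii("<html")); // case-sensitive: toy only
  }

  private MagicSniffer() {}

  private static byte[] ascii(String s) {
    return s.getBytes(StandardCharsets.US_ASCII);
  }

  /** Returns the first type whose signature is a prefix of {@code data}. */
  static String detect(byte[] data) {
    for (Map.Entry<String, byte[]> e : SIGNATURES.entrySet()) {
      byte[] sig = e.getValue();
      if (data.length >= sig.length
          && Arrays.equals(Arrays.copyOf(data, sig.length), sig)) {
        return e.getKey();
      }
    }
    // Empty or unrecognised input gives no evidence for any specific type
    return DEFAULT_TYPE;
  }
}
```

With this policy, `MagicSniffer.detect(new byte[0])` yields application/octet-stream, because an empty array cannot match any signature.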
                

[jira] [Updated] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0

Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1230:
---------------------------------

    Attachment: NUTCH-1230-1.5-2.patch

Patches for MimeUtil and some other classes. Everything works well with Tika 1.0. I removed the places where MimeType objects are returned; they now rely on String instead.

ContentTest fails: MimeUtil now returns application/octet-stream instead of text/html for the data "".getBytes("UTF8"). Is this a problem?
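If the old text/html expectation matters, one option is to treat application/octet-stream from detection as "unknown" and fall back to the type the protocol declared. The sketch below expresses that policy with plain Strings, matching the move away from MimeType objects; ContentTypeResolver and resolve are hypothetical names, not Nutch's MimeUtil API, and whether declared or sniffed types should win is a design decision this sketch does not settle.

```java
import java.util.Locale;

/**
 * Hypothetical resolution step (not Nutch's actual MimeUtil): keep the
 * declared type when content sniffing only produced the generic default,
 * otherwise trust the sniffed type.
 */
final class ContentTypeResolver {

  static final String DEFAULT_TYPE = "application/octet-stream";

  private ContentTypeResolver() {}

  /**
   * @param declared type reported by the protocol (for example an HTTP
   *                 Content-Type header), possibly null or blank
   * @param sniffed  type returned by content detection, possibly null
   * @return a media type string, never null
   */
  static String resolve(String declared, String sniffed) {
    boolean sniffedIsGeneric =
        sniffed == null || sniffed.isEmpty() || DEFAULT_TYPE.equals(sniffed);
    boolean hasDeclared = declared != null && !declared.trim().isEmpty();
    if (sniffedIsGeneric) {
      // Detection learned nothing; use the declared type if there is one
      return hasDeclared ? stripParameters(declared) : DEFAULT_TYPE;
    }
    return sniffed;
  }

  /** Drops parameters such as "; charset=UTF-8" and normalises case. */
  static String stripParameters(String type) {
    int semi = type.indexOf(';');
    String base = semi >= 0 ? type.substring(0, semi) : type;
    return base.trim().toLowerCase(Locale.ROOT);
  }
}
```

Usage: `ContentTypeResolver.resolve("text/html; charset=UTF-8", "application/octet-stream")` returns text/html, while a specific sniffed type such as application/pdf is returned unchanged.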
                

[jira] [Commented] (NUTCH-1230) MimeType utils broken with Tika 1.1

Posted by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174021#comment-13174021 ] 

Markus Jelsma commented on NUTCH-1230:
--------------------------------------

It seems the method was deprecated in 1.0 and is inaccessible in my 1.1-SNAPSHOT. I'll try upgrading to 1.0.
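The IllegalAccessError in the stack trace is a linkage error: MimeUtil was compiled against a Tika version in which MimeTypes.getMimeType(byte[]) could be called, but the class loaded at runtime no longer permits that call. (In the message, `([B)Lorg/apache/tika/mime/MimeType;` is the JVM descriptor for a method taking a byte[] and returning a MimeType.) When upgrading a dependency, a quick reflection check can confirm what the jar on the runtime classpath actually exposes. The helper below is a hypothetical, stdlib-only sketch, not part of Nutch or Tika.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/**
 * Hypothetical diagnostic helper: reports whether a class declares a
 * method with the given name and parameter types, and its visibility.
 */
final class ApiProbe {

  private ApiProbe() {}

  static String describe(Class<?> type, String name, Class<?>... params) {
    try {
      // getDeclaredMethod also finds non-public methods, but only those
      // declared directly in this class (inherited ones are not searched)
      Method m = type.getDeclaredMethod(name, params);
      int mods = m.getModifiers();
      if (Modifier.isPublic(mods)) {
        return "public";
      }
      return Modifier.isPrivate(mods) ? "private"
          : Modifier.isProtected(mods) ? "protected"
          : "package-private";
    } catch (NoSuchMethodException e) {
      // No method with exactly this name and parameter list
      return "missing";
    }
  }
}
```

Usage, with the Tika jar under test on the classpath: `ApiProbe.describe(Class.forName("org.apache.tika.mime.MimeTypes"), "getMimeType", byte[].class)`. Anything other than "public" means code compiled against an older jar that called this method from another package would fail to link.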
                

[jira] [Commented] (NUTCH-1230) MimeType API deprecated and breaks with Tika 1.0

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176196#comment-13176196 ] 

Hudson commented on NUTCH-1230:
-------------------------------

Integrated in nutch-trunk-maven #80 (See [https://builds.apache.org/job/nutch-trunk-maven/80/])
    NUTCH-1230 and NUTCH-1231 Upgrade to Tika 1.0 and using new Tika detect API

markus : http://svn.apache.org/viewvc/nutch/trunk/viewvc/?view=rev&root=&revision=1224916
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/ivy/ivy.xml
* /nutch/trunk/src/java/org/apache/nutch/util/MimeUtil.java
* /nutch/trunk/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
* /nutch/trunk/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip/ZipTextExtractor.java
* /nutch/trunk/src/plugin/protocol-file/src/java/org/apache/nutch/protocol/file/FileResponse.java

                