Posted to solr-user@lucene.apache.org by eShard <zi...@yahoo.com> on 2012/12/11 20:08:45 UTC

Too many Tika errors

I'm running Solr 4.0 on Tomcat 7.0.8, using the solr/example single core
together with ManifoldCF v1.1.
I had everything working, but then the crawler stops and I see Tika errors
in the Solr log.
With Tika 1.1 I get these errors:
org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@17bc9c03

So I upgraded to Tika 1.2, and again everything seemed to be working (I
indexed 24,000 files). Then I recrawled the repository and it stopped again;
this time the Tika errors are:
null:java.lang.RuntimeException: java.lang.NoClassDefFoundError:
org/mozilla/universalchardet/CharsetListener at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)

What's going on here? Which version of Tika should I use?



--
View this message in context: http://lucene.472066.n3.nabble.com/Too-many-Tika-errors-tp4026126.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Too many Tika errors

Posted by eShard <zi...@yahoo.com>.
OK, I managed to fix the universalchardet error; it is caused by a missing
dependency.
Just download universalchardet-1.0.3.jar and put it in your extraction lib
directory.

The Microsoft (OfficeParser) errors will probably be fixed in a future
release of the POI jars (v3.9 didn't fix this error).
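
For anyone hitting the same NoClassDefFoundError, here is a minimal sketch of a
check that a class is actually visible on a given classpath. The class name
ClasspathCheck and its methods are made up for illustration; only the
CharsetListener class name comes from the stack trace above. Note that it tests
the classpath you launch it with, which is not necessarily the one Tomcat gives
the Solr webapp, so treat the result as a hint.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports which classes cannot be loaded from the current classpath. */
final class ClasspathCheck {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError from missing transitive classes
            return false;
        }
    }

    /** Returns the subset of the given class names that could not be loaded. */
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isPresent(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace if no arguments are given
        String[] required = args.length > 0
                ? args
                : new String[] {"org.mozilla.universalchardet.CharsetListener"};
        for (String name : missing(required)) {
            System.out.println("MISSING: " + name);
        }
    }
}
```

Run it with the jars from your extraction lib directory on the classpath (the
exact path depends on your install); a MISSING line means the jar that provides
that class is not there.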




Re: Too many Tika errors

Posted by eShard <zi...@yahoo.com>.
I tried to send it to the hidden email address, but it keeps throwing a
"missing domain" error.
Please advise...




Re: Too many Tika errors

Posted by Jack Krupansky <ja...@basetechnology.com>.
>What's going on here? What version of tika should I use?

The version that comes with Solr/SolrCell.

Try sending various document types directly to the Solr Extracting Request
Handler and see whether the failures are related to your parameters or to
specific document types. Maybe a document isn't the type it appears to be,
or is in a newer version of its format.
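
A rough sketch of what that could look like from Java (11 or later), posting
files one at a time to the extracting handler and printing the HTTP status for
each. The class name ExtractProbe, the base URL you pass in, and the use of
application/octet-stream as the content type are assumptions; literal.id and
extractOnly are standard Solr Cell parameters, but check them against the
documentation for your Solr version.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

/** Hypothetical probe: sends single files to /update/extract to find the ones Tika rejects. */
final class ExtractProbe {

    /** Builds the handler URL; extractOnly=true asks Solr to return the extraction without indexing. */
    static URI extractUri(String solrBase, String docId) {
        String id = URLEncoder.encode(docId, StandardCharsets.UTF_8);
        return URI.create(solrBase + "/update/extract?literal.id=" + id + "&extractOnly=true");
    }

    /** Posts one file as the raw request body and returns the HTTP status code. */
    static int probe(String solrBase, Path file) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(extractUri(solrBase, file.getFileName().toString()))
                // Assumption: a generic content type, leaving type detection to Tika
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofFile(file))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the Solr base URL, the remaining arguments are files to test
        String solrBase = args[0];
        for (int i = 1; i < args.length; i++) {
            Path file = Path.of(args[i]);
            System.out.println(probe(solrBase, file) + "  " + file);
        }
    }
}
```

Files that come back with a 500 are the ones to look at more closely (or to
attach to a Tika bug report).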

-- Jack Krupansky

-----Original Message----- 
From: Mattmann, Chris A (388J)
Sent: Tuesday, December 11, 2012 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Too many Tika errors

Hi there -- you may want to post this to the dev@tika.apache.org list.

Cheers,
Chris


Re: Too many Tika errors

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi there -- you may want to post this to the dev@tika.apache.org list.

Cheers,
Chris
