Posted to solr-user@lucene.apache.org by Srinivas <sr...@solix.com> on 2015/03/23 10:08:23 UTC

document contained more than 100000 characters

Hi,

In my project we currently use Apache Tika to read file metadata. Whenever
we process a large file (one containing more than 100000 characters), Tika
reports an error saying that the document contained more than 100000
characters. Is it possible to handle large files with Tika? Please let me
know.

 

 

Regards,

Srinivas B  


Re: document contained more than 100000 characters

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/23/2015 3:08 AM, Srinivas wrote:
> Present in my project we are using apache tika for reading metadata of the
> file,So whenever we handled large files(contained more than 100000
> characters file) tika generating the error is file contained more than
> 100000 characters, So is it possible or not handling large files by using
> tika,Please let me know.

This sounds like a Tika problem, and this is a Solr mailing list.  You may
find some Tika expertise here, but this is not the right place for a
question about Tika.

Solr does use the Tika parser, in the contrib module for the
ExtractingRequestHandler.  I have never heard of such a limitation in
the context of the ExtractingRequestHandler, and I've heard some people
complain about OutOfMemory exceptions when they index 4-gigabyte PDF
files with our extracting handler ... so I am guessing that you are
using Tika in your own software.  If that is correct, you'll need to ask
your question on a Tika mailing list.

Thanks,
Shawn
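
For readers who land on this thread: the wording of the error matches the
default 100000-character write limit that, as far as I know, Tika applies
when collecting body text (BodyContentHandler takes a write limit in its
constructor, with -1 documented as "no limit", and the Tika facade has
setMaxStringLength). The sketch below does not use Tika at all. It is a
hypothetical, JDK-only stand-in (the class name LimitedTextHandler and the
helper accepts are made up for illustration) that shows how such a
write-limited SAX handler typically behaves and why passing -1 would lift the
limit.

```java
import java.util.Arrays;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical stand-in for a write-limited text handler.
 * This is NOT Tika's class; it only mimics the general pattern.
 */
public class LimitedTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final int writeLimit; // -1 means "no limit" (assumed convention)

    public LimitedTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (writeLimit != -1 && text.length() + length > writeLimit) {
            // Keep the part that still fits, then signal that the limit was hit.
            text.append(ch, start, writeLimit - text.length());
            throw new SAXException(
                "document contained more than " + writeLimit + " characters");
        }
        text.append(ch, start, length);
    }

    /** Number of characters collected so far. */
    public int length() {
        return text.length();
    }

    /** Feeds n characters to a fresh handler; true if all were accepted. */
    public static boolean accepts(int writeLimit, int n) {
        LimitedTextHandler handler = new LimitedTextHandler(writeLimit);
        char[] chunk = new char[n];
        Arrays.fill(chunk, 'x');
        try {
            handler.characters(chunk, 0, n);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(100000, 100001)); // prints "false"
        System.out.println(accepts(-1, 100001));     // prints "true"
    }
}
```

If your own code builds the handler itself, the write limit is the first thing
to check; the Tika mailing list can confirm the right setting for your version.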


Re: document contained more than 100000 characters

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Apache Tika has its own mailing list. You may have better luck asking
this question there. If you then need help adapting the answer to a Solr
context, come back.

Regards,
   Alex.

----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 23 March 2015 at 05:08, Srinivas <sr...@solix.com> wrote:
> Hi,
>
> Present in my project we are using apache tika for reading metadata of the
> file,So whenever we handled large files(contained more than 100000
> characters file) tika generating the error is file contained more than
> 100000 characters, So is it possible or not handling large files by using
> tika,Please let me know.
>
>
>
>
>
> Regards,
>
> Srinivas B
>