Posted to dev@tika.apache.org by "Amit Kumar (JIRA)" <ji...@apache.org> on 2017/02/22 13:42:44 UTC

[jira] [Closed] (TIKA-2271) Tika parsing gives maximum limit reached error

     [ https://issues.apache.org/jira/browse/TIKA-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amit Kumar closed TIKA-2271.
----------------------------
    Resolution: Not A Problem

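The no-argument BodyContentHandler() constructor applies a default write limit of 100,000 characters, which is what triggers this exception. You can raise the limit, or disable it entirely, through the writeLimit argument of this constructor:

public BodyContentHandler(int writeLimit)

The Javadoc describes the parameter as follows:
writeLimit - maximum number of characters to include in the string, or -1 to disable the write limit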
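
For illustration, here is a minimal sketch of how that constructor might be used. The class name UnlimitedExtract and the helper extractText are made up for this example, and it assumes tika-core and tika-parsers are on the classpath. Passing -1 removes the cap entirely, so for untrusted or very large inputs a finite limit may be the safer choice.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical example class; not part of Tika.
public class UnlimitedExtract {

    // Parses the stream and returns the extracted body text.
    // A writeLimit of -1 disables the default 100,000-character cap.
    static String extractText(InputStream stream) throws Exception {
        BodyContentHandler handler = new BodyContentHandler(-1);
        new AutoDetectParser().parse(stream, handler, new Metadata(), new ParseContext());
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0] is assumed to be the path of the document to parse.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(extractText(in).length() + " characters extracted");
        }
    }
}
```

To keep a limit but make it larger, pass a positive value instead, for example new BodyContentHandler(10 * 1024 * 1024).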

> Tika parsing gives maximum limit reached error
> ----------------------------------------------
>
>                 Key: TIKA-2271
>                 URL: https://issues.apache.org/jira/browse/TIKA-2271
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Amit Kumar
>
> I am using Apache Tika to extract content from PDF files. When I run it, I get the error below. I don't see this error documented anywhere, and it came as an unpleasant surprise.
> org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).
>     at org.apache.tika.sax.WriteOutContentHandler.characters(WriteOutContentHandler.java:141)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>     at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>     at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>     at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>     at org.apache.tika.parser.pdf.PDF2XHTML.writeWordSeparator(PDF2XHTML.java:318)
>     at org.apache.pdfbox.text.PDFTextStripper.writeLine(PDFTextStripper.java:1741)
>     at org.apache.pdfbox.text.PDFTextStripper.writePage(PDFTextStripper.java:672)
>     at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:392)
>     at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:141)
>     at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:111)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:150)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136)
> I just want to know how to get rid of this error and parse files again, or how to remove the limit entirely.
> This question was also asked on Stack Overflow: http://stackoverflow.com/questions/42392145/tika-parsing-gives-maximum-limit-reached-error


