Posted to user@tika.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2015/06/04 14:18:50 UTC

RE: Memory issues with PDF parser

Hi Mouthgalya,
  We fixed that NPE in https://issues.apache.org/jira/browse/TIKA-1605, and the fix will be available in Tika 1.9, which should be out within a week.
  As for memory issues, in Tika 1.7 (may have been 1.8) we worked around a memory leak caused by PDFBox's static caching of fonts, but there may be others.  One potential memory hog is the processing of inline images within PDFs. Have you configured Tika to pull those out (the default is to skip them)?  Other than that, I'd recommend dropping a note to the PDFBox users list to get help in diagnosing memory consumption with PDFBox.  Have you tried any memory profiling?
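
  As a first, coarse step before a real profiler, heap usage can be logged around each parse call. The following is only a minimal sketch using the JDK's standard management API; the class name HeapProbe, its helper methods, and the byte array standing in for a parse are illustrative assumptions, not part of Tika or PDFBox.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper (not part of Tika or PDFBox): logs heap growth around a unit of work.
class HeapProbe {

    /** Used heap in bytes, as reported by the JVM's memory MXBean. */
    static long usedHeapBytes() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getHeapMemoryUsage().getUsed();
    }

    /** Maximum heap the JVM will attempt to use (reflects -Xmx). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Formats the change in used heap between two samples, in whole KB. */
    static String report(String label, long beforeBytes, long afterBytes) {
        long deltaKb = (afterBytes - beforeBytes) / 1024;
        return label + ": heap delta " + deltaKb + " KB";
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + (maxHeapBytes() / (1024 * 1024)) + " MB");

        long before = usedHeapBytes();
        // Stand-in for the real work, e.g. one pdfparser.parse(...) call per file.
        byte[] standIn = new byte[8 * 1024 * 1024];
        long after = usedHeapBytes();

        // Referencing standIn.length keeps the array reachable until after sampling.
        System.out.println(report("example (" + standIn.length + " bytes allocated)", before, after));
    }
}
```

  Samples like these are noisy because the garbage collector can run between them, so a steadily rising trend over many files is more telling than any single delta. A heap dump or a profiler is still needed to see which objects are being retained.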

          Best,

                    Tim

From: Mouthgalya Ganapathy [mailto:mouthgalya.ganapathy@fitchratings.com]
Sent: Wednesday, June 03, 2015 3:25 PM
To: tallison@apache.org
Subject: Memory issues with PDF parser

Hi all,
I am trying to use Apache Tika 1.8 to extract content from PDFs, using the code below. It works well for a few files, but when I read many files I get an out-of-memory exception.
I also see a NullPointerException in the PDF parser. I think the NullPointerException is caused by the out-of-memory condition.
Any suggestions?

Tika version:
  <dependency>
                     <groupId>org.apache.tika</groupId>
                     <artifactId>tika-server</artifactId>
                     <version>1.8</version>
        </dependency>

I am running it as a part of J2EE APP in JBoss 1.7

Code:-

//Parse the pdf content using Apache Tika
            InputStream is = null;
            try {
              is = new BufferedInputStream(new FileInputStream(input));
              //Disable write limit.
              contenthandler = new BodyContentHandler(-1);
               metadata = new Metadata();
              pdfparser = new PDFParser();
              context = new ParseContext();
              pdfparser.parse(is, contenthandler, metadata, context);
              docBody=contenthandler.toString();
              //System.out.println(contenthandler.toString());
            }
            catch (Exception e) {
               System.out.println("Exception in updating docbody for report ==> " + report.getDocID());
               if(is==null)
                 System.out.println("The input stream is a null object");
               e.printStackTrace();
              logger.log(Level.SEVERE, e.getMessage(), e);
            }
            finally {
                if (is != null) is.close();
                contenthandler=null;
                metadata=null;
                pdfparser=null;
                context =null;
            }


Exception:-
I am just including the null pointer exception in the parser below.

10:53:11,696 INFO  [stdout] (Thread-11 (HornetQ-client-global-threads-1619682129)) Exception in updating docbody for report ==> RPT_764268
10:53:12,218 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) java.lang.NullPointerException
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.researchapi.dao.ResearchReportMDAO.updateDocBody(ResearchReportMDAO.java:881)
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.researchapi.dao.ResearchReportMDAO.loadFile_NEW(ResearchReportMDAO.java:965)
10:53:12,220 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.researchapi.dao.ResearchReportMDAO.upsert_NEW(ResearchReportMDAO.java:676)
10:53:12,220 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.research.ejb.ResearchReportManagerBean.processResearchReport(ResearchReportManagerBean.java:70)
10:53:12,221 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
10:53:12,221 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
10:53:12,222 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at java.lang.reflect.Method.invoke(Method.java:597)
10:53:12,222 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
10:53:12,224 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,224 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.jpa.interceptor.SBInvocationInterceptor.processInvocation(SBInvocationInterceptor.java:47)
10:53:12,225 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,225 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
10:53:12,226 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,226 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,227 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
10:53:12,227 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,228 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
10:53:12,228 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
10:53:12,230 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,230 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,232 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
10:53:12,232 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:32)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
10:53:12,234 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,234 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.research.ejb.ResearchReportManagerBeanLocal$$$view4.processResearchReport(Unknown Source)
10:53:12,868 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.research.ejb.mdb.ResearchQueueManagerMDB.onMessage(ResearchQueueManagerMDB.java:150)
10:53:12,868 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
10:53:12,869 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
10:53:12,869 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at java.lang.reflect.Method.invoke(Method.java:597)
10:53:12,870 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
10:53:12,870 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,871 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
10:53:12,871 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,873 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,873 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,875 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
10:53:12,875 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
10:53:12,877 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,877 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,879 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:43)
10:53:12,879 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,880 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.messagedriven.MessageDrivenComponentDescription$5$1.processInvocation(MessageDrivenComponentDescription.java:184)
10:53:12,880 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,882 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
10:53:12,883 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
10:53:12,883 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)

Thanks,
MG
Product Development Team





RE: Memory issues with PDF parser

Posted by "Allison, Timothy B." <ta...@mitre.org>.
1)      Oh. Wow.  Yes, that's way too small (in my experience)... probably.  How many threads are you running in that?  Yep, caught it thanks to your highlighting. Sorry for missing it earlier.

2)      If you must know...see below for pdf files in govdocs1 with Tika 1.9 rc2. :) slightly > 100 caught exceptions out of ~250k files. YMMV.

3)      Ok, got it.  You're good to go with just the PDFParser.  Just checking...

select SORT_STACK_TRACE, count(1) as cnt
from exceptions_b b
join comparisons c on c.id = b.id
where detected_content_type_b='application/pdf'
and sort_stack_trace is not null
group by sort_stack_trace
order by cnt desc;
Results (each SORT_STACK_TRACE is followed by its CNT):

java.io.IOException
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:379)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.zip.DataFormatException
    at java.util.zip.Inflater.inflateBytes(Native Method)
    at java.util.zip.Inflater.inflate(Inflater.java:259)
    at java.util.zip.Inflater.inflate(Inflater.java:280)
    at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:128)
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:101)
    ... 27 more

19

java.lang.IndexOutOfBoundsException
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(LZWFilter.java:145)
    at org.apache.pdfbox.filter.LZWFilter.decode(LZWFilter.java:114)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:351)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

16

java.lang.RuntimeException
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:391)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:49)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:193)
    ... 24 more

14

java.lang.RuntimeException
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1362)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:1066)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:276)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:49)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:193)
    ... 24 more

12

java.io.IOException
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:379)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
    at org.apache.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:197)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.zip.DataFormatException
    at java.util.zip.Inflater.inflateBytes(Native Method)
    at java.util.zip.Inflater.inflate(Inflater.java:259)
    at java.util.zip.Inflater.inflate(Inflater.java:280)
    at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:128)
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:101)
    ... 28 more

6

java.lang.IndexOutOfBoundsException
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(LZWFilter.java:157)
    at org.apache.pdfbox.filter.LZWFilter.decode(LZWFilter.java:114)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:351)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

5

java.lang.RuntimeException
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException
    at org.apache.pdfbox.cos.COSFloat.<init>(COSFloat.java:62)
    at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:116)
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1348)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:1066)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:276)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:49)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:193)
    ... 24 more

4

java.lang.IndexOutOfBoundsException
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(LZWFilter.java:145)
    at org.apache.pdfbox.filter.LZWFilter.decode(LZWFilter.java:114)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:351)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
    at org.apache.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:197)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

4

java.lang.RuntimeException
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1303)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:1066)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:276)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:49)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:193)
    ... 24 more

3

java.lang.RuntimeException
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException
    at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:111)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:361)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:49)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:193)
    ... 24 more

3

java.lang.IndexOutOfBoundsException
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(LZWFilter.java:157)
    at org.apache.pdfbox.filter.LZWFilter.decode(LZWFilter.java:114)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:351)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
    at org.apache.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:197)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

3

java.io.IOException
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:379)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:299)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.zip.DataFormatException
    at java.util.zip.Inflater.inflateBytes(Native Method)
    at java.util.zip.Inflater.inflate(Inflater.java:259)
    at java.util.zip.Inflater.inflate(Inflater.java:280)
    at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:128)
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:101)
    ... 27 more

2

java.lang.NullPointerException
    at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(LZWFilter.java
    at org.apache.pdfbox.filter.LZWFilter.decode(LZWFilter.java:114)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:351)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

2

java.lang.RuntimeException
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1362)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:249)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:356)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:257)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:49)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:193)
    ... 24 more

1

java.io.IOException
    at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1517)
    at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:364)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:186)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1239)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:125)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

1

java.lang.RuntimeException
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1362)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:1066)
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1275)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:1066)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:276)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:49)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:193)
    ... 24 more

1

java.lang.ArrayIndexOutOfBoundsException
    at java.util.ArrayList.elementData(ArrayList.java:400)
    at java.util.ArrayList.get(ArrayList.java:413)
    at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(LZWFilter.java:157)
    at org.apache.pdfbox.filter.LZWFilter.decode(LZWFilter.java:114)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:351)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

1

o.a.t.exception.TikaException
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:287)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.ParserDecorator.parse(ParserDecorator.java:163)
    at o.a.t.parser.RecursiveParserWrapper$EmbeddedParserDecorator.parse(RecursiveParserWrapper.java:318)
    at o.a.t.parser.DelegatingParser.parse(DelegatingParser.java:72)
    at o.a.t.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:103)
    at o.a.t.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:129)
    at o.a.t.parser.microsoft.WordExtractor.handlePictureCharacterRun(WordExtractor.java:564)
    at o.a.t.parser.microsoft.WordExtractor.handleSpecialCharacterRuns(WordExtractor.java:489)
    at o.a.t.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:320)
    at o.a.t.parser.microsoft.WordExtractor.parse(WordExtractor.java:168)
    at o.a.t.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
    at o.a.t.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException
    at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:379)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:186)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1239)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1204)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:132)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    ... 28 more

1

java.lang.ClassCastException
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:380)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:49)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:193)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

1

java.io.IOException
    at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:379)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:186)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1239)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:125)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

1

java.lang.IndexOutOfBoundsException
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(LZWFilter.java:145)
    at org.apache.pdfbox.filter.LZWFilter.decode(LZWFilter.java:114)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:351)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:244)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

1

java.io.IOException
    at org.apache.pdfbox.pdmodel.common.PDStream.createFromCOS(PDStream.java:192)
    at org.apache.pdfbox.pdmodel.PDPage.getContents(PDPage.java:639)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:380)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

1

java.io.IOException
    at org.apache.pdfbox.filter.FilterManager.getFilter(FilterManager.java:106)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:319)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
    at org.apache.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:197)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

1

java.lang.RuntimeException
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException
    at org.apache.pdfbox.cos.COSFloat.<init>(COSFloat.java:62)
    at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:116)
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1348)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:1066)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:276)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:376)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:49)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:193)
    ... 24 more

1

java.lang.RuntimeException
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1289)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:1066)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:276)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:49)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:193)
    ... 24 more

1

java.io.IOException
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:260)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:356)
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1264)
    at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:106)
    at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:683)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:255)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1239)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:125)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

1

java.lang.RuntimeException
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1316)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:1066)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:276)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:49)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:193)
    ... 24 more

1

java.lang.RuntimeException
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
    at o.a.t.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:141)
    at o.a.t.parser.pdf.PDFParser.parse(PDFParser.java:148)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
    at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:159)
    at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:410)
    at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
    at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
    at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:49)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException
    at org.apache.pdfbox.cos.COSFloat.<init>(COSFloat.java:62)
    at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:116)
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1348)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:1066)
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1275)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:1066)
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1275)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:1066)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:276)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:49)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:193)
    ... 24 more

1

_____

RE: Memory issues with PDF parser

Posted by "Allison, Timothy B." <ta...@mitre.org>.
1)      Thank you for the color...I apparently needed it...and sorry for missing that earlier.  I think profiling is the only option.  You might also want to write a simple directory crawler/test harness and see whether you have the same issues with pure PDFBox; if you go to that extent, you might as well also try PDFBox 2.0/trunk, which is quite a bit different under the hood.  As a side note, when I do a batch run against govdocs1 with -Xmx5g, 10 threads and 250k PDFs, I'm OK with the exception of a handful of files.  So, I'd be interested in learning what's different about your PDFs and where the memory load is.  How much heap do you have available to your JVM?  Is there anything else in memory in your JVM or on the same box?  I'm hoping that your database isn't in shared memory?

2)      Text.  Right.  If a PDF document has, say, an MSWord file attached, do you want to get the text from that attachment as well?  If so, use the AutoDetectParser, and make sure to register it under Parser.class in the ParseContext so that embedded documents are parsed too.
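
To make the crawler/test harness suggestion in 1) concrete, here is a minimal sketch, assuming Java 8 or later and using only the standard library. The class name PdfCrawlHarness, the FileParser hook and the Result type are made up for illustration and are not part of Tika or PDFBox. It walks a directory, runs whatever parser you plug in on each .pdf file, and records any Throwable along with a rough used-heap delta. The delta is noisy because garbage collection can run at any time, so treat it as a hint for where to point a real profiler, not as a measurement.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PdfCrawlHarness {

    /** Hypothetical hook: wrap the parser under test (pure PDFBox, Tika, ...) here. */
    interface FileParser {
        void parse(Path file) throws Exception;
    }

    /** One row of the report: the file, a rough heap delta, and the failure if any. */
    static final class Result {
        final Path file;
        final long heapDeltaBytes;
        final Throwable error; // null when the parse succeeded

        Result(Path file, long heapDeltaBytes, Throwable error) {
            this.file = file;
            this.heapDeltaBytes = heapDeltaBytes;
            this.error = error;
        }
    }

    /** Heap currently in use by this JVM (approximate; GC timing affects it). */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Parses every *.pdf under root, one at a time, and never stops on a failure. */
    static List<Result> crawl(Path root, FileParser parser) throws IOException {
        List<Path> pdfs;
        try (Stream<Path> walk = Files.walk(root)) {
            pdfs = walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".pdf"))
                       .sorted()
                       .collect(Collectors.toList());
        }
        List<Result> results = new ArrayList<>();
        for (Path pdf : pdfs) {
            long before = usedHeap();
            Throwable error = null;
            try {
                parser.parse(pdf);
            } catch (Throwable t) { // deliberately broad: also records OutOfMemoryError
                error = t;
            }
            results.add(new Result(pdf, usedHeap() - before, error));
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Placeholder parser: just reads the bytes. Swap in the real parse call to test it.
        FileParser placeholder = file -> Files.readAllBytes(file);
        for (Result r : crawl(root, placeholder)) {
            System.out.println(r.file + "\t" + r.heapDeltaBytes + "\t"
                    + (r.error == null ? "OK" : r.error.getClass().getName()));
        }
    }
}
```

To test pure PDFBox, the placeholder would be replaced with a call that loads the document and extracts text (the PDDocument.load and PDFTextStripper calls visible in the stack traces above), closing the document after each file. Running the same directory through this loop and then through your Tika code should show whether the memory growth follows PDFBox or something else in the application.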


From: Mouthgalya Ganapathy [mailto:mouthgalya.ganapathy@fitchratings.com]
Sent: Thursday, June 04, 2015 4:36 PM
To: Allison, Timothy B.
Cc: user@tika.apache.org; Sauparna Sarkar
Subject: RE: Memory issues with PDF parser

Hi Timothy,


1.)    The interesting thing is that if we parse the PDFs individually there is no problem. Only when we process many PDFs in bulk do some of the files hit the issue. I tried the plain tika-app command line on the same file that failed with the IOException, and it extracted the content fine.

2.)    Other question: "Are you sure that you want to avoid parsing attachments?" -- I don't understand your question. Do you mean attachments within the PDF, like images, links, etc.?

Our objective is very simple. We just want to extract the text content from the PDF. We don't want images, graphs, etc. If you know of some simpler alternative, that's fine too.

Given below is the stack trace after using Tika 1.9-SNAPSHOT. It's throwing "WrappedIOException". I have highlighted the key exceptions. You can also see the "java.lang.OutOfMemoryError: Java heap space" below. The size of this PDF file was 4400 KB.

[Server:research-etl-server] 15:48:51,716 INFO  [stdout] (Thread-0 (HornetQ-client-global-threads-1087933741)) Exception in updating docbody for report ==> RPT_749557
[Server:research-etl-server] 15:48:55,408 SEVERE [class com.fitch.researchapi.dao.ResearchReportMDAO] (Thread-0 (HornetQ-client-global-threads-1087933741)) null: org.apache.pdfbox.exceptions.WrappedIOException
[Server:research-etl-server]       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:278) [tika-server-1.9-SNAPSHOT.jar:1.9-SNAPSHOT]
[Server:research-etl-server]       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1239) [tika-server-1.9-SNAPSHOT.jar:1.9-SNAPSHOT]
[Server:research-etl-server]       at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:125) [tika-parsers-1.9-SNAPSHOT.jar:1.9-SNAPSHOT]
[Server:research-etl-server]       at com.fitch.researchapi.dao.ResearchReportMDAO.updateDocBody(ResearchReportMDAO.java:890) [RESEARCH_API-1.0.27-SNAPSHOT.jar:]
[Server:research-etl-server]       at com.fitch.researchapi.dao.ResearchReportMDAO.loadFile_NEW(ResearchReportMDAO.java:986) [RESEARCH_API-1.0.27-SNAPSHOT.jar:]
[Server:research-etl-server]       at com.fitch.researchapi.dao.ResearchReportMDAO.upsert_NEW(ResearchReportMDAO.java:679) [RESEARCH_API-1.0.27-SNAPSHOT.jar:]
[Server:research-etl-server]       at com.fitch.research.ejb.ResearchReportManagerBean.processResearchReport(ResearchReportManagerBean.java:70) [research-ejb.jar:]
[Server:research-etl-server]       at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source) [:1.6.0_29]
[Server:research-etl-server]       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_29]
[Server:research-etl-server]       at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_29]
[Server:research-etl-server]       at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.jpa.interceptor.SBInvocationInterceptor.processInvocation(SBInvocationInterceptor.java:47)
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:32) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at com.fitch.research.ejb.ResearchReportManagerBeanLocal$$$view4.processResearchReport(Unknown Source) [research-ejb.jar:]
[Server:research-etl-server]       at com.fitch.research.ejb.mdb.ResearchQueueManagerMDB.onMessage(ResearchQueueManagerMDB.java:150) [research-ejb.jar:]
[Server:research-etl-server]       at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) [:1.6.0_29]
[Server:research-etl-server]       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_29]
[Server:research-etl-server]       at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_29]
[Server:research-etl-server]       at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:43) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.component.messagedriven.MessageDrivenComponentDescription$5$1.processInvocation(MessageDrivenComponentDescription.java:184) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at javax.jms.MessageListener$$$view6.onMessage(Unknown Source) [jboss-jms-api_1.1_spec-1.0.0.Final.jar:1.0.0.Final]
[Server:research-etl-server]       at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) [:1.6.0_29]
[Server:research-etl-server]       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_29]
[Server:research-etl-server]       at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_29]
[Server:research-etl-server]       at org.jboss.as.ejb3.inflow.MessageEndpointInvocationHandler.doInvoke(MessageEndpointInvocationHandler.java:140) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at org.jboss.as.ejb3.inflow.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:73) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
[Server:research-etl-server]       at $Proxy57.onMessage(Unknown Source)
[Server:research-etl-server]       at org.hornetq.ra.inflow.HornetQMessageHandler.onMessage(HornetQMessageHandler.java:278)
[Server:research-etl-server]       at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:983)
[Server:research-etl-server]       at org.hornetq.core.client.impl.ClientConsumerImpl.access$400(ClientConsumerImpl.java:48)
[Server:research-etl-server]       at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1113)
[Server:research-etl-server]       at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
[Server:research-etl-server]       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_29]
[Server:research-etl-server]       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_29]
[Server:research-etl-server]       at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_29]
[Server:research-etl-server] Caused by: java.lang.OutOfMemoryError: Java heap space
[Server:research-etl-server]


Thanks,
Mouthgalya Ganapathy
Product Development Team
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Thursday, June 04, 2015 2:37 PM
To: Mouthgalya Ganapathy
Cc: user@tika.apache.org; Sauparna Sarkar
Subject: RE: Memory issues with PDF parser

You will get the same exception.  If you run the pure Tika app commandline on a triggering file, does it at least show you the "caused by" clause that might give more information?
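
If the container log cuts the trace off before the "Caused by" lines, one option is to walk the cause chain yourself and log each link. The following is only a rough sketch using plain JDK calls; the class name CauseChain and its methods are made up for illustration and are not part of Tika.

```java
import java.io.IOException;

// Hypothetical helper, not part of Tika: inspects an exception's cause chain.
final class CauseChain {

    // Returns the deepest cause, or the exception itself if it has no cause.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // Renders every link of the chain as "ClassName: message", one per line.
    static String describe(Throwable t) {
        StringBuilder sb = new StringBuilder();
        for (Throwable current = t; current != null; current = current.getCause()) {
            sb.append(current.getClass().getName())
              .append(": ")
              .append(current.getMessage())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a wrapped parser failure.
        Exception wrapped = new RuntimeException("parse failed",
                new IOException("stream truncated"));
        System.out.print(describe(wrapped));
    }
}
```

Calling something like describe(e) in the catch block, in addition to printStackTrace(), should make the underlying cause visible even when the surrounding log is noisy.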

Other question: Are you sure that you want to avoid parsing attachments?


From: Mouthgalya Ganapathy [mailto:mouthgalya.ganapathy@fitchratings.com]
Sent: Thursday, June 04, 2015 2:55 PM
To: Allison, Timothy B.
Cc: user@tika.apache.org; Sauparna Sarkar
Subject: RE: Memory issues with PDF parser

Thanks for the update, Timothy.
I see that Tika 1.9-SNAPSHOT is available in the Maven repo. I am going to try that and will use TikaInputStream. I will update the results.

Given below is the IOException that I get when I use AutoDetectParser to extract PDF contents. I had used Tika 1.6 and PDFBox 1.8.9. I am guessing I will get the same or a similar exception when I run it with 1.9-SNAPSHOT.

1:27:53,921 WARN  [org.hornetq.core.client.impl.ClientSessionImpl] (Thread-4 (HornetQ-client-global-threads-248507153)) resetting session after failure
[Server:research-etl-server] 21:29:16,314 INFO  [stdout] (Thread-12 (HornetQ-client-global-threads-248507153)) Exception in updating docbody for report ==> RPT_720610
[Server:research-etl-server] 21:29:23,817 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@29fe5969
[Server:research-etl-server] 21:29:23,818 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:250)
[Server:research-etl-server] 21:29:23,818 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
[Server:research-etl-server] 21:29:23,820 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:121)
[Server:research-etl-server] 21:29:23,820 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at com.fitch.researchapi.dao.ResearchReportMDAO.updateDocBody(ResearchReportMDAO.java:888)
[Server:research-etl-server] 21:29:23,820 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at com.fitch.researchapi.dao.ResearchReportMDAO.loadFile_NEW(ResearchReportMDAO.java:983)
[Server:research-etl-server] 21:29:23,821 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at com.fitch.researchapi.dao.ResearchReportMDAO.upsert_NEW(ResearchReportMDAO.java:678)
[Server:research-etl-server] 21:29:23,821 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at com.fitch.research.ejb.ResearchReportManagerBean.processResearchReport(ResearchReportManagerBean.java:70)
[Server:research-etl-server] 21:29:23,822 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
[Server:research-etl-server] 21:29:23,822 WARN  [org.hornetq.core.server.impl.ServerSessionImpl] (hornetq-failure-check-thread) Cleared up resources for session dc692df4-0a50-11e5-8aa3-005056900299
[Server:research-etl-server] 21:29:23,822 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[Server:research-etl-server] 21:29:23,823 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at java.lang.reflect.Method.invoke(Method.java:597)
[Server:research-etl-server] 21:29:23,823 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
[Server:research-etl-server] 21:29:23,823 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
[Server:research-etl-server] 21:29:23,824 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
[Server:research-etl-server] 21:29:23,824 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
[Server:research-etl-server] 21:29:23,824 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)



Thanks,
Mouthgalya Ganapathy
Product Development Team
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Thursday, June 04, 2015 12:50 PM
To: Mouthgalya Ganapathy
Cc: user@tika.apache.org; Sauparna Sarkar
Subject: RE: Memory issues with PDF parser


1)      Right, the NPE is caused by the exception returning null when we call getMessage().  In TIKA-1605, we modified all code in the project to check for null returned by getMessage().  So, in the "fixed" version, you'll still get your good old IOException.  I can't tell from your stack trace what caused the IOException.

2)      Y, regular builds of 1.9's app (and other modules) are available via Jenkins here: https://builds.apache.org/view/Tika/job/tika-trunk-jdk1.7/org.apache.tika$tika-app/

3)      Ok, makes sense.
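
To illustrate point 1, here is a minimal sketch of what a null-safe version of that kind of check can look like. It is not the actual Tika source or the TIKA-1605 patch; the class and method names are invented, and only the marker string is taken from the PDFParser snippet quoted further down the thread.

```java
import java.io.IOException;

// Illustrative only: not the Tika source, just the shape of a null-safe message check.
final class NullSafeMessage {

    // Marker string copied from the PDFParser snippet quoted in this thread.
    static final String BAD_PASSWORD_MARKER = "Error (CryptographyException)";

    // True only when the exception carries a message that contains the marker.
    static boolean looksLikeBadPassword(IOException e) {
        String msg = e.getMessage();   // null for e.g. new IOException()
        return msg != null && msg.contains(BAD_PASSWORD_MARKER);
    }

    public static void main(String[] args) {
        // An IOException without a message no longer triggers a NullPointerException.
        System.out.println(looksLikeBadPassword(new IOException()));
        System.out.println(looksLikeBadPassword(
                new IOException("Error (CryptographyException): bad password")));
    }
}
```

With a guard like this, an IOException that has no message simply falls through to the normal IOException handling instead of failing inside the catch block.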

For kicks, you may want to change opening the file to:
is = TikaInputStream.get(file)
or maybe:
is = TikaInputStream.get(file, metadata)

And you'll want to surround your closing of the IS in a try/catch block.  Or use IOUtils.closeQuietly.
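
Roughly what I mean by guarding the close: the sketch below is a hand-rolled stand-in with the same intent as IOUtils.closeQuietly, written without try-with-resources so that it also compiles on Java 6. The class name QuietClose and the boolean return value are my own additions for illustration.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper with the same intent as IOUtils.closeQuietly; Java 6 compatible.
final class QuietClose {

    // Closes the resource if it is non-null; returns false if close() threw.
    static boolean closeQuietly(Closeable c) {
        if (c == null) {
            return true;
        }
        try {
            c.close();
            return true;
        } catch (IOException e) {
            // Swallowed on purpose so a failing close() cannot mask an earlier parse exception.
            return false;
        }
    }

    public static void main(String[] args) {
        Closeable failing = new Closeable() {
            public void close() throws IOException {
                throw new IOException("close failed");
            }
        };
        System.out.println(closeQuietly(failing));
        System.out.println(closeQuietly(null));
    }
}
```

In your finally block, that would replace the bare is.close() call, so an exception from close() cannot escape and hide whatever the parser threw.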

Finally, are you able to share the particular file that caused the IOException?
From: Mouthgalya Ganapathy [mailto:mouthgalya.ganapathy@fitchratings.com]
Sent: Thursday, June 04, 2015 10:20 AM
To: Allison, Timothy B.; tallison@apache.org
Cc: user@tika.apache.org; Sauparna Sarkar
Subject: RE: Memory issues with PDF parser

Hi Timothy,
Thanks for the prompt reply.


1.)    Wouldn't fixing the null pointer exception in turn throw the IO exception? I saw that the null pointer exception was thrown inside the catch block of the IO exception. Is there any known root cause for the IO exception?

Is that also fixed?



I am including the code that threw the null pointer exception in Tika 1.8.



Exception:
10:53:12,218 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) java.lang.NullPointerException
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)



Code in the pdf parser:
catch (IOException e) {
            //nonseq parser throws IOException for bad password
            //At the Tika level, we want the same exception to be thrown
            if (e.getMessage().contains("Error (CryptographyException)")) {
                metadata.set("pdf:encrypted", Boolean.toString(true));
                throw new EncryptedDocumentException(e);
            }


2.)    Do you have a snapshot or beta version of Tika 1.9 that I could try with our PDF corpus? It would also help in your developer testing.

3.)    For the inline images, we have just kept the defaults (which is to skip them, as you mentioned). I have not done any memory profiling so far. I will also try that.



Thanks,
MG

From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Thursday, June 04, 2015 7:19 AM
To: Mouthgalya Ganapathy; tallison@apache.org
Cc: user@tika.apache.org
Subject: RE: Memory issues with PDF parser

Hi Mouthgalya,
  We fixed that NPE in https://issues.apache.org/jira/browse/TIKA-1605, and the fix will be available in Tika 1.9, which should be out within a week.
  As for memory issues, we worked around a memory leak in PDFBox with static caching of fonts for Tika 1.7 (may have been 1.8), but there may be others.  One potential memory hog is the processing of inline images within PDFs...have you configured Tika to pull those out (default is to skip them)?  Other than that, I'd recommend dropping a note to the PDFBox users list to get help in diagnosing memory consumption with PDFBox.  Have you tried any memory profiling?
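
  As a crude first step before a real profiler or a heap dump, you could log heap usage after each document and watch whether it keeps climbing. The sketch below uses only the standard java.lang.management API; the class name HeapLog and the label format are made up for illustration, and a number that keeps growing is a hint rather than proof of a leak.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports current heap usage via the standard MemoryMXBean.
final class HeapLog {

    // Bytes of heap currently in use, as reported by the JVM.
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    // One-line summary such as "label: used=12MB, max=256MB".
    // getMax() is -1 when the JVM reports no defined maximum.
    static String snapshot(String label) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        return label + ": used=" + (heap.getUsed() / mb) + "MB, max=" + (heap.getMax() / mb) + "MB";
    }

    public static void main(String[] args) {
        // In the real loop, the label would be the document ID just processed.
        System.out.println(snapshot("after document 1"));
    }
}
```

  If the "used" figure rises steadily across documents and does not drop after garbage collection, that is the point at which a heap dump is worth taking.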

          Best,

                    Tim

From: Mouthgalya Ganapathy [mailto:mouthgalya.ganapathy@fitchratings.com]
Sent: Wednesday, June 03, 2015 3:25 PM
To: tallison@apache.org
Subject: Memory issues with PDF parser

Hi all,
I am trying to use Apache Tika 1.8 to extract content from PDFs, using the code below. It works well for a few files, but if I read many files, I see an out-of-memory exception.
I also see a NullPointerException in the PDF parser. I think the NullPointerException is caused by the out-of-memory condition.
Any suggestions?

Tika version:
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-server</artifactId>
    <version>1.8</version>
  </dependency>

I am running it as part of a J2EE app in JBoss AS 7.1.1.

Code:-

// Parse the PDF content using Apache Tika
InputStream is = null;
try {
    is = new BufferedInputStream(new FileInputStream(input));
    // Disable write limit.
    contenthandler = new BodyContentHandler(-1);
    metadata = new Metadata();
    pdfparser = new PDFParser();
    context = new ParseContext();
    pdfparser.parse(is, contenthandler, metadata, context);
    docBody = contenthandler.toString();
    //System.out.println(contenthandler.toString());
}
catch (Exception e) {
    System.out.println("Exception in updating docbody for report ==> " + report.getDocID());
    if (is == null)
        System.out.println("The input stream is a null object");
    e.printStackTrace();
    logger.log(Level.SEVERE, e.getMessage(), e);
}
finally {
    if (is != null) is.close();
    contenthandler = null;
    metadata = null;
    pdfparser = null;
    context = null;
}


Exception:-
I am just including the null pointer exception in the parser below.

10:53:11,696 INFO  [stdout] (Thread-11 (HornetQ-client-global-threads-1619682129)) Exception in updating docbody for report ==> RPT_764268
10:53:12,218 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) java.lang.NullPointerException
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.researchapi.dao.ResearchReportMDAO.updateDocBody(ResearchReportMDAO.java:881)
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.researchapi.dao.ResearchReportMDAO.loadFile_NEW(ResearchReportMDAO.java:965)
10:53:12,220 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.researchapi.dao.ResearchReportMDAO.upsert_NEW(ResearchReportMDAO.java:676)
10:53:12,220 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.research.ejb.ResearchReportManagerBean.processResearchReport(ResearchReportManagerBean.java:70)
10:53:12,221 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
10:53:12,221 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
10:53:12,222 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at java.lang.reflect.Method.invoke(Method.java:597)
10:53:12,222 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
10:53:12,224 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,224 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.jpa.interceptor.SBInvocationInterceptor.processInvocation(SBInvocationInterceptor.java:47)
10:53:12,225 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,225 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
10:53:12,226 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,226 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,227 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
10:53:12,227 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,228 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
10:53:12,228 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
10:53:12,230 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,230 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,232 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
10:53:12,232 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:32)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
10:53:12,234 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,234 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.research.ejb.ResearchReportManagerBeanLocal$$$view4.processResearchReport(Unknown Source)
10:53:12,868 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.research.ejb.mdb.ResearchQueueManagerMDB.onMessage(ResearchQueueManagerMDB.java:150)
10:53:12,868 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
10:53:12,869 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
10:53:12,869 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at java.lang.reflect.Method.invoke(Method.java:597)
10:53:12,870 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
10:53:12,870 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,871 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
10:53:12,871 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,873 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,873 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,875 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
10:53:12,875 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
10:53:12,877 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,877 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,879 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:43)
10:53:12,879 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,880 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.messagedriven.MessageDrivenComponentDescription$5$1.processInvocation(MessageDrivenComponentDescription.java:184)
10:53:12,880 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,882 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
10:53:12,883 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
10:53:12,883 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)

Thanks,
MG
Product Development Team




______________________________________________________________________
Confidentiality Notice: The information contained in this e-mail and any attachment(s) is confidential and for the use of the addressee(s) only. If you are not the intended recipient of this e-mail, do not duplicate or redistribute it by any means. Please delete this e-mail and any attachment(s) and notify us immediately. Unauthorized use, reliance, disclosure or copying of the contents of this e-mail and any attachment(s), or any similar action, is strictly prohibited. Fitch Ratings reserves the right, to the extent permitted by applicable law, to retain, monitor and intercept e-mail messages both to and from its systems.

This e-mail has been scanned by the MessageLabs Email Security System. For more information, please visit http://www.messagelabs.com/email.
______________________________________________________________________

______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________

RE: Memory issues with PDF parser

Posted by "Allison, Timothy B." <ta...@mitre.org>.
You will get the same exception.  If you run the pure Tika app commandline on a triggering file, does it at least show you the "caused by" clause that might give more information?
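
If the application log interleaves or cuts off the trace, the cause chain can also be written out explicitly from the catch block. The following is only a sketch using the JDK; the class and method names (CauseChain, describe) are made up for illustration and are not part of Tika.

```java
import java.io.IOException;

final class CauseChain {
    /** Returns one line per throwable in the cause chain, outermost first. */
    static String describe(Throwable t) {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        // Cap the depth so a cyclic cause chain cannot loop forever.
        for (Throwable cur = t; cur != null && depth < 20; cur = cur.getCause(), depth++) {
            if (depth > 0) {
                sb.append("Caused by: ");
            }
            sb.append(cur.getClass().getName());
            // getMessage() may legitimately be null, so check before appending.
            if (cur.getMessage() != null) {
                sb.append(": ").append(cur.getMessage());
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a wrapped parser failure; the messages are invented.
        Exception wrapped = new RuntimeException("parse failed", new IOException("truncated stream"));
        System.out.print(describe(wrapped));
    }
}
```

Logging the result of describe(e) as a single message keeps the "Caused by" lines together even when other threads write to the same log.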

Other question: Are you sure that you want to avoid parsing attachments?


From: Mouthgalya Ganapathy [mailto:mouthgalya.ganapathy@fitchratings.com]
Sent: Thursday, June 04, 2015 2:55 PM
To: Allison, Timothy B.
Cc: user@tika.apache.org; Sauparna Sarkar
Subject: RE: Memory issues with PDF parser

Thanks for the update Timothy,
I see that Tika 1.9-SNAPSHOT is available in the Maven repo. I am going to try that and will use TikaInputStream. I will post the results.

Given below is the IOException that I get when I use AutoDetectParser to extract PDF contents. I used Tika 1.6 and PDFBox 1.8.9. I am guessing I will get the same or a similar exception when I run it with 1.9-SNAPSHOT.

1:27:53,921 WARN  [org.hornetq.core.client.impl.ClientSessionImpl] (Thread-4 (HornetQ-client-global-threads-248507153)) resetting session after failure
[Server:research-etl-server] 21:29:16,314 INFO  [stdout] (Thread-12 (HornetQ-client-global-threads-248507153)) Exception in updating docbody for report ==> RPT_720610
[Server:research-etl-server] 21:29:23,817 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@29fe5969
[Server:research-etl-server] 21:29:23,818 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:250)
[Server:research-etl-server] 21:29:23,818 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
[Server:research-etl-server] 21:29:23,820 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:121)
[Server:research-etl-server] 21:29:23,820 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at com.fitch.researchapi.dao.ResearchReportMDAO.updateDocBody(ResearchReportMDAO.java:888)
[Server:research-etl-server] 21:29:23,820 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at com.fitch.researchapi.dao.ResearchReportMDAO.loadFile_NEW(ResearchReportMDAO.java:983)
[Server:research-etl-server] 21:29:23,821 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at com.fitch.researchapi.dao.ResearchReportMDAO.upsert_NEW(ResearchReportMDAO.java:678)
[Server:research-etl-server] 21:29:23,821 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at com.fitch.research.ejb.ResearchReportManagerBean.processResearchReport(ResearchReportManagerBean.java:70)
[Server:research-etl-server] 21:29:23,822 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
[Server:research-etl-server] 21:29:23,822 WARN  [org.hornetq.core.server.impl.ServerSessionImpl] (hornetq-failure-check-thread) Cleared up resources for session dc692df4-0a50-11e5-8aa3-005056900299
[Server:research-etl-server] 21:29:23,822 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[Server:research-etl-server] 21:29:23,823 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at java.lang.reflect.Method.invoke(Method.java:597)
[Server:research-etl-server] 21:29:23,823 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
[Server:research-etl-server] 21:29:23,823 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
[Server:research-etl-server] 21:29:23,824 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
[Server:research-etl-server] 21:29:23,824 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
[Server:research-etl-server] 21:29:23,824 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)



Thanks,
Mouthgalya Ganapathy
Product Development Team
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Thursday, June 04, 2015 12:50 PM
To: Mouthgalya Ganapathy
Cc: user@tika.apache.org; Sauparna Sarkar
Subject: RE: Memory issues with PDF parser


1)      Right, the NPE is caused by the exception returning null when we call getMessage().  In TIKA-1605, we modified all code in the project to check for null returned by getMessage().  So, in the "fixed" version, you'll still get your good old IOException.  I can't tell from your stacktrace what caused the IOException.

2)      Yes, regular builds of 1.9's app (and other modules) are available via Jenkins here: https://builds.apache.org/view/Tika/job/tika-trunk-jdk1.7/org.apache.tika$tika-app/

3)      Ok, makes sense.

For kicks, you may want to change opening the file to:
is = TikaInputStream.get(file)
or maybe:
is = TikaInputStream.get(file, metadata)

And you'll want to surround your closing of the IS in a try/catch block.  Or use IOUtils.closeQuietly.
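
For illustration, the close-in-finally advice could look roughly like the sketch below. It uses only the JDK so it runs on its own; the QuietClose class and its closeQuietly helper are made-up names that stand in for a utility such as IOUtils.closeQuietly, and in the code from this thread the stream would come from TikaInputStream.get(file) rather than a ByteArrayInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

final class QuietClose {
    /**
     * Closes the resource without letting an IOException escape.
     * Returns true if the close succeeded or there was nothing to close.
     */
    static boolean closeQuietly(Closeable c) {
        if (c == null) {
            return true;
        }
        try {
            c.close();
            return true;
        } catch (IOException e) {
            // Swallowed on purpose: a failed close must not mask the original error.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream is = null;
        try {
            // Stand-in for opening the PDF; the bytes are arbitrary sample data.
            is = new ByteArrayInputStream(new byte[] {1, 2, 3});
            System.out.println("first byte: " + is.read());
        } finally {
            // Cannot throw, so an exception from the try block is preserved.
            closeQuietly(is);
        }
    }
}
```

The point of the helper is that an exception thrown by close() in the finally block would otherwise replace whatever exception the parse itself threw.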

Finally, are you able to share the particular file that caused the IOException?
From: Mouthgalya Ganapathy [mailto:mouthgalya.ganapathy@fitchratings.com]
Sent: Thursday, June 04, 2015 10:20 AM
To: Allison, Timothy B.; tallison@apache.org
Cc: user@tika.apache.org; Sauparna Sarkar
Subject: RE: Memory issues with PDF parser

Hi Timothy,
Thanks for the prompt reply.


1.)    Wouldn't fixing the null pointer exception just cause the IOException to be thrown instead? I saw that the null pointer exception was thrown inside the catch block for the IOException. Is there a known root cause for the IOException?

Is that also fixed?



I am including the code that threw the null pointer exception in Tika 1.8



Exception:
10:53:12,218 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) java.lang.NullPointerException
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)



Code in the pdf parser:
catch (IOException e) {
            //nonseq parser throws IOException for bad password
            //At the Tika level, we want the same exception to be thrown
            if (e.getMessage().contains("Error (CryptographyException)")) {
                metadata.set("pdf:encrypted", Boolean.toString(true));
                throw new EncryptedDocumentException(e);
            }


2.)    Do you have a snapshot or beta version of Tika 1.9 that I could try with our PDF corpus? It would also help in your developer testing.

3.)    For the inline images, we have just kept the defaults (which is to skip them, as you mentioned). I have not done any memory profiling so far. I will try that as well.



Thanks,
MG

From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Thursday, June 04, 2015 7:19 AM
To: Mouthgalya Ganapathy; tallison@apache.org
Cc: user@tika.apache.org
Subject: RE: Memory issues with PDF parser

Hi Mouthgalya,
  We fixed that NPE in https://issues.apache.org/jira/browse/TIKA-1605, and the fix will be available in Tika 1.9, which should be out within a week.
  As for memory issues, we worked around a memory leak in PDFBox with static caching of fonts for Tika 1.7 (may have been 1.8), but there may be others.  One potential memory hog is the processing of inline images within PDFs...have you configured Tika to pull those out (default is to skip them)?  Other than that, I'd recommend dropping a note to the PDFBox users list to get help in diagnosing memory consumption with PDFBox.  Have you tried any memory profiling?

          Best,

                    Tim

From: Mouthgalya Ganapathy [mailto:mouthgalya.ganapathy@fitchratings.com]
Sent: Wednesday, June 03, 2015 3:25 PM
To: tallison@apache.org
Subject: Memory issues with PDF parser

Hi all,
I am trying to use Apache Tika 1.8 to extract content from PDFs. The code I use is below. It works well for a few files, but if I read many files, I get an out-of-memory exception.
I also see a null pointer exception in the PDF parser. I think the null pointer exception is caused by the memory exception.
Any suggestions?

Tika version:
  <dependency>
                     <groupId>org.apache.tika</groupId>
                     <artifactId>tika-server</artifactId>
                     <version>1.8</version>
        </dependency>

I am running it as part of a J2EE app in JBoss 1.7

Code:-

//Parse the pdf content using Apache Tika
InputStream is = null;
try {
    is = new BufferedInputStream(new FileInputStream(input));
    //Disable write limit.
    contenthandler = new BodyContentHandler(-1);
    metadata = new Metadata();
    pdfparser = new PDFParser();
    context = new ParseContext();
    pdfparser.parse(is, contenthandler, metadata, context);
    docBody = contenthandler.toString();
    //System.out.println(contenthandler.toString());
}
catch (Exception e) {
    System.out.println("Exception in updating docbody for report ==> " + report.getDocID());
    if (is == null)
        System.out.println("The input stream is a null object");
    e.printStackTrace();
    logger.log(Level.SEVERE, e.getMessage(), e);
}
finally {
    if (is != null) is.close();
    contenthandler = null;
    metadata = null;
    pdfparser = null;
    context = null;
}


Exception:-
I am just including the null pointer exception in the parser below.

10:53:11,696 INFO  [stdout] (Thread-11 (HornetQ-client-global-threads-1619682129)) Exception in updating docbody for report ==> RPT_764268
10:53:12,218 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) java.lang.NullPointerException
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.researchapi.dao.ResearchReportMDAO.updateDocBody(ResearchReportMDAO.java:881)
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.researchapi.dao.ResearchReportMDAO.loadFile_NEW(ResearchReportMDAO.java:965)
10:53:12,220 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.researchapi.dao.ResearchReportMDAO.upsert_NEW(ResearchReportMDAO.java:676)
10:53:12,220 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.research.ejb.ResearchReportManagerBean.processResearchReport(ResearchReportManagerBean.java:70)
10:53:12,221 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
10:53:12,221 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
10:53:12,222 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at java.lang.reflect.Method.invoke(Method.java:597)
10:53:12,222 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
10:53:12,224 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,224 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.jpa.interceptor.SBInvocationInterceptor.processInvocation(SBInvocationInterceptor.java:47)
10:53:12,225 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,225 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
10:53:12,226 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,226 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,227 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
10:53:12,227 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,228 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
10:53:12,228 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
10:53:12,230 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,230 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,232 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
10:53:12,232 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:32)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
10:53:12,234 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,234 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.research.ejb.ResearchReportManagerBeanLocal$$$view4.processResearchReport(Unknown Source)
10:53:12,868 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.research.ejb.mdb.ResearchQueueManagerMDB.onMessage(ResearchQueueManagerMDB.java:150)
10:53:12,868 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
10:53:12,869 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
10:53:12,869 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at java.lang.reflect.Method.invoke(Method.java:597)
10:53:12,870 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
10:53:12,870 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,871 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
10:53:12,871 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,873 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,873 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,875 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
10:53:12,875 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
10:53:12,877 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,877 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,879 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:43)
10:53:12,879 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,880 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.messagedriven.MessageDrivenComponentDescription$5$1.processInvocation(MessageDrivenComponentDescription.java:184)
10:53:12,880 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,882 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
10:53:12,883 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
10:53:12,883 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)

Thanks,
MG
Product Development Team




RE: Memory issues with PDF parser

Posted by "Allison, Timothy B." <ta...@mitre.org>.
1)      Right, the NPE is thrown because the IOException's getMessage() returns null and the catch block calls contains() on it.  In TIKA-1605, we modified the code throughout the project to check for a null return from getMessage().  So, in the "fixed" version, you'll still get your good old IOException.  I can't tell from your stack trace what caused the IOException.

2)      Yes, regular builds of 1.9's app (and other modules) are available via Jenkins here: https://builds.apache.org/view/Tika/job/tika-trunk-jdk1.7/org.apache.tika$tika-app/

3)      Ok, makes sense.
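
To illustrate point 1, here is a minimal sketch of a null-safe message check. It is not the actual TIKA-1605 patch; the class name NullSafeMessageCheck and the helper isBadPassword are hypothetical and only show the idea of testing getMessage() for null before calling contains() on it.

```java
import java.io.IOException;

public class NullSafeMessageCheck {

    // Hypothetical helper (not Tika code): returns true only when the
    // exception carries a message that contains the bad-password marker.
    // An IOException created without a message returns null from
    // getMessage(), so the null check must come first.
    static boolean isBadPassword(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.contains("Error (CryptographyException)");
    }

    public static void main(String[] args) {
        // No message: the unguarded version would throw a NullPointerException here.
        System.out.println(isBadPassword(new IOException()));
        // Message with the marker.
        System.out.println(isBadPassword(
                new IOException("Error (CryptographyException)")));
    }
}
```

With a guard like this, an IOException without a message simply falls through to whatever handling follows, so the caller sees the original IOException rather than an NPE.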

For kicks, you may want to change opening the file to:
is = TikaInputStream.get(file)
or maybe:
is = TikaInputStream.get(file, metadata)

And you'll want to wrap the closing of the InputStream in a try/catch block, or use IOUtils.closeQuietly, so that a failure in close() can't escape from your finally block.
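
As a rough sketch of the closing advice, using only the JDK: the closeQuietly method below is a hypothetical stand-in for a helper such as IOUtils.closeQuietly, not that library's implementation, and firstByte is just a placeholder for the real parse call. The point is that an IOException from close() is swallowed in the finally block instead of replacing whatever the body threw or returned.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

public class QuietClose {

    // Hypothetical stand-in for a closeQuietly-style helper: closes the
    // resource and ignores any IOException raised by close().
    static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // Deliberately ignored: a failed close must not mask the
            // result or exception of the work done on the stream.
        }
    }

    // Placeholder for the real work (e.g. a parse call): reads one byte
    // and always closes the stream afterwards.
    static int firstByte(InputStream in) throws IOException {
        try {
            return in.read();
        } finally {
            closeQuietly(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // A stream whose close() always fails, to show the failure is contained.
        InputStream failing = new ByteArrayInputStream(new byte[] {42}) {
            @Override
            public void close() throws IOException {
                throw new IOException("close failed");
            }
        };
        System.out.println(firstByte(failing));
        closeQuietly(null); // null is tolerated
    }
}
```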

Finally, are you able to share the particular file that caused the IOException?

From: Mouthgalya Ganapathy [mailto:mouthgalya.ganapathy@fitchratings.com]
Sent: Thursday, June 04, 2015 10:20 AM
To: Allison, Timothy B.; tallison@apache.org
Cc: user@tika.apache.org; Sauparna Sarkar
Subject: RE: Memory issues with PDF parser

Hi Timothy,
Thanks for the prompt reply.


1.)    Wouldn't fixing the null pointer exception just let the IOException through instead? I saw that the null pointer exception was thrown inside the catch block for the IOException. Is there a known root cause for the IOException?

Is that also fixed?



I am including the code that threw the null pointer exception in Tika 1.8



Exception:
10:53:12,218 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) java.lang.NullPointerException
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)



Code in the pdf parser:
catch (IOException e) {
            //nonseq parser throws IOException for bad password
            //At the Tika level, we want the same exception to be thrown
            if (e.getMessage().contains("Error (CryptographyException)")) {
                metadata.set("pdf:encrypted", Boolean.toString(true));
                throw new EncryptedDocumentException(e);
            }


2.)    Do you have a snapshot or beta version of Tika 1.9 that I could try with our PDF corpus? It would also help with your developer testing.

3.)    For the inline images, we have just kept the defaults (which is to skip them, as you mentioned). I have not done any memory profiling so far. I will try that as well.



Thanks,
MG

From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Thursday, June 04, 2015 7:19 AM
To: Mouthgalya Ganapathy; tallison@apache.org
Cc: user@tika.apache.org
Subject: RE: Memory issues with PDF parser

Hi Mouthgalya,
  We fixed that NPE in https://issues.apache.org/jira/browse/TIKA-1605, and the fix will be available in Tika 1.9, which should be out within a week.
  As for memory issues, we worked around a memory leak in PDFBox with static caching of fonts for Tika 1.7 (may have been 1.8), but there may be others.  One potential memory hog is the processing of inline images within PDFs...have you configured Tika to pull those out (default is to skip them)?  Other than that, I'd recommend dropping a note to the PDFBox users list to get help in diagnosing memory consumption with PDFBox.  Have you tried any memory profiling?

          Best,

                    Tim

From: Mouthgalya Ganapathy [mailto:mouthgalya.ganapathy@fitchratings.com]
Sent: Wednesday, June 03, 2015 3:25 PM
To: tallison@apache.org
Subject: Memory issues with PDF parser

Hi all,
I am trying to use Apache Tika 1.8 to extract content from PDFs. I have the code below for extracting it. It works well for a few files, but if I read many files, I see an out-of-memory exception.
I also see a null pointer exception in the PDF parser. I think the null pointer exception is caused by the memory exception.
Any suggestions?

Tika version:
  <dependency>
                     <groupId>org.apache.tika</groupId>
                     <artifactId>tika-server</artifactId>
                     <version>1.8</version>
        </dependency>

I am running it as a part of J2EE APP in JBoss 1.7

Code:-

//Parse the pdf content using Apache Tika
            InputStream is = null;
            try {
              is = new BufferedInputStream(new FileInputStream(input));
              //Disable write limit.
              contenthandler = new BodyContentHandler(-1);
               metadata = new Metadata();
              pdfparser = new PDFParser();
              context = new ParseContext();
              pdfparser.parse(is, contenthandler, metadata, context);
              docBody=contenthandler.toString();
              //System.out.println(contenthandler.toString());
            }
            catch (Exception e) {
               System.out.println("Exception in updating docbody for report ==> " + report.getDocID());
               if(is==null)
                 System.out.println("The input stream is a null object");
               e.printStackTrace();
              logger.log(Level.SEVERE, e.getMessage(), e);
            }
            finally {
                if (is != null) is.close();
                contenthandler=null;
                metadata=null;
                pdfparser=null;
                context =null;
            }


Exception:-
I am just including the null pointer exception in the parser below.

10:53:11,696 INFO  [stdout] (Thread-11 (HornetQ-client-global-threads-1619682129)) Exception in updating docbody for report ==> RPT_764268
10:53:12,218 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) java.lang.NullPointerException
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.researchapi.dao.ResearchReportMDAO.updateDocBody(ResearchReportMDAO.java:881)
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.researchapi.dao.ResearchReportMDAO.loadFile_NEW(ResearchReportMDAO.java:965)
10:53:12,220 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.researchapi.dao.ResearchReportMDAO.upsert_NEW(ResearchReportMDAO.java:676)
10:53:12,220 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.research.ejb.ResearchReportManagerBean.processResearchReport(ResearchReportManagerBean.java:70)
10:53:12,221 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
10:53:12,221 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
10:53:12,222 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at java.lang.reflect.Method.invoke(Method.java:597)
10:53:12,222 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
10:53:12,224 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,224 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.jpa.interceptor.SBInvocationInterceptor.processInvocation(SBInvocationInterceptor.java:47)
10:53:12,225 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,225 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
10:53:12,226 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,226 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,227 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
10:53:12,227 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,228 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
10:53:12,228 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
10:53:12,230 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,230 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,232 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
10:53:12,232 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:32)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
10:53:12,234 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,234 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.research.ejb.ResearchReportManagerBeanLocal$$$view4.processResearchReport(Unknown Source)
10:53:12,868 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at com.fitch.research.ejb.mdb.ResearchQueueManagerMDB.onMessage(ResearchQueueManagerMDB.java:150)
10:53:12,868 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
10:53:12,869 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
10:53:12,869 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at java.lang.reflect.Method.invoke(Method.java:597)
10:53:12,870 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
10:53:12,870 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,871 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
10:53:12,871 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,873 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,873 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,875 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
10:53:12,875 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
10:53:12,877 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,877 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,879 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:43)
10:53:12,879 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,880 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ejb3.component.messagedriven.MessageDrivenComponentDescription$5$1.processInvocation(MessageDrivenComponentDescription.java:184)
10:53:12,880 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,882 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
10:53:12,883 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
10:53:12,883 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129))    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)

Thanks,
MG
Product Development Team



