Posted to users@pdfbox.apache.org by Craig Strong <cr...@yahoo.com> on 2014/03/10 21:19:03 UTC

Extracting text from PDF with no embedded fonts

I have been using PDFBox to extract text from several different PDF files without problems, using the latest PDFBox app with the ExtractText class.  There is one PDF from which PDFBox (and iText) fails to extract any text, even though I can extract the text with Adobe Reader and with pdftotext.exe, which is part of XPdf.  I don't want to rely on running pdftotext.exe from a PC, since this is part of an automated application.  I think the error relates to an unknown font type and to PDFBox having to fall back on the few fonts bundled in the jar file.  I tried calling the API classes directly and forcing a font from a specific location, but I still got errors.  I thought I had loaded the font with the loadTTF method, but I don't know whether that had any effect.  I would really like to have this working straight from the ExtractText class anyway.  I'm thinking I might have to build my own version after putting a bunch of Windows fonts somewhere and changing a properties file, but I really don't know if that is the right direction, and I am new to PDFBox.  Any ideas?
Here are the errors I am getting.  I tried this from both a Windows PC and our system and get the same errors.  The section starting at processEncodedText repeats a few times, so I included only the first entries.
 
Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory createFont                           
WARNING: Substituting TrueType for unknown font subtype=                                                  
Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processOperator                            
WARNING: java.lang.NullPointerException                                                                   
Throwable occurred: java.lang.NullPointerException                                                        
        at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
        at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)    
        at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:119)    
        at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)  
        at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)             
        at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604)        
        at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)         
        at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554) 
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)   
        at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)     
        at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)    
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)       
        at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)              
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)                          
        at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)                                    
Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processEncodedText           
WARNING: java.lang.NullPointerException                                                     
Throwable occurred: java.lang.NullPointerException                                            
        at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
        at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)                 
        at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)   
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)  
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)  
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)     
        at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)       
        at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)      
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)         
        at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)                
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)                            
        at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)                                      
Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processOperator                
WARNING: java.lang.NullPointerException                                                       
Throwable occurred: java.lang.NullPointerException                                            
        at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:364)
        at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)                 
        at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)   
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)  
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)  
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)     
        at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)       
        at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)      
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)         
        at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)                
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)                            
        at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)                                      

Thanks,
Craig Strong
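
When a log like the one above repeats the same exception many times, it can help to condense it before attaching it to a bug report. The following is only a minimal sketch, not part of PDFBox: the class name TraceSummary and the method summarize are made up for illustration, and it assumes the log uses the "Throwable occurred: ..." line followed by "at ..." frames exactly as shown in this message. It counts each distinct pair of exception name and top stack frame.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a PDFBox class): condenses repeated stack traces.
final class TraceSummary {

    private static final String MARKER = "Throwable occurred: ";

    /** Counts each distinct (exception, top frame) pair, in order of first appearance. */
    static Map<String, Integer> summarize(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String pending = null; // exception name still waiting for its first "at ..." frame
        for (String raw : logLines) {
            String line = raw.trim();
            if (line.startsWith(MARKER)) {
                pending = line.substring(MARKER.length());
            } else if (pending != null && line.startsWith("at ")) {
                // Only the first frame after the marker is used; deeper frames are skipped.
                counts.merge(pending + " @ " + line.substring(3), 1, Integer::sum);
                pending = null;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // A few lines copied from the log in this message.
        List<String> sample = Arrays.asList(
            "Throwable occurred: java.lang.NullPointerException",
            "        at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)",
            "        at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)",
            "Throwable occurred: java.lang.NullPointerException",
            "        at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)");
        for (Map.Entry<String, Integer> e : summarize(sample).entrySet()) {
            System.out.println(e.getValue() + "x " + e.getKey());
        }
    }
}
```

Grouping by the top frame keeps the font-loading failure in PDTrueTypeFont separate from the follow-on failures in processEncodedText, which is usually the distinction a maintainer wants to see first.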

Fw: Extracting text from PDF with no embedded fonts

Posted by Craig Strong <cr...@yahoo.com>.

I found a solution to my issue.  I was able to install the latest XPdf RPM for AIX, so I can now use pdftotext from PASE on the IBM i.  I can also adjust font handling on the fly with a configuration file.  This converts the problem PDF to text on the same system, which PDFBox can't do, and I no longer have to rely on running pdftotext from a PC.  The -layout option is nice too: it inserts spaces to mimic the PDF's layout, which makes parsing easier.  The PDFBox pdfsplit function will be of use later.  Just to be clear, I still like the functionality of PDFBox and iText.
I appreciate everyone's assistance.

Thanks,
Craig Strong
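
For anyone automating the same workaround from Java, here is a minimal sketch of calling pdftotext as an external process. It is an illustration under assumptions, not what was actually deployed: the class name PdfToTextRunner and its methods are made up, the executable is assumed to be on the PATH under the name "pdftotext", and the only option used is -layout, as mentioned above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around the external pdftotext tool (not part of PDFBox or XPdf).
final class PdfToTextRunner {

    /** Builds the argument list: executable, optional -layout, input PDF, output text file. */
    static List<String> buildCommand(String executable, boolean layout,
                                     String pdfPath, String txtPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        if (layout) {
            cmd.add("-layout"); // keep the physical layout of the page where possible
        }
        cmd.add(pdfPath);
        cmd.add(txtPath);
        return cmd;
    }

    /** Runs the command, passing the tool's output through, and returns its exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: PdfToTextRunner <input.pdf> <output.txt>");
            return;
        }
        // "pdftotext" is assumed to be resolvable on the PATH of the calling job.
        int exit = run(buildCommand("pdftotext", true, args[0], args[1]));
        System.out.println("pdftotext exit code: " + exit);
    }
}
```

Keeping the command construction separate from the process launch makes it easy to check the arguments without having pdftotext installed, and a non-zero exit code can be used to flag PDFs that need manual attention.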

Re: Extracting text from PDF with no embedded fonts

Posted by Craig Strong <cr...@yahoo.com>.
Hi, I used PDFBox 1.8.4.  I went ahead and created an issue in JIRA and uploaded the PDF file there, reusing most of my original email text.
 
Thanks,
Craig
 

________________________________
 From: Tilman Hausherr <TH...@t-online.de>
To: users@pdfbox.apache.org 
Sent: Friday, March 14, 2014 2:52 AM
Subject: Re: Extracting text from PDF with no embedded fonts
  

Hi,

The best thing would be to create an issue in JIRA and upload the file there, if it isn't confidential.

Re "the latest", did you use a 1.8 version or a 2.0 version?

Tilman


Re: Extracting text from PDF with no embedded fonts

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,

The best thing would be to create an issue in JIRA and upload the file
there, if it isn't confidential.

Re "the latest", did you use a 1.8 version or a 2.0 version?

Tilman
