
Posted to users@pdfbox.apache.org by Michel de Lange <mi...@yahoo.co.uk> on 2015/04/25 04:25:26 UTC

newbie question

Dear experts,

I am having a few difficulties starting with pdfbox. The program does 
not seem to find anything in my pdf files. I get this output:

Extract content pdf document length -> 0

Here is my code:

public static void main(String[] args){

     PDDocument doc = new PDDocument();
     try {
         doc.load(new File("Shaffer.pdf"));
         String docText=null;
          try {
            PDFTextStripper stripper=new PDFTextStripper();
            docText=stripper.getText(doc);
             System.out.println("Extract content pdf document length -> " + docText.length());
          }
          finally {
              if (docText == null) {
              logger.info("****************   PDF content is null *********************");
            }
          }

         doc.close();

     } catch (Exception e) {
         // TODO Auto-generated catch block
         e.printStackTrace();
     }
}

The program finds the file, and it is definitely a proper PDF with contents. What am I doing wrong?

Many thanks,



Michel

Re: newbie question

Posted by Michel de Lange <mi...@yahoo.co.uk>.

Hi again,


I have figured it out.


I should have coded:

         PDDocument load = doc.load(new File("Shaffer.pdf"));
         String docText=null;
          try {
            PDFTextStripper stripper=new PDFTextStripper();
            //stripper.setStartPage(1);
            //stripper.setEndPage(2);
            docText=stripper.getText(load);



So I should assign the document to a variable (load), and then pass that 
to getText. Easy if you know how!
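
Why the first version printed 0: load() is a static factory, so doc.load(...) never fills doc. It builds a separate document and returns it, and the original code dropped that return value. The following is only a minimal, library-free sketch of the same pitfall. The Doc class, its load() and getText() methods, and the page strings are hypothetical stand-ins invented for illustration; they are not the PDFBox API.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticLoadPitfall {

    /** Hypothetical stand-in for a document class; NOT the PDFBox API. */
    static class Doc {
        private final List<String> pages = new ArrayList<>();

        /** Static factory: builds and returns a NEW document. */
        static Doc load(String... pageTexts) {
            Doc loaded = new Doc();
            for (String text : pageTexts) {
                loaded.pages.add(text);
            }
            return loaded;
        }

        /** Concatenates the text of all pages. */
        String getText() {
            return String.join("", pages);
        }
    }

    /** Mirrors the original code: the result of load() is thrown away. */
    static int lengthWhenResultDiscarded() {
        Doc doc = new Doc();
        // Compiles, but this is a call to the static method; doc itself stays empty.
        doc.load("page one", "page two");
        return doc.getText().length();
    }

    /** Mirrors the fix: keep the document that load() returns. */
    static int lengthWhenResultAssigned() {
        Doc doc = Doc.load("page one", "page two");
        return doc.getText().length();
    }

    public static void main(String[] args) {
        System.out.println("discarded -> " + lengthWhenResultDiscarded());
        System.out.println("assigned  -> " + lengthWhenResultAssigned());
    }
}
```

Run as written, this should print 0 for the discarded call and 16 for the assigned one. Many compilers and IDEs also warn when a static method is accessed through an instance, which is a useful hint for this kind of mistake.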

Many thanks for your attention and help, it is much appreciated.

With best regards from New Zealand,



Michel


On 26/04/2015 18:23, Tilman Hausherr wrote:
> On 25.04.2015 at 04:25, Michel de Lange wrote:
>>
>>
>> The program finds the file, and it is definitely a proper PDF with
>> contents. What am I doing wrong?
>
> How do you know that it is definitely a proper PDF with contents?
> Just because you see something on the screen doesn't mean you can
> extract text.
>
> https://pdfbox.apache.org/1.8/faq.html#notext
>
> Tilman
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

Re: newbie question

Posted by Tilman Hausherr <TH...@t-online.de>.

On 25.04.2015 at 04:25, Michel de Lange wrote:
>
>
> The program finds the file, and it is definitely a proper PDF with
> contents. What am I doing wrong?

How do you know that it is definitely a proper PDF with contents? Just
because you see something on the screen doesn't mean you can extract text.

https://pdfbox.apache.org/1.8/faq.html#notext

Tilman


Re: newbie question

Posted by Michel de Lange <mi...@yahoo.co.uk>.

Yes, excellent, I see it now.

Many thanks for your help; I do very much appreciate it.

With best regards from New Zealand


Michel


On 26/04/2015 19:07, Tilman Hausherr wrote:
> I found the problem. Your document is empty, as you don't assign the 
> result of "load()". Here's how to do it:
>
>     PDDocument doc = PDDocument.load(new File("Shaffer et al. 2015.pdf"));
>     String docText = null;
>     PDFTextStripper stripper = new PDFTextStripper();
>     stripper.setStartPage(1);
>     stripper.setEndPage(2);
>     docText = stripper.getText(doc);
>     System.out.println("Extract content pdf document length -> " + docText.length());
>
> output:
>
> Extract content pdf document length -> 5172
>
> Tilman

Re: newbie question

Posted by Tilman Hausherr <TH...@t-online.de>.

I found the problem. Your document is empty, as you don't assign the 
result of "load()". Here's how to do it:

    PDDocument doc = PDDocument.load(new File("Shaffer et al. 2015.pdf"));
    String docText = null;
    PDFTextStripper stripper = new PDFTextStripper();
    stripper.setStartPage(1);
    stripper.setEndPage(2);
    docText = stripper.getText(doc);
    System.out.println("Extract content pdf document length -> " + docText.length());

output:

Extract content pdf document length -> 5172

Tilman


On 26.04.2015 at 01:22, Michel de Lange wrote:
> Hi again,
>
>
> Thank you for your help. I have set the start and end page, but it 
> makes no difference.
>
> Extract content pdf document length -> 0
>
>
>
> PDDocument doc = new PDDocument();
>     try {
>         doc.load(new File("Shaffer.pdf"));
>         String docText=null;
>          try {
>            PDFTextStripper stripper=new PDFTextStripper();
>            stripper.setStartPage(1);
>            stripper.setEndPage(2);
>            docText=stripper.getText(doc);
>             System.out.println("Extract content pdf document length -> " + docText.length());
>          }
>
>
> Many thanks,
>
>
> Michel
>
>
> On 25/04/2015 19:30, Gilad Denneboom wrote:
>> Try setting the start and end pages to strip...
>>
>> On Sat, Apr 25, 2015 at 4:25 AM, Michel de Lange <
>> michel_de_lange@yahoo.co.uk> wrote:
>>

Re: newbie question

Posted by Michel de Lange <mi...@yahoo.co.uk>.

Hi again,


Thank you for your help. I have set the start and end page, but it makes 
no difference.

Extract content pdf document length -> 0



PDDocument doc = new PDDocument();
     try {
         doc.load(new File("Shaffer.pdf"));
         String docText=null;
          try {
            PDFTextStripper stripper=new PDFTextStripper();
            stripper.setStartPage(1);
            stripper.setEndPage(2);
            docText=stripper.getText(doc);
             System.out.println("Extract content pdf document length -> " + docText.length());
          }


Many thanks,


Michel


On 25/04/2015 19:30, Gilad Denneboom wrote:
> Try setting the start and end pages to strip...
>
> On Sat, Apr 25, 2015 at 4:25 AM, Michel de Lange <
> michel_de_lange@yahoo.co.uk> wrote:
>



Re: newbie question

Posted by Gilad Denneboom <gi...@gmail.com>.

Try setting the start and end pages to strip...
