Posted to users@pdfbox.apache.org by Michel de Lange <mi...@yahoo.co.uk> on 2015/04/25 04:25:26 UTC
newbie question
Dear experts,
I am having a few difficulties getting started with PDFBox. The program does
not seem to find any text in my PDF files. I get this output:
Extract content pdf document length -> 0
Here is my code:
public static void main(String[] args){
    PDDocument doc = new PDDocument();
    try {
        doc.load(new File("Shaffer.pdf"));
        String docText=null;
        try {
            PDFTextStripper stripper=new PDFTextStripper();
            docText=stripper.getText(doc);
            System.out.println("Extract content pdf document length -> " + docText.length());
        }
        finally {
            if (docText == null) {
                logger.info("**************** PDF content is null *********************");
            }
        }
        doc.close();
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
The program finds the file, and it is definitely a proper PDF with contents. What am I doing wrong?
Many thanks,
Michel
Re: newbie question
Posted by Michel de Lange <mi...@yahoo.co.uk>.
Hi again,
I have figured it out.
I should have coded:
PDDocument load = doc.load(new File("Shaffer.pdf"));
String docText=null;
try {
    PDFTextStripper stripper=new PDFTextStripper();
    //stripper.setStartPage(1);
    //stripper.setEndPage(2);
    docText=stripper.getText(load);
So I should assign the document returned by load() to a variable (here
named load), and then pass that variable to getText. Easy if you know how!
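For anyone who hits the same thing, here is a minimal sketch of why the first version printed a length of 0. It does not use PDFBox at all: Doc is a hypothetical stand-in class invented for this illustration, and its load() and getText() only mimic the shape of the calls in this thread. The assumption, consistent with the PDDocument.load(...) call in the fix, is that load() is a static factory that returns a new document rather than filling in the instance it happens to be called on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document class with a static factory method.
// It is NOT PDFBox; it only mimics "new Doc()" versus "Doc.load(...)".
class Doc {
    private final List<String> pages = new ArrayList<>();

    // Static factory: builds and returns a NEW document.
    // No existing instance is modified.
    static Doc load(String... pageTexts) {
        Doc loaded = new Doc();
        for (String text : pageTexts) {
            loaded.pages.add(text);
        }
        return loaded;
    }

    // Joins the page texts; an empty document yields an empty string.
    String getText() {
        return String.join("\n", pages);
    }
}

class StaticLoadPitfall {
    // Mirrors the original post: the result of load() is thrown away.
    static int lengthWhenResultDiscarded() {
        Doc doc = new Doc();
        // Compiles, because Java allows calling a static method through an
        // instance, but the returned document is discarded and doc stays empty.
        doc.load("page one", "page two");
        return doc.getText().length();
    }

    // Mirrors the fix: keep the document that load() returns.
    static int lengthWhenResultAssigned() {
        Doc doc = Doc.load("page one", "page two");
        return doc.getText().length();
    }

    public static void main(String[] args) {
        System.out.println("discarded -> " + lengthWhenResultDiscarded());
        System.out.println("assigned  -> " + lengthWhenResultAssigned());
    }
}
```

Running StaticLoadPitfall prints "discarded -> 0" and then "assigned  -> 17". Calling the factory on the class name (Doc.load) rather than on an instance makes it harder to overlook that the return value is the loaded document.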
Many thanks for your attention and help, it is much appreciated.
With best regards from New Zealand,
Michel
On 26/04/2015 18:23, Tilman Hausherr wrote:
> On 25.04.2015 at 04:25, Michel de Lange wrote:
>>
>> The program finds the file, and it is definitely a proper PDF with
>> contents. What am I doing wrong?
>
> How do you know that it is definitely a proper PDF with contents?
> Just because you see something on the screen doesn't mean you can
> extract text.
>
> https://pdfbox.apache.org/1.8/faq.html#notext
>
> Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
Re: newbie question
Posted by Tilman Hausherr <TH...@t-online.de>.
On 25.04.2015 at 04:25, Michel de Lange wrote:
> The program finds the file, and it is definitely a proper PDF with
> contents. What am I doing wrong?
How do you know that it is definitely a proper PDF with contents? Just
because you see something on the screen doesn't mean you can extract text.
https://pdfbox.apache.org/1.8/faq.html#notext
Tilman
Re: newbie question
Posted by Michel de Lange <mi...@yahoo.co.uk>.
Yes, excellent, I see it now.
Many thanks for your help; I very much appreciate it.
With best regards from New Zealand
Michel
On 26/04/2015 19:07, Tilman Hausherr wrote:
> I found the problem. Your document is empty, as you don't assign the
> result of "load()". Here's how to do it:
>
> PDDocument doc = PDDocument.load(new File("Shaffer et al. 2015.pdf"));
> String docText = null;
> PDFTextStripper stripper = new PDFTextStripper();
> stripper.setStartPage(1);
> stripper.setEndPage(2);
> docText = stripper.getText(doc);
> System.out.println("Extract content pdf document length -> " +
> docText.length());
>
> output:
>
> Extract content pdf document length -> 5172
>
> Tilman
Re: newbie question
Posted by Tilman Hausherr <TH...@t-online.de>.
I found the problem. Your document is empty, as you don't assign the
result of "load()". Here's how to do it:
PDDocument doc = PDDocument.load(new File("Shaffer et al. 2015.pdf"));
String docText = null;
PDFTextStripper stripper = new PDFTextStripper();
stripper.setStartPage(1);
stripper.setEndPage(2);
docText = stripper.getText(doc);
System.out.println("Extract content pdf document length -> " +
docText.length());
output:
Extract content pdf document length -> 5172
Tilman
On 26.04.2015 at 01:22, Michel de Lange wrote:
> Hi again,
>
>
> Thank you for your help. I have set the start and end page, but it
> makes no difference.
>
> Extract content pdf document length -> 0
Re: newbie question
Posted by Michel de Lange <mi...@yahoo.co.uk>.
Hi again,
Thank you for your help. I have set the start and end page, but it makes
no difference.
Extract content pdf document length -> 0
PDDocument doc = new PDDocument();
try {
    doc.load(new File("Shaffer.pdf"));
    String docText=null;
    try {
        PDFTextStripper stripper=new PDFTextStripper();
        stripper.setStartPage(1);
        stripper.setEndPage(2);
        docText=stripper.getText(doc);
        System.out.println("Extract content pdf document length -> " + docText.length());
    }
Many thanks,
Michel
On 25/04/2015 19:30, Gilad Denneboom wrote:
> Try setting the start and end pages to strip...
Re: newbie question
Posted by Gilad Denneboom <gi...@gmail.com>.
Try setting the start and end pages to strip...