Posted to users@pdfbox.apache.org by xzz <19...@qq.com> on 2012/09/02 12:02:47 UTC

Hard to find information about how to program with PDFBox

Hi, I am finding it really hard to learn how to program with PDFBox. For instance, how can I get specific content, such as the text or graphics on a certain page? I can't find this in the Tutorials or the Cookbook, and the ExtractTextByArea sample in the Cookbook only demonstrates that one API. Is there any further information about this?

Re: Hard to find information about how to program with PDFBox

Posted by rey malahay <re...@gmail.com>.
Hi xzz.

Try this to extract the text on a certain page of a PDF file:

1. Declare an InputStream, a PDFTextStripper, and a PDFParser, i.e.

InputStream stream = new FileInputStream("PATH_TO_YOUR_PDF_FILE");

PDFTextStripper stripper = new PDFTextStripper();
PDFParser parser = new PDFParser(stream);


2. Once you have set up your stream, PDF stripper, and PDF parser, you are
ready to read the contents of the PDF file:

stripper.setSortByPosition(false);

// Page numbers are 1-based and the end page is inclusive.
// Start and end page can be the same if you just want one page.

stripper.setStartPage(the_page_where_parsing_starts);
stripper.setEndPage(the_page_where_parsing_ends);
parser.parse();
String text = stripper.getText(parser.getPDDocument());
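
Put together, the two steps above might look like the following minimal sketch. It assumes the PDFBox 1.x API that this thread refers to (PDFParser taking an InputStream, and PDFTextStripper living in org.apache.pdfbox.util); later major versions moved or changed these classes. The class name PageTextExtractor and the method name extractText are made up for illustration and are not part of PDFBox.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// Hypothetical helper class; assumes PDFBox 1.x is on the classpath.
public class PageTextExtractor {

    // Returns the text of pages startPage..endPage (1-based, inclusive).
    public static String extractText(String pdfPath, int startPage, int endPage)
            throws IOException {
        InputStream stream = new FileInputStream(pdfPath);
        PDDocument document = null;
        try {
            // Step 1: parse the file into a PDDocument.
            PDFParser parser = new PDFParser(stream);
            parser.parse();
            document = parser.getPDDocument();

            // Step 2: restrict the stripper to the wanted pages and extract.
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setSortByPosition(false);
            stripper.setStartPage(startPage);
            stripper.setEndPage(endPage);
            return stripper.getText(document);
        } finally {
            // Release the document and the underlying stream.
            if (document != null) {
                document.close();
            }
            stream.close();
        }
    }

    // Usage: java PageTextExtractor file.pdf 3   (prints the text of page 3)
    public static void main(String[] args) throws IOException {
        int page = Integer.parseInt(args[1]);
        System.out.println(extractText(args[0], page, page));
    }
}
```

Passing the same number for the start and end page, as main does here, returns the text of that single page.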



I hope this helps. Let me know how this goes.

Thanks,
rey malahay


On 2 September 2012 04:02, xzz <19...@qq.com> wrote:

> Hi, I am finding it really hard to learn how to program with PDFBox. For
> instance, how can I get specific content, such as the text or graphics on
> a certain page? I can't find this in the Tutorials or the Cookbook, and the
> ExtractTextByArea sample in the Cookbook only demonstrates that one API.
> Is there any further information about this?




-- 
My heroes are the ones who survived doing it wrong, who made mistakes, but
recovered from them. - Bono
