Posted to users@pdfbox.apache.org by xzz <19...@qq.com> on 2012/09/02 12:02:47 UTC
Hard to find information about how to program with PDFBox
Hi, I find it really hard to find out how to program with PDFBox. For instance, how can I get specific content, such as text or graphics, from a certain page? I can't find this in the Tutorials or the Cookbook, and the ExtractTextByArea sample in the Cookbook is just a single API example. Is there any further information about this?
Re: Hard to find information about how to program with PDFBox
Posted by rey malahay <re...@gmail.com>.
Hi xzz.
Try this to extract the text of a specific page of a PDF file:

1. Declare a new PDFTextStripper and PDFParser, i.e.

    InputStream stream = new FileInputStream("PATH_TO_YOUR_PDF_FILE");
    PDFTextStripper stripper = new PDFTextStripper();
    PDFParser parser = new PDFParser(stream);

2. Once you have set up your stream, PDF stripper and PDF parser, you are
ready to read the contents of the PDF file:

    stripper.setSortByPosition(false);

    // Start and end page can be the same if you just want one page.
    // Page numbers are 1-based.
    stripper.setStartPage(the_page_where_parsing_starts);
    stripper.setEndPage(the_page_where_parsing_ends);
    parser.parse();
    String text = stripper.getText(parser.getPDDocument());
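The steps above can be put together in one self-contained class. This is only a minimal sketch, assuming the PDFBox 1.x API that was current when this thread was written; the class name PageTextExtractor and the method extractPage are made up for illustration and are not part of PDFBox. It uses PDDocument.load(File) instead of driving PDFParser directly, and it closes the document when done. In PDFBox 2.x, PDFTextStripper lives in org.apache.pdfbox.text instead of org.apache.pdfbox.util, and in 3.x documents are opened with Loader.loadPDF, so the imports and the load call would need adjusting there.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper; // org.apache.pdfbox.text in 2.x

// Hypothetical helper class, not part of PDFBox.
public class PageTextExtractor {

    /**
     * Returns the text of a single page. pageNumber is 1-based,
     * matching PDFTextStripper.setStartPage/setEndPage.
     */
    public static String extractPage(File pdfFile, int pageNumber) throws IOException {
        if (pageNumber < 1) {
            throw new IllegalArgumentException("Page numbers start at 1: " + pageNumber);
        }
        PDDocument document = PDDocument.load(pdfFile);
        try {
            int pageCount = document.getNumberOfPages();
            if (pageNumber > pageCount) {
                throw new IllegalArgumentException(
                        "Page " + pageNumber + " requested, document has " + pageCount);
            }
            PDFTextStripper stripper = new PDFTextStripper();
            // Same start and end page restricts extraction to that one page.
            stripper.setStartPage(pageNumber);
            stripper.setEndPage(pageNumber);
            return stripper.getText(document);
        } finally {
            // Always release the file handles held by the document.
            document.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: PageTextExtractor <file.pdf> <pageNumber>");
            return;
        }
        System.out.println(extractPage(new File(args[0]), Integer.parseInt(args[1])));
    }
}
```

Run it with the PDFBox jar on the classpath, passing the PDF path and the page number as arguments.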
I hope this helps. Let me know how this goes.
Thanks,
rey malahay
On 2 September 2012 04:02, xzz <19...@qq.com> wrote:
> Hi, I find it really hard to find out how to program with PDFBox. For
> instance, how can I get specific content, such as text or graphics, from a
> certain page? I can't find this in the Tutorials or the Cookbook, and the
> ExtractTextByArea sample in the Cookbook is just a single API example. Is
> there any further information about this?
--
My heroes are the ones who survived doing it wrong, who made mistakes, but
recovered from them. - Bono