Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2017/03/28 18:51:41 UTC

[jira] [Commented] (PDFBOX-3737) Add a method to process page directly

    [ https://issues.apache.org/jira/browse/PDFBOX-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945724#comment-15945724 ] 

Tilman Hausherr commented on PDFBOX-3737:
-----------------------------------------

That's a design decision. PDPage holds the page-related methods, i.e. reading and writing the page structures. Other jobs that access these structures, e.g. rendering and text extraction, live in separate classes. Also, PDFTextStripper has more options than just the start and end page.

Btw, you don't have to extend PDFTextStripper. That is only needed for low-level work such as getting the positions of individual glyphs. For simple text extraction you just create a PDFTextStripper object.
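
For illustration, a minimal sketch of extracting the text of a single page without subclassing, assuming the PDFBox 2.0.x API. The class name SinglePageText and the helper extractPage are hypothetical names made up for this example, not part of PDFBox; the input file and page number come from the command line.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class SinglePageText {

    // Hypothetical helper (not a PDFBox method): returns the text of one page.
    // Page numbers are 1-based, as in setStartPage/setEndPage.
    static String extractPage(PDDocument document, int pageNumber) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper(); // plain object, no subclass
        stripper.setStartPage(pageNumber);
        stripper.setEndPage(pageNumber);
        return stripper.getText(document);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a PDF file, args[1]: 1-based page number
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            System.out.println(extractPage(document, Integer.parseInt(args[1])));
        }
    }
}
```

Restricting the start and end page to the same number is what limits the output to one page; the other stripper options (e.g. setSortByPosition) remain available on the same object.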


> Add a method to process page directly
> -------------------------------------
>
>                 Key: PDFBOX-3737
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3737
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 2.0.0, 2.0.5
>            Reporter: Dewang Sun
>            Priority: Minor
>             Fix For: 2.0.6
>
>
> If you want to process a page, you need to extend *PDFTextStripper* and invoke *setStartPage*, *setEndPage* and *processPage*. So why not add a method to process a page directly?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
