Posted to dev@pdfbox.apache.org by "Ryan Nideffer (JIRA)" <ji...@apache.org> on 2010/05/13 23:55:42 UTC

[jira] Created: (PDFBOX-726) PDFTextStripper: allow access to currentPageNo variable

PDFTextStripper: allow access to currentPageNo variable
-------------------------------------------------------

                 Key: PDFBOX-726
                 URL: https://issues.apache.org/jira/browse/PDFBOX-726
             Project: PDFBox
          Issue Type: Improvement
          Components: Text extraction
    Affects Versions: 1.1.0
            Reporter: Ryan Nideffer
             Fix For: 1.2.0


I've extended org.apache.pdfbox.util.PDFTextStripper and I'm using it to perform a two-pass extraction over a document. However, the second pass doesn't happen because I am unable to alter the variable currentPageNo, which holds the current page number in the PDF document. The variable is private, and only a getter is provided.

The only time currentPageNo is set to 0 is in 'writePage(PDDocument, OutputStream)', which I am overriding and not calling.

Two possible resolutions:
- make currentPageNo protected instead of private (preferred)
- add setCurrentPageNo method
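The following Java sketch illustrates why the second pass can produce no output. It is a simplified, hypothetical model, not PDFBox code: the class names StripperSketch and TwoPassStripper, the extract method, and the endPage check are assumptions made for illustration (skipping pages whose counter exceeds endPage is one plausible reason the second pass yields nothing). The counter is private with only a getter, so the subclass cannot put it back to 0 between passes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the situation in the report.
// These are NOT the real PDFBox classes; names and logic are illustrative only.
class StripperSketch {
    // Private, as reported: subclasses can read it via the getter but not reset it.
    private int currentPageNo = 0;
    // Last page to extract (assumption: set to the document's page count).
    protected final int endPage;

    StripperSketch(int endPage) {
        this.endPage = endPage;
    }

    public int getCurrentPageNo() {
        return currentPageNo;
    }

    // The only place the counter is reset; stands in for the method
    // the reporter overrides and does not call.
    public void extract(int pageCount) {
        currentPageNo = 0;
        processPages(pageCount);
    }

    // Advances the counter once per page; pages past endPage are skipped.
    protected void processPages(int pageCount) {
        for (int i = 0; i < pageCount; i++) {
            currentPageNo++;
            if (currentPageNo <= endPage) {
                processPage(currentPageNo);
            }
        }
    }

    protected void processPage(int pageNo) {
        // Text extraction for one page would happen here.
    }
}

// A subclass that tries to run two passes over a 3-page document.
class TwoPassStripper extends StripperSketch {
    final List<Integer> processed = new ArrayList<>();

    TwoPassStripper() {
        super(3);
    }

    @Override
    protected void processPage(int pageNo) {
        processed.add(pageNo);
    }

    void extractTwice() {
        processPages(3); // first pass: counter runs 1..3, all pages processed
        // currentPageNo = 0;  // would not compile: the field is private
        processPages(3); // second pass: counter runs 4..6, every page is skipped
    }
}
```

With the first proposed resolution (a protected field), the commented-out line in extractTwice() would compile and restart the count; the second (a setter) would achieve the same through a method call.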

Thank you,
Ryan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (PDFBOX-726) PDFTextStripper: allow access to currentPageNo variable

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-726.
---------------------------------------

    Resolution: Fixed

IMHO manipulating the current page number directly seems a little too risky, as doing so could break the whole extraction process.
But I think I found another working solution: PDFTextStripper now overrides the resetEngine method and resets the current page number every time that method is called.
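A minimal Java sketch of the approach described above, under a simplified model: EngineSketch stands in for the engine class whose resetEngine method PDFTextStripper overrides, while ResettingStripperSketch and TwoPassStripperFixed are hypothetical names, not PDFBox classes. The counter stays private; a subclass restarts page numbering by calling resetEngine() between passes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the base engine; NOT the real PDFBox class.
class EngineSketch {
    // Clears per-run engine state (the real engine's state is not modeled here).
    public void resetEngine() {
    }
}

// Hypothetical stand-in for the stripper after the fix: the counter stays
// private, but resetEngine() now also puts it back to 0.
class ResettingStripperSketch extends EngineSketch {
    private int currentPageNo = 0;
    protected final int endPage;

    ResettingStripperSketch(int endPage) {
        this.endPage = endPage;
    }

    public int getCurrentPageNo() {
        return currentPageNo;
    }

    @Override
    public void resetEngine() {
        super.resetEngine();
        currentPageNo = 0; // the change described in the resolution
    }

    // Advances the counter once per page; pages past endPage are skipped.
    protected void processPages(int pageCount) {
        for (int i = 0; i < pageCount; i++) {
            currentPageNo++;
            if (currentPageNo <= endPage) {
                processPage(currentPageNo);
            }
        }
    }

    protected void processPage(int pageNo) {
        // Text extraction for one page would happen here.
    }
}

// A two-pass subclass over a 3-page document, using the reset hook.
class TwoPassStripperFixed extends ResettingStripperSketch {
    final List<Integer> processed = new ArrayList<>();

    TwoPassStripperFixed() {
        super(3);
    }

    @Override
    protected void processPage(int pageNo) {
        processed.add(pageNo);
    }

    void extractTwice() {
        processPages(3); // first pass: pages 1..3
        resetEngine();   // counter back to 0 without touching the private field
        processPages(3); // second pass: pages 1..3 again
    }
}
```

This keeps the field encapsulated, in line with the concern about manipulating the page number directly: the only way to change it from a subclass is a full reset, never an arbitrary value.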

I've added the changes in revision 956354. Thanks to Ryan for the hint.
