Posted to dev@pdfbox.apache.org by "Ryan Nideffer (JIRA)" <ji...@apache.org> on 2010/05/13 23:55:42 UTC
[jira] Created: (PDFBOX-726) PDFTextStripper: allow access to currentPageNo variable
PDFTextStripper: allow access to currentPageNo variable
-------------------------------------------------------
Key: PDFBOX-726
URL: https://issues.apache.org/jira/browse/PDFBOX-726
Project: PDFBox
Issue Type: Improvement
Components: Text extraction
Affects Versions: 1.1.0
Reporter: Ryan Nideffer
Fix For: 1.2.0
I've extended org.apache.pdfbox.util.PDFTextStripper and I'm using it to perform a two-pass extraction over a document. However, the second pass doesn't happen because I am unable to alter the variable currentPageNo, which holds the current page number in the PDF document. The variable is private, and only a getter is provided.
The only place currentPageNo is reset to 0 is 'writePage(PDDocument, OutputStream)', which I override and do not call.
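A minimal sketch of why the second pass produces nothing. It does not use PDFBox: MiniStripper, TwoPassStripper, writeText and processPages here are hypothetical stand-ins. It assumes, for illustration, that the counter is private and only reset by the public entry point, that the caller restricts extraction to pages 1 to 2 of a two-page document, and that a page is only extracted while the counter lies inside that range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a text stripper; NOT the PDFBox API.
class MiniStripper {
    private int currentPageNo = 0; // private, with only a getter, as in the report
    private final int startPage;
    private final int endPage;
    protected final List<String> extracted = new ArrayList<>();

    MiniStripper(int startPage, int endPage) {
        this.startPage = startPage;
        this.endPage = endPage;
    }

    int getCurrentPageNo() {
        return currentPageNo;
    }

    // Public entry point: the only place that resets the counter.
    void writeText(List<String> pages) {
        currentPageNo = 0;
        processPages(pages);
    }

    // Increments the counter once per page; extracts only pages inside the range.
    protected void processPages(List<String> pages) {
        for (String page : pages) {
            currentPageNo++;
            if (currentPageNo >= startPage && currentPageNo <= endPage) {
                extracted.add(page);
            }
        }
    }
}

// Hypothetical subclass that runs two passes without going through writeText.
class TwoPassStripper extends MiniStripper {
    TwoPassStripper(int startPage, int endPage) {
        super(startPage, endPage);
    }

    List<String> twoPass(List<String> pages) {
        processPages(pages); // first pass: counter runs 1..n
        processPages(pages); // second pass: counter continues at n+1, so no page is in range
        return extracted;
    }
}
```

Calling `new TwoPassStripper(1, 2).twoPass(List.of("p1", "p2"))` returns only the first pass's pages, and the counter ends at 4 because nothing reset it between passes.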
2 possible resolutions:
- make currentPageNo protected instead of private (preferred)
- add setCurrentPageNo method
Thank you,
Ryan
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PDFBOX-726) PDFTextStripper: allow access to currentPageNo variable
Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-726.
---------------------------------------
Resolution: Fixed
IMHO, manipulating the current page number directly seems a little too risky, as doing so could break the whole extraction process.
But I think I found another working solution: PDFTextStripper now overrides the resetEngine method and resets the current page number every time that method is called.
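A minimal sketch of the approach described above: the counter stays private, but the reset hook now also zeroes it, so a subclass can start each pass from page 1. MiniEngine, ResettingStripper and TwoPassResettingStripper are hypothetical stand-ins, not PDFBox classes; the real resetEngine signature and the places PDFBox calls it may differ. As in the previous illustration, it assumes extraction is restricted to pages 1 to 2 of a two-page document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical engine base class; NOT the PDFBox API.
class MiniEngine {
    // Clears per-run engine state; subclasses extend it.
    void resetEngine() {
    }
}

// Hypothetical stripper: the page counter stays private, but resetEngine() now zeroes it.
class ResettingStripper extends MiniEngine {
    private int currentPageNo = 0;
    private final int startPage;
    private final int endPage;
    protected final List<String> extracted = new ArrayList<>();

    ResettingStripper(int startPage, int endPage) {
        this.startPage = startPage;
        this.endPage = endPage;
    }

    int getCurrentPageNo() {
        return currentPageNo;
    }

    @Override
    void resetEngine() {
        super.resetEngine();
        currentPageNo = 0; // reset the page counter along with the rest of the engine state
    }

    // Increments the counter once per page; extracts only pages inside the range.
    protected void processPages(List<String> pages) {
        for (String page : pages) {
            currentPageNo++;
            if (currentPageNo >= startPage && currentPageNo <= endPage) {
                extracted.add(page);
            }
        }
    }
}

// Hypothetical subclass that calls the reset hook before each pass.
class TwoPassResettingStripper extends ResettingStripper {
    TwoPassResettingStripper(int startPage, int endPage) {
        super(startPage, endPage);
    }

    List<String> twoPass(List<String> pages) {
        resetEngine();
        processPages(pages); // first pass: pages 1..n
        resetEngine();
        processPages(pages); // second pass starts again at page 1
        return extracted;
    }
}
```

Here `new TwoPassResettingStripper(1, 2).twoPass(List.of("p1", "p2"))` extracts both pages on both passes. Resetting from an existing lifecycle hook instead of adding a setter means callers cannot push the counter to an arbitrary value mid-extraction, which is the risk raised in the resolution comment.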
I've added the changes in revision 956354. Thanks to Ryan for the hint.