Posted to dev@pdfbox.apache.org by "Ilija Pavlic (Commented) (JIRA)" <ji...@apache.org> on 2012/01/04 23:55:39 UTC

[jira] [Commented] (PDFBOX-1201) PDFTextStripperByArea y coordinate shifted "up"

    [ https://issues.apache.org/jira/browse/PDFBOX-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179954#comment-13179954 ] 

Ilija Pavlic commented on PDFBOX-1201:
--------------------------------------

Manually shifting the region considerably lower still misses the lines, even though the cyan rectangle generously covers them. Defining the region as (x, 0f, width, height), that is, starting at y = 0, does capture them.
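
One thing that may be worth ruling out is a mismatch in coordinate origin between drawing and stripping. This is an assumption, not something confirmed in this issue: if the content stream places the rectangle with a bottom-left origin (y growing upward) while the stripper reads the region with a top-left origin (y growing downward), the same (x, y, width, height) would describe two different areas of the page. Below is a minimal sketch of the conversion under that assumption. The class name RegionFlip, the helper toTopLeftOrigin, and the page height of 792 points are hypothetical and used only for illustration; the sketch uses java.awt.geom only and makes no PDFBox calls.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PDFBox.
class RegionFlip {

    /**
     * Converts a rectangle expressed with a bottom-left origin (y grows upward)
     * into the same page area expressed with a top-left origin (y grows downward).
     * Assumption: pageHeight is the height of the page in the same units as r.
     */
    public static Rectangle2D.Float toTopLeftOrigin(Rectangle2D.Float r, float pageHeight) {
        // The top edge of the rectangle sits at (r.y + r.height) measured from the
        // bottom of the page, so its distance from the top is pageHeight minus that.
        float flippedY = pageHeight - r.y - r.height;
        return new Rectangle2D.Float(r.x, flippedY, r.width, r.height);
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // hypothetical page height in points

        // Rectangle as it would be drawn with a bottom-left origin.
        Rectangle2D.Float drawn = new Rectangle2D.Float(50f, 600f, 300f, 100f);

        // Same page area described with a top-left origin: y = 792 - 600 - 100 = 92.
        Rectangle2D.Float flipped = toTopLeftOrigin(drawn, pageHeight);
        System.out.println(flipped);
    }
}
```

If the assumption holds, passing the flipped rectangle to addRegion while still drawing the original one with fillRect should make the extracted text line up with the cyan overlay. If it does not, the mismatch lies elsewhere.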
                
> PDFTextStripperByArea y coordinate shifted "up"
> -----------------------------------------------
>
>                 Key: PDFBOX-1201
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1201
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.6.0
>            Reporter: Ilija Pavlic
>            Priority: Minor
>
> The text stripper region seems to be shifted up from the given coordinates, causing lines at the bottom of the region to be missed and lines above the defined region to be included.
> ...
> PDPage page = (PDPage) allPages.get(0);
> PDFTextStripperByArea stripper = new PDFTextStripperByArea();
> Rectangle2D.Float region = new Rectangle2D.Float(x, y, width, height);
> stripper.addRegion("test region", region);
> // overlay the region with a cyan rectangle to check if I got the coordinates and dimensions right
> PDPageContentStream contentStream = new PDPageContentStream(document, page, true, true);
> contentStream.setNonStrokingColor( Color.CYAN );
> contentStream.fillRect(x, y, width, height);
> contentStream.close();
> stripper.extractRegions(page);
> String content = stripper.getTextForRegion("test region");
> ...
> document.save(...);
> ...
> The cyan rectangle overlays the desired region exactly when viewing the saved output document. The stripper, on the other hand, misses a couple of lines at the bottom of the rectangle and includes a couple of lines above the rectangle.
