Posted to dev@pdfbox.apache.org by "Ilija Pavlic (Updated) (JIRA)" <ji...@apache.org> on 2012/01/04 23:43:39 UTC

[jira] [Updated] (PDFBOX-1201) PDFTextStripperByArea y coordinate shifted "up"

     [ https://issues.apache.org/jira/browse/PDFBOX-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilija Pavlic updated PDFBOX-1201:
---------------------------------


Unfortunately, I cannot share the sample PDF.
                
> PDFTextStripperByArea y coordinate shifted "up"
> -----------------------------------------------
>
>                 Key: PDFBOX-1201
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1201
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.6.0
>            Reporter: Ilija Pavlic
>            Priority: Minor
>
> The text stripper region seems to be shifted up from the given coordinates, causing lines at the bottom of the region to be excluded and lines above the defined region to be included.
> ...
> PDPage page = (PDPage) allPages.get(0);
> PDFTextStripperByArea stripper = new PDFTextStripperByArea();
> Rectangle2D.Float region = new Rectangle2D.Float(x, y, width, height);
> stripper.addRegion("test region", region);
> // overlay the region with a cyan rectangle to check if I got the coordinates and dimensions right
> PDPageContentStream contentStream = new PDPageContentStream(document, page, true, true);
> contentStream.setNonStrokingColor( Color.CYAN );
> contentStream.fillRect(x, y, width, height);
> contentStream.close();
> stripper.extractRegions(page);
> String content = stripper.getTextForRegion("test region");
> ...
> document.save(...);
> ...
> The cyan rectangle overlays the desired region exactly when viewing the saved output document. On the other hand, the stripper misses a couple of lines at the bottom of the rectangle and includes a couple of lines above the rectangle.
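
One thing worth checking for the mismatch described above (a possible explanation, not confirmed anywhere in this report) is the coordinate convention. Content-stream drawing uses PDF user space, where the origin is the lower-left corner and y grows upward. The text stripper compares regions against text positions measured from the upper-left corner, with y growing downward. If that is the cause, the same (x, y, width, height) values describe two different areas. The sketch below uses only java.awt.geom to convert a rectangle from the drawing convention to the top-left convention. The class name RegionCoordinates, the helper toTopLeftOrigin, the page height of 792 points, and the sample rectangle are all hypothetical and not part of PDFBox or the original report.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PDFBox.
class RegionCoordinates {

    /**
     * Converts a rectangle from a bottom-left-origin space (y grows upward,
     * (x, y) is the lower-left corner) to a top-left-origin space (y grows
     * downward, (x, y) is the upper-left corner). Width and height are unchanged.
     */
    static Rectangle2D.Float toTopLeftOrigin(Rectangle2D.Float r, float pageHeight) {
        // The top edge sits at (r.y + r.height) above the page bottom, so its
        // distance from the page top is pageHeight minus that value.
        float top = pageHeight - (r.y + r.height);
        return new Rectangle2D.Float(r.x, top, r.width, r.height);
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // assumption: US Letter page height in points

        // Assumed sample rectangle, in the convention used for drawing.
        Rectangle2D.Float drawn = new Rectangle2D.Float(50f, 100f, 200f, 100f);

        // Same area expressed with y measured from the top of the page:
        // 792 - (100 + 100) = 592.
        Rectangle2D.Float forStripper = toTopLeftOrigin(drawn, pageHeight);
        System.out.println("y for drawing:  " + drawn.y);
        System.out.println("y for stripper: " + forStripper.y);
    }
}
```

Under this assumption, the converted rectangle would be the one passed to stripper.addRegion(...), while the original values stay with the drawing call. In a real document the page height would be read from the page's media box rather than hard-coded.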
