
Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2015/11/09 17:45:11 UTC

[jira] [Closed] (PDFBOX-495) PDFTextStripperByArea extracts text only from 1 region, despite several regions being defined

     [ https://issues.apache.org/jira/browse/PDFBOX-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr closed PDFBOX-495.
----------------------------------
    Resolution: Cannot Reproduce

Closing for lack of details. Please reopen only if you attach
- a PDF
- some code that shows the problem on a current version.


> PDFTextStripperByArea extracts text only from 1 region, despite several regions being defined
> ---------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-495
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-495
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>         Environment: Debian, java SE 6
>            Reporter: Ismael Hasan
>
> When trying to extract the text from several areas defined in a PDFTextStripperByArea, it only
> retrieves the text from one of them. The problem can be reproduced with the following steps:
> Divide a page into 4 regions and add the regions to the stripper in
> the following order:
> 1-upper left, 2-upper right, 3-lower left, 4-lower right.
> After calling the "extractRegions" function, only the text for the third
> one is retrieved.
> If the third region is not added (i.e., only regions 1, 2 and 4 are added), only the text for region 2 is retrieved.
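
For anyone trying to reproduce the report, the four-region setup described above could be sketched roughly as follows. This is not the reporter's code, which was never attached. The class name QuadrantRegions, the region names, and the 612 x 792 (US Letter) page size are assumptions made for illustration. The sketch also assumes that region rectangles are measured from the top-left corner of the page with y increasing downwards. It only builds the rectangles with the standard library; the PDFBox calls that would consume them are shown as comments.

```java
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox: splits a page into four equal quadrants.
class QuadrantRegions {

    // Returns the quadrants in the order used in the report:
    // upper left, upper right, lower left, lower right.
    // Assumes a top-left origin with y growing downwards.
    static Map<String, Rectangle2D> quadrants(double pageWidth, double pageHeight) {
        double halfW = pageWidth / 2.0;
        double halfH = pageHeight / 2.0;
        // LinkedHashMap preserves insertion order, so iteration follows the report's order.
        Map<String, Rectangle2D> regions = new LinkedHashMap<>();
        regions.put("upperLeft", new Rectangle2D.Double(0, 0, halfW, halfH));
        regions.put("upperRight", new Rectangle2D.Double(halfW, 0, halfW, halfH));
        regions.put("lowerLeft", new Rectangle2D.Double(0, halfH, halfW, halfH));
        regions.put("lowerRight", new Rectangle2D.Double(halfW, halfH, halfW, halfH));
        return regions;
    }

    public static void main(String[] args) {
        // 612 x 792 points is US Letter; an assumed page size for this sketch.
        Map<String, Rectangle2D> regions = quadrants(612, 792);
        for (Map.Entry<String, Rectangle2D> e : regions.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        // With PDFBox on the classpath, the regions would then be used roughly like this:
        //   PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        //   regions.forEach(stripper::addRegion);        // addRegion(String, Rectangle2D)
        //   stripper.extractRegions(page);               // page is a PDPage
        //   String text = stripper.getTextForRegion("upperLeft");
    }
}
```

If each of the four getTextForRegion calls returns the text of its own quadrant on a current version, the behaviour described in the report is not reproduced.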



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
