Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2017/10/18 16:09:00 UTC

[jira] [Updated] (PDFBOX-3970) x,y co-ordinates of the text inside the cell are not getting correctly.

     [ https://issues.apache.org/jira/browse/PDFBOX-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-3970:
------------------------------------
    Attachment: paragraphNextToTable-marked-1.png

You didn't attach any code so I don't know how you got your values. I have attached the result file of the DrawPrintTextLocations example.

> x,y co-ordinates of the text inside the cell are not getting correctly.
> -----------------------------------------------------------------------
>
>                 Key: PDFBOX-3970
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3970
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.7
>         Environment: Operating system: Windows 7 (64 bit).
>            Reporter: Navnath Kumbhar
>         Attachments: paragraphNextToTable-marked-1.png, paragraphNextToTable.pdf
>
>
> Hello Support Team,
> I am working on a project that parses a whole PDF document and stores the extracted text in a .txt file that can be read by another product.
> My issue concerns extracting the text inside a table cell:
> *The x,y coordinates of the text inside the cell are not reported correctly.*
> The Y value of the last text line in the cell comes out larger than the cell's max-Y value.
> I have attached a test file that shows this bug.
> As you can see in the test document, there is one cell with text in it and a text paragraph next to that cell.
> The x,y coordinates that I get from PDFBox for all the paths of the cell (two vertical and two horizontal lines) are:
> (in x1,y1,x2,y2 format)
> Horizontal line 1: [100,88,220,88]
> Horizontal line 2: [100,120,220,120]
> Vertical line 1: [100,88,100,120]
> Vertical line 2: [220,88,220,120]
> (The Y values of the above paths are final values, obtained by subtracting the value given by PDFBox from the height of the page, because for paths the y-values run from bottom to top.)
> The bounding box of the last line in that cell is [102,114,59,7] (x, y, width, height), so the max-Y of that line becomes 121 (min-Y + height).
>
> So, if we compare the max-Y value of the cell (120) with that of the last line in the cell (121), the line clearly extends outside the cell.
> What could be the reason for this?
> Thank you in advance!
> Regards,
> Navnath Kumbhar
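
The arithmetic in the quoted report can be reproduced with a small, PDFBox-independent sketch. It only restates the numbers given above and does not diagnose the cause. The class and method names (CellBoundsCheck, flipY, overshootBelow) are hypothetical, and the page height of 792 used to illustrate the y-flip is an assumption, since the report does not state the raw path values or the page size.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PDFBox: reproduces the arithmetic from the report.
class CellBoundsCheck {

    // Convert a y value measured from the bottom of the page
    // into one measured from the top of the page.
    static double flipY(double pageHeight, double yFromBottom) {
        return pageHeight - yFromBottom;
    }

    // How far the text box extends past the cell's max-Y edge
    // (both rectangles in top-down coordinates); 0 if it does not.
    static double overshootBelow(Rectangle2D cell, Rectangle2D text) {
        return Math.max(0.0, text.getMaxY() - cell.getMaxY());
    }

    public static void main(String[] args) {
        // Assumed page height (US Letter); the report does not give it.
        double pageHeight = 792;
        // With that assumption, a raw path y of 704 would map to the reported 88.
        System.out.println("flipped y   = " + flipY(pageHeight, 704));

        // Cell built from the reported lines: x 100..220, y 88..120.
        Rectangle2D cell = new Rectangle2D.Double(100, 88, 220 - 100, 120 - 88);
        // Reported bounding box of the last text line: x, y, width, height.
        Rectangle2D text = new Rectangle2D.Double(102, 114, 59, 7);

        System.out.println("cell max-Y  = " + cell.getMaxY());
        System.out.println("text max-Y  = " + text.getMaxY());
        System.out.println("overshoot   = " + overshootBelow(cell, text));
        System.out.println("contained   = " + cell.contains(text));
    }
}
```

With the reported values the text box ends one unit past the cell's max-Y edge, so the containment test fails. Whether that one unit comes from how the text height was computed cannot be determined from the numbers alone, which is why the code that produced them matters.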


