Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2017/10/18 16:09:00 UTC
[jira] [Updated] (PDFBOX-3970) x,y co-ordinates of the text inside
the cell are not getting correctly.
[ https://issues.apache.org/jira/browse/PDFBOX-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated PDFBOX-3970:
------------------------------------
Attachment: paragraphNextToTable-marked-1.png
You didn't attach any code so I don't know how you got your values. I have attached the result file of the DrawPrintTextLocations example.
> x,y co-ordinates of the text inside the cell are not getting correctly.
> -----------------------------------------------------------------------
>
> Key: PDFBOX-3970
> URL: https://issues.apache.org/jira/browse/PDFBOX-3970
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.7
> Environment: Operating system: Windows 7 (64 bit).
> Reporter: Navnath Kumbhar
> Attachments: paragraphNextToTable-marked-1.png, paragraphNextToTable.pdf
>
>
> Hello Support Team,
> I am working on a project that parses a whole PDF document and stores the extracted text in a .txt file that another product can read.
> My issue is regarding extracting the text inside the cell of a table:
> *The x,y coordinates of the text inside the cell are not reported correctly.*
> The Y value of the last text line in the cell comes out larger than the cell's max-Y value.
> I have attached a test file that shows this bug.
> As you can see in the test document, there is one cell with text in it and a text paragraph next to that cell.
> The x,y coordinates that I get from PDFBox for all the paths of the cell (two vertical and two horizontal lines) are, in x1,y1,x2,y2 format:
> Horizontal line 1: [100,88,220,88]
> Horizontal line 2: [100,120,220,120]
> Vertical line 1: [100,88,100,120]
> Vertical line 2: [220,88,220,120]
> (The Y values of the above paths are final values, obtained by subtracting the value given by PDFBox from the height of the page, because for paths the y-values run from bottom to top.)
> The bounding box of the last line in that cell is [102,114,59,7] (x, y, width, height), and hence the max-Y of that line is 121 (min-Y + height).
>
> So, comparing the max-Y value of the cell (120) with that of the last line in the cell (121), the line clearly extends outside the cell.
> What could be the reason for this?
> Thank you in advance!
> Regards,
> Navnath Kumbhar
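
The arithmetic in the report can be reproduced with a small stand-alone Java sketch. It is not the reporter's code (none was attached) and it does not call PDFBox. The class CellBoundsCheck and its methods are hypothetical helpers, and the page height of 792 is an assumption made only to illustrate the y flip, since the report does not state it.

```java
/** Hypothetical helper, not part of PDFBox: compares a text line's box with a cell's bottom edge. */
final class CellBoundsCheck {

    /** Converts a bottom-up PDF y value to a top-down one by subtracting it from the page height. */
    static float flipY(float pageHeight, float pdfY) {
        return pageHeight - pdfY;
    }

    /** Bottom edge of a top-down bounding box: min-Y plus height. */
    static float maxY(float minY, float height) {
        return minY + height;
    }

    /** How far the text's bottom edge lies below the cell's bottom edge; 0 or less means it fits. */
    static float overshoot(float cellMaxY, float textMinY, float textHeight) {
        return maxY(textMinY, textHeight) - cellMaxY;
    }

    public static void main(String[] args) {
        // Values from the report: the cell spans y = 88..120 and the last line's box is [102,114,59,7].
        float cellMaxY = 120f;
        System.out.println("text max-Y = " + maxY(114f, 7f));
        System.out.println("overshoot  = " + overshoot(cellMaxY, 114f, 7f));

        // Assumed page height (792, US Letter) and raw y (704), chosen only to show the flip.
        System.out.println("flipped y  = " + flipY(792f, 704f));
    }
}
```

With the reported numbers the overshoot is exactly one unit. Since all the reported values are integers, comparing the unrounded values first would show whether rounding accounts for the difference.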