Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/06/17 21:33:02 UTC

[jira] [Closed] (PDFBOX-832) Extract text from table, or find table co-ordinates from page. If there is no way to find out table, then just give co-ordinates of rectangle.

     [ https://issues.apache.org/jira/browse/PDFBOX-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson closed PDFBOX-832.
------------------------------

    Resolution: Won't Fix

Closing because tables don't exist as a construct in the PDF format; detecting them is a case for OCR. The bounding boxes of text can already be extracted; see the PrintTextLocations example.
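
For readers who land here looking for a workaround: once text fragments and their positions are available, rows can be approximated by grouping fragments that share roughly the same vertical position. The following is only a minimal sketch of that heuristic, not part of PDFBox. The `RowGrouper` and `TextBox` names, the `tolerance` parameter, and the sample coordinates are all hypothetical, and the sketch assumes y increases down the page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RowGrouper {

    /** Hypothetical holder for one text fragment and its position (not a PDFBox type). */
    public static final class TextBox {
        final String text;
        final double x;
        final double y;

        public TextBox(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Groups fragments into rows. A fragment joins the current row when its y
     * is within `tolerance` of the first fragment of that row; otherwise it
     * starts a new row. Each row is then ordered left to right by x.
     */
    public static List<List<String>> groupIntoRows(List<TextBox> boxes, double tolerance) {
        List<TextBox> sorted = new ArrayList<>(boxes);
        sorted.sort(Comparator.comparingDouble((TextBox b) -> b.y));

        List<List<TextBox>> rows = new ArrayList<>();
        double rowY = 0;
        for (TextBox b : sorted) {
            if (rows.isEmpty() || Math.abs(b.y - rowY) > tolerance) {
                rows.add(new ArrayList<>());
                rowY = b.y; // y of the first fragment anchors the row
            }
            rows.get(rows.size() - 1).add(b);
        }

        List<List<String>> result = new ArrayList<>();
        for (List<TextBox> row : rows) {
            row.sort(Comparator.comparingDouble((TextBox b) -> b.x));
            List<String> cells = new ArrayList<>();
            for (TextBox b : row) {
                cells.add(b.text);
            }
            result.add(cells);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up positions standing in for extracted text locations.
        List<TextBox> boxes = Arrays.asList(
                new TextBox("Qty", 200, 100),
                new TextBox("Item", 50, 100.5),
                new TextBox("3", 200, 120),
                new TextBox("Bolt", 50, 119.8));
        System.out.println(groupIntoRows(boxes, 2.0)); // prints [[Item, Qty], [Bolt, 3]]
    }
}
```

This only recovers rows; column boundaries, merged cells, and wrapped text inside a cell would need further heuristics, which is why no general answer exists at the format level.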

> Extract text from table, or find table co-ordinates from page. If there is no way to find out table, then just give co-ordinates of rectangle.
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-832
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-832
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 1.2.1
>            Reporter: Pratik Thaker
>
> Please provide some mechanism to extract text from a table. If it is not possible to detect a table in a PDF, then just provide the co-ordinates of its outer rectangle.
