Posted to dev@pdfbox.apache.org by "Frank van der Hulst (JIRA)" <ji...@apache.org> on 2014/08/22 03:17:11 UTC
[jira] [Commented] (PDFBOX-832) Extract text from table, or find
table co-ordinates from page. If there is no way to find out table, then
just give co-ordinates of rectangle.
[ https://issues.apache.org/jira/browse/PDFBOX-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106284#comment-14106284 ]
Frank van der Hulst commented on PDFBOX-832:
--------------------------------------------
I have written a Java class that extracts tables from PDF files using PDFBox. It is NOT fully automated (you must supply the table and column boundaries), but it is at least a step towards what is wanted. I'd like to contribute it to the PDFBox project, if someone could tell me how to do that.
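To illustrate what boundary-driven extraction of this kind could look like, here is a minimal sketch. It is not the class offered above, which is not shown in this thread. It assumes the text has already been pulled from the page as fragments with x/y positions, and that y increases down the page. Every name in it (ManualTableExtractor, Fragment, extract, columnOf, rowTolerance) is hypothetical and not part of PDFBox; only the JDK is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: every name here is illustrative. None of it is PDFBox API,
// and it is not the class mentioned in the comment above.
class ManualTableExtractor {

    /** A piece of text and the page position where it starts (assumed input shape). */
    static final class Fragment {
        final float x;
        final float y;
        final String text;

        Fragment(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Index of the column whose [left, right) range contains x, or -1 if none does. */
    static int columnOf(float x, float[] columnEdges) {
        for (int i = 0; i + 1 < columnEdges.length; i++) {
            if (x >= columnEdges[i] && x < columnEdges[i + 1]) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Builds rows of cell text from fragments, using caller-supplied boundaries.
     * top/bottom bound the table vertically (assumes y grows downward);
     * columnEdges are ascending x positions, one more than the number of columns;
     * rowTolerance is the largest y offset still treated as the same row.
     */
    static List<List<String>> extract(List<Fragment> fragments, float top, float bottom,
                                      float[] columnEdges, float rowTolerance) {
        // Keep only fragments that fall inside the table rectangle.
        List<Fragment> inside = new ArrayList<>();
        for (Fragment f : fragments) {
            if (f.y >= top && f.y <= bottom && columnOf(f.x, columnEdges) >= 0) {
                inside.add(f);
            }
        }
        inside.sort((a, b) -> Float.compare(a.y, b.y));

        // Start a new row whenever y moves more than rowTolerance past the row's first fragment.
        List<List<Fragment>> rowGroups = new ArrayList<>();
        float rowY = 0f;
        for (Fragment f : inside) {
            if (rowGroups.isEmpty() || f.y - rowY > rowTolerance) {
                rowGroups.add(new ArrayList<>());
                rowY = f.y;
            }
            rowGroups.get(rowGroups.size() - 1).add(f);
        }

        // Within each row, read left to right and append each fragment to its column's cell.
        List<List<String>> rows = new ArrayList<>();
        for (List<Fragment> group : rowGroups) {
            group.sort((a, b) -> Float.compare(a.x, b.x));
            String[] cells = new String[columnEdges.length - 1];
            Arrays.fill(cells, "");
            for (Fragment f : group) {
                int col = columnOf(f.x, columnEdges);
                cells[col] = cells[col].isEmpty() ? f.text : cells[col] + " " + f.text;
            }
            rows.add(Arrays.asList(cells));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Fragment> fragments = Arrays.asList(
                new Fragment(10f, 100f, "Name"),
                new Fragment(60f, 100f, "Qty"),
                new Fragment(10f, 120f, "Widget"),
                new Fragment(60f, 120.4f, "3"),
                new Fragment(10f, 300f, "Page footer")); // below the table, so ignored
        float[] columnEdges = {0f, 50f, 100f};           // two columns
        System.out.println(extract(fragments, 90f, 200f, columnEdges, 2f));
    }
}
```

In a real integration the fragments would come from the PDF library's text extraction rather than being typed in by hand, and the boundaries would be measured from the page. The sketch leaves out detection of the table itself, which is what the original request asks for.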
> Extract text from table, or find table co-ordinates from page. If there is no way to find out table, then just give co-ordinates of rectangle.
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-832
> URL: https://issues.apache.org/jira/browse/PDFBOX-832
> Project: PDFBox
> Issue Type: Improvement
> Components: Text extraction
> Affects Versions: 1.2.1
> Reporter: Pratik Thaker
>
> Please provide some mechanism to extract text from a table. If it is not possible to identify a table in a PDF, then just provide the co-ordinates of its outer rectangle.