Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2014/06/09 21:51:02 UTC
[jira] [Commented] (PDFBOX-18) Possible to Extract Just Table From PDF
[ https://issues.apache.org/jira/browse/PDFBOX-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025643#comment-14025643 ]
Tilman Hausherr commented on PDFBOX-18:
---------------------------------------
Indeed, there is no such thing as a "table" in a PDF. See also the similar question and its answer here:
http://stackoverflow.com/q/23828463/535646
Many OCR programs attempt to identify "tables" and sometimes it works, sometimes it doesn't.
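To illustrate why table structure has to be derived rather than read, here is a minimal Java sketch of one simple heuristic: grouping positioned text fragments into rows by their vertical coordinate, then ordering each row left to right. The Fragment class, groupIntoRows, and the tolerance parameter are hypothetical names for illustration and are not part of PDFBox; the coordinates are assumed to come from some text extractor, with y growing downward. A heuristic like this is likely to fail on wrapped cells, rotated text, or uneven baselines.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TableRowSketch {

    /** Hypothetical positioned text fragment; not a PDFBox class. */
    static final class Fragment {
        final String text;
        final double x; // horizontal position of the fragment
        final double y; // vertical position (assumed to grow downward)

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Groups fragments whose y differs from the first fragment of the
     * current row by at most yTolerance, then sorts each row by x.
     */
    static List<List<String>> groupIntoRows(List<Fragment> fragments, double yTolerance) {
        // Work on a copy sorted top to bottom.
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        List<List<Fragment>> rows = new ArrayList<>();
        double rowY = 0;
        for (Fragment f : sorted) {
            // Start a new row when the fragment is too far from the row's anchor y.
            if (rows.isEmpty() || Math.abs(f.y - rowY) > yTolerance) {
                rows.add(new ArrayList<>());
                rowY = f.y;
            }
            rows.get(rows.size() - 1).add(f);
        }

        // Order each row left to right and keep only the text.
        List<List<String>> result = new ArrayList<>();
        for (List<Fragment> row : rows) {
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            List<String> cells = new ArrayList<>();
            for (Fragment f : row) {
                cells.add(f.text);
            }
            result.add(cells);
        }
        return result;
    }
}
```

The result is only a guess at rows; nothing in it says whether those rows actually form a table, which is why the accuracy of any such approach is limited.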
> Possible to Extract Just Table From PDF
> ---------------------------------------
>
> Key: PDFBOX-18
> URL: https://issues.apache.org/jira/browse/PDFBOX-18
> Project: PDFBox
> Issue Type: New Feature
> Components: Text extraction
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=1020203
> Originally submitted by nobody on 2004-08-31 23:23.
> Sir, I want to know whether it is possible to extract just a table
> from a PDF file. If it is possible, then
> tell me how I can identify in the streams that a given stream
> contains a table.
> Sir, I also want to mention that I have previously
> extracted the text from a PDF file and I know the whole
> structure of a PDF file.
> Just tell me the exact way to identify it.
> Sir, I am waiting for your reply.
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES
> user_id=601708
> This is an RFE for table support, not a bug request, so I
> am changing the issue type. In addition, PDF documents do
> not contain 'tables', so that information would need to be
> derived and could only be done with little accuracy. I am
> changing the priority to 1, as I will probably never
> implement this myself. Please feel free to submit a patch
> though.
> Ben