Posted to dev@pdfbox.apache.org by Syed Shahab Shaukat <er...@gmail.com> on 2016/02/03 08:58:35 UTC
Need help to read table data in a pdf
Hi Folks,
In my project, I need to read data from unstructured tables in a PDF file.
Could someone help me with an approach (code snippet/example) to tackle this
problem?
So far I could find only basic data extraction using PDFBox.
--
Thanks & Regards,
Syed Shahab Shaukat.
Re: Need help to read table data in a pdf
Posted by John Hewson <jo...@jahewson.com>.
Check out Tabula, it uses PDFBox:
http://tabula.technology
-- John
Re: Need help to read table data in a pdf
Posted by Daniel Persson <ma...@gmail.com>.
Hi Syed.
I've done a lot of data extraction from PDF material using PDFBox. The
current PDFTextStripper isn't the best way to go about extracting data, but it
is sufficient to fetch all the letters.
What do you mean by unstructured tables? In my day-to-day use, tables are
quite structured.
You could try using PDFTextStripperByArea if you know where the text
you're trying to extract is located, and then extract it from that
specific region.
I found this example for that:
http://massapi.com/class/pd/PDFTextStripperByArea.html
It all depends on the scope of your extraction problem. In the worst case
you might want to build a new extractor extending PDFTextStreamEngine.
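To give a rough idea of what such a custom extractor would have to do once it
has collected the text together with its coordinates, here is a minimal sketch
in plain Java. It does not use PDFBox at all: the Fragment class, the
TableRows.toRows method, and the yTolerance parameter are hypothetical names
made up for this illustration, and I'm assuming the positions use a top-down
y axis. It groups fragments whose baselines are close together into one row
and then orders each row from left to right.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a piece of text plus the position an extractor recorded for it. */
final class Fragment {
    final double x;     // left edge of the text
    final double y;     // baseline, assumed to grow downwards on the page
    final String text;

    Fragment(double x, double y, String text) {
        this.x = x;
        this.y = y;
        this.text = text;
    }
}

final class TableRows {
    /**
     * Groups fragments into rows: a fragment joins the current row when its
     * baseline is within yTolerance of the row's first fragment, otherwise it
     * starts a new row. Each row is then sorted left to right.
     */
    static List<List<String>> toRows(List<Fragment> fragments, double yTolerance) {
        // Work on a copy so the caller's list is left untouched.
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        List<List<Fragment>> rows = new ArrayList<>();
        double rowY = 0;
        for (Fragment f : sorted) {
            if (rows.isEmpty() || Math.abs(f.y - rowY) > yTolerance) {
                rows.add(new ArrayList<>());
                rowY = f.y; // the first fragment anchors the row's baseline
            }
            rows.get(rows.size() - 1).add(f);
        }

        List<List<String>> result = new ArrayList<>();
        for (List<Fragment> row : rows) {
            // Baselines inside a row may differ slightly, so order by x here.
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            List<String> cells = new ArrayList<>();
            for (Fragment f : row) {
                cells.add(f.text);
            }
            result.add(cells);
        }
        return result;
    }
}
```

Splitting a row into real columns (for example by looking at the gaps between
x positions) and handling cells that wrap over several lines is where most of
the work is, and that part depends heavily on how your tables look.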
Best of luck
Daniel