Posted to dev@pdfbox.apache.org by Syed Shahab Shaukat <er...@gmail.com> on 2016/02/03 08:58:35 UTC

Need help to read table data in a pdf

Hi folks,

In my project, I need to read data from unstructured tables in a PDF file.

Could someone help me with an approach (a code snippet or example) to tackle
this problem?

So far I can only see how to do basic data extraction with PDFBox.



-- 
Thanks & regards,
Syed Shahab Shaukat.

Re: Need help to read table data in a pdf

Posted by John Hewson <jo...@jahewson.com>.
Check out Tabula; it uses PDFBox:

http://tabula.technology
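Tabula handles the detection for you. As a rough illustration of one common whitespace-based heuristic for tables without ruling lines (a sketch under stated assumptions, not Tabula's actual algorithm), the Java below guesses column separators from horizontal gaps that no word covers. Span, ColumnGuesser and the minGap threshold are hypothetical names; the word boxes are assumed to have been collected from PDFBox text positions beforehand.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Horizontal extent of one word on the page (hypothetical input type). */
final class Span {
    final double x0, x1; // left and right edges in page units

    Span(double x0, double x1) {
        this.x0 = x0;
        this.x1 = x1;
    }
}

final class ColumnGuesser {
    /**
     * Merges the x-extents of all words (from every row) and returns the
     * midpoint of each uncovered gap at least minGap wide. Those midpoints are
     * candidate column separators.
     */
    static List<Double> separators(List<Span> spans, double minGap) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingDouble((Span s) -> s.x0));
        List<Double> result = new ArrayList<>();
        if (sorted.isEmpty()) {
            return result;
        }
        double coveredTo = sorted.get(0).x1; // right edge of the merged interval so far
        for (Span s : sorted.subList(1, sorted.size())) {
            if (s.x0 - coveredTo >= minGap) {
                result.add((coveredTo + s.x0) / 2); // wide empty gap: likely a column break
            }
            coveredTo = Math.max(coveredTo, s.x1);
        }
        return result;
    }
}
```

With two rows whose words occupy roughly x = 10-50, 100-140 and 195-230, a minGap of 20 yields separators at 75 and 167.5. A single word spanning two columns (a merged header, say) closes the gap, which is one reason real tools combine several cues.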

-- John




Re: Need help to read table data in a pdf

Posted by Daniel Persson <ma...@gmail.com>.
Hi Syed.

I've done a lot of data extraction from PDF material using PDFBox. The
current PDFTextStripper isn't the best tool for extracting structured data,
but it is sufficient to fetch all the characters.
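Since the stripper gives you the characters together with their positions, much of the table work is regrouping them. A minimal sketch, assuming you have already copied each character's text and coordinates (for example from TextPosition.getUnicode(), getXDirAdj(), getYDirAdj() and getWidthDirAdj()) into the hypothetical Glyph class below: characters whose baselines lie within a tolerance become one line, and a space is inserted where the horizontal gap is large. The tolerance values are assumptions to tune per document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** One positioned character; a hypothetical stand-in for PDFBox's TextPosition. */
final class Glyph {
    final String text;
    final double x;     // left edge
    final double y;     // baseline, growing downward (like getYDirAdj())
    final double width; // advance width

    Glyph(String text, double x, double y, double width) {
        this.text = text;
        this.x = x;
        this.y = y;
        this.width = width;
    }
}

final class LineBuilder {
    /**
     * Groups glyphs whose baselines lie within yTolerance of a line's first
     * glyph into one line (top to bottom), orders each line left to right, and
     * inserts a space where the gap to the previous glyph exceeds spaceGap.
     */
    static List<String> buildLines(List<Glyph> glyphs, double yTolerance, double spaceGap) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y).thenComparingDouble(g -> g.x));

        List<List<Glyph>> lines = new ArrayList<>();
        for (Glyph g : sorted) {
            List<Glyph> current = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (current != null && Math.abs(g.y - current.get(0).y) <= yTolerance) {
                current.add(g); // close enough to the current baseline
            } else {
                List<Glyph> line = new ArrayList<>();
                line.add(g);
                lines.add(line); // start a new line
            }
        }

        List<String> result = new ArrayList<>();
        for (List<Glyph> line : lines) {
            // Slight baseline jitter can disturb the x order, so re-sort each line.
            line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder sb = new StringBuilder();
            double prevEnd = Double.NaN;
            for (Glyph g : line) {
                if (!Double.isNaN(prevEnd) && g.x - prevEnd > spaceGap) {
                    sb.append(' '); // visible gap: treat as a word break
                }
                sb.append(g.text);
                prevEnd = g.x + g.width;
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```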

What do you mean by unstructured tables? In my day-to-day use, tables are
quite structured.

If you know where the text you're trying to extract is located, you could
try PDFTextStripperByArea, which extracts the text from a specific region
of the page.

I found this example for that:
http://massapi.com/class/pd/PDFTextStripperByArea.html
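In PDFBox itself you register rectangles with addRegion(String, Rectangle2D), call extractRegions(PDPage), and read the result with getTextForRegion(String). To show the idea without a PDF at hand, here is a stdlib-only sketch of what region-based extraction amounts to: keep each character whose anchor point falls inside a named rectangle. PlacedChar and RegionText are hypothetical names, and the coordinates assume y grows downward; check that the rectangles you pass to PDFBox use the same convention as your page.

```java
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** A positioned character (hypothetical stand-in for PDFBox's TextPosition). */
final class PlacedChar {
    final char ch;
    final double x, y; // anchor point; y grows downward

    PlacedChar(char ch, double x, double y) {
        this.ch = ch;
        this.x = x;
        this.y = y;
    }
}

final class RegionText {
    /**
     * For each named rectangle, concatenates (in input order) the characters
     * whose anchor point lies inside it. Regions may overlap.
     */
    static Map<String, String> extract(List<PlacedChar> chars, Map<String, Rectangle2D> regions) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Rectangle2D> region : regions.entrySet()) {
            StringBuilder sb = new StringBuilder();
            for (PlacedChar c : chars) {
                if (region.getValue().contains(c.x, c.y)) {
                    sb.append(c.ch);
                }
            }
            result.put(region.getKey(), sb.toString());
        }
        return result;
    }
}
```

Defining one rectangle per column is a simple way to pull a fixed-layout table apart: two 50-unit-wide rectangles side by side, named "left" and "right", return the text of each column separately.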

It all depends on the scope of your extraction problem. In the worst case,
you might want to build a new extractor by extending PDFTextStreamEngine.
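Whichever class you extend, a custom extractor mostly comes down to collecting positioned text and deciding which cell each piece belongs to. (One commonly used hook in PDFBox 2.x is overriding PDFTextStripper.writeString(String, List<TextPosition>), which receives each chunk of text with its positions.) The sketch below covers only the last step, assuming the words of each row and the column separators are already known: each word goes into the column containing its left edge, and columns with no word stay empty so that missing cells do not shift the rest of the row. PositionedWord and GridBuilder are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** A word with its left x coordinate (hypothetical; built from PDFBox TextPositions). */
final class PositionedWord {
    final String text;
    final double x;

    PositionedWord(String text, double x) {
        this.text = text;
        this.x = x;
    }
}

final class GridBuilder {
    /**
     * Places each word into the column whose boundaries contain its left edge.
     * separators must be sorted ascending; n separators give n + 1 columns.
     * Columns with no word stay "" so missing cells keep their position.
     */
    static List<List<String>> toGrid(List<List<PositionedWord>> rows, List<Double> separators) {
        List<List<String>> grid = new ArrayList<>();
        for (List<PositionedWord> row : rows) {
            List<PositionedWord> ordered = new ArrayList<>(row);
            ordered.sort(Comparator.comparingDouble((PositionedWord w) -> w.x)); // left to right
            List<String> cells = new ArrayList<>(Collections.nCopies(separators.size() + 1, ""));
            for (PositionedWord w : ordered) {
                int col = 0;
                while (col < separators.size() && w.x >= separators.get(col)) {
                    col++; // word starts right of this separator
                }
                String prev = cells.get(col);
                cells.set(col, prev.isEmpty() ? w.text : prev + " " + w.text); // multi-word cell
            }
            grid.add(cells);
        }
        return grid;
    }
}
```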

Best of luck
Daniel
