Posted to dev@pdfbox.apache.org by Divya Muttineni <di...@gmail.com> on 2014/03/04 01:15:29 UTC

Regarding pdf data extraction

I am trying to convert the tabular data in a PDF file to a text (.txt) file.
In one article I came across
org.apache.pdfbox.pdfviewer.PDFPageDrawer.

Can you please help me with how to extend this class and override its strokePath()
method?


Thank you,
Divya
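
A strokePath() override like the one asked about above is typically used to record the line segments a page strokes, because table borders are often drawn that way. How those segments are collected depends on the PDFBox version (in PDFBox 2.x, for example, PDFGraphicsStreamEngine declares an abstract strokePath() method), so the sketch below simply assumes they have already been gathered into a list. The class name, the Segment record and the 0.5-unit tolerance are hypothetical; the sketch only shows how such segments could be reduced to the row and column positions of a table grid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical helper: turns stroked line segments (for example, ones
 * recorded in an overridden strokePath()) into sorted, de-duplicated
 * positions of horizontal and vertical table rulings.
 */
public class RulingClassifier {

    /** A straight segment from (x1, y1) to (x2, y2), in the collector's coordinate space. */
    public record Segment(double x1, double y1, double x2, double y2) {}

    // Off-axis drift below this many units still counts as axis-aligned (assumed value).
    private static final double TOLERANCE = 0.5;

    /** Sorted y positions of (nearly) horizontal segments. */
    public static List<Double> horizontalRulings(List<Segment> segments) {
        TreeSet<Double> ys = new TreeSet<>();
        for (Segment s : segments) {
            boolean flat = Math.abs(s.y1() - s.y2()) < TOLERANCE;
            boolean hasLength = Math.abs(s.x1() - s.x2()) >= TOLERANCE;
            if (flat && hasLength) {
                ys.add(snap((s.y1() + s.y2()) / 2));
            }
        }
        return new ArrayList<>(ys);
    }

    /** Sorted x positions of (nearly) vertical segments. */
    public static List<Double> verticalRulings(List<Segment> segments) {
        TreeSet<Double> xs = new TreeSet<>();
        for (Segment s : segments) {
            boolean upright = Math.abs(s.x1() - s.x2()) < TOLERANCE;
            boolean hasLength = Math.abs(s.y1() - s.y2()) >= TOLERANCE;
            if (upright && hasLength) {
                xs.add(snap((s.x1() + s.x2()) / 2));
            }
        }
        return new ArrayList<>(xs);
    }

    // Round to the nearest half unit so that most strokes differing only by
    // floating-point noise collapse into a single ruling position.
    private static double snap(double v) {
        return Math.round(v * 2) / 2.0;
    }
}
```

Diagonal and zero-length segments are ignored. The sorted lists can then serve as row and column boundaries when deciding which cell each piece of text belongs to.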

Re: Regarding pdf data extraction

Posted by Alin Mazilu <im...@gmail.com>.

I don't think that class can help you. All you need is the
PDFTextStripper class.

Re: Regarding pdf data extraction

Posted by John Hewson <jo...@jahewson.com>.

Take a look at Tabula (http://tabula.nerdpower.org), which uses PDFBox.

-- John
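
One way to recover table structure, and roughly the idea behind ruling-based extraction in tools such as Tabula, is to treat ruling-line positions as cell boundaries and drop each piece of text into the cell that contains it. The following is a simplified sketch of that idea, not Tabula's actual code. It assumes the text has already been extracted with positions (for instance by subclassing PDFBox's PDFTextStripper and reading the TextPosition list passed to writeString() in PDFBox 2.x), and that text and boundaries share one coordinate space with y growing downward. All names are hypothetical; the output is tab-separated, which suits the .txt goal of the original question.

```java
import java.util.List;

/**
 * Hypothetical sketch: places positioned text chunks into the cells of a
 * grid defined by row and column boundaries, then writes tab-separated rows.
 */
public class CellGrouper {

    /** A piece of text anchored at (x, y); y is assumed to grow downward. */
    public record TextChunk(double x, double y, String text) {}

    /** Index i such that bounds[i] <= v < bounds[i + 1], or -1 if v lies outside. */
    static int band(List<Double> bounds, double v) {
        for (int i = 0; i + 1 < bounds.size(); i++) {
            if (v >= bounds.get(i) && v < bounds.get(i + 1)) {
                return i;
            }
        }
        return -1;
    }

    /**
     * rowBounds and colBounds must be sorted ascending (e.g. ruling positions).
     * Chunks in the same cell are joined with a space, in input order.
     */
    public static String toTsv(List<TextChunk> chunks, List<Double> rowBounds, List<Double> colBounds) {
        int rows = Math.max(0, rowBounds.size() - 1);
        int cols = Math.max(0, colBounds.size() - 1);
        StringBuilder[][] cells = new StringBuilder[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                cells[r][c] = new StringBuilder();
            }
        }
        for (TextChunk t : chunks) {
            int r = band(rowBounds, t.y());
            int c = band(colBounds, t.x());
            if (r < 0 || c < 0) {
                continue; // text outside the ruled grid is skipped
            }
            if (cells[r][c].length() > 0) {
                cells[r][c].append(' ');
            }
            cells[r][c].append(t.text());
        }
        StringBuilder out = new StringBuilder();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (c > 0) {
                    out.append('\t');
                }
                out.append(cells[r][c]);
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

The bands are half-open, so a chunk lying exactly on a boundary lands in the cell below or to the right of it. Real documents also need handling for merged cells and unruled tables, which this sketch does not attempt.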