Posted to dev@pdfbox.apache.org by Divya Muttineni <di...@gmail.com> on 2014/03/04 01:15:29 UTC
Regarding pdf data extraction
I am trying to convert tabular data from a PDF file to a text (.txt) file.
In an article I came across
org.apache.pdfbox.pdfviewer.PDFPageDrawer.
Can you please help me extend this class and override its strokepath()
method?
Thank you,
Divya
Re: Regarding pdf data extraction
Posted by Alin Mazilu <im...@gmail.com>.
I don't think that class can help you... All you need is the
PDFTextStripper class...
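
As a rough illustration of the step that comes after text extraction: PDFTextStripper returns plain text (setSortByPosition(true) is the usual option for reading order), so keeping table cells apart generally means grouping pieces of text by their position. The sketch below assumes the text pieces and their coordinates have already been collected somehow; the Fragment class, the toTsv helper and the tolerance value are hypothetical names for this example, and no PDFBox calls are made.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TableRows {

    /** Hypothetical holder for one piece of text and its position on the page. */
    public static final class Fragment {
        final String text;
        final double x;
        final double y;

        public Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Groups fragments into rows by y coordinate, orders each row by x,
     * and joins the cells of a row with tabs. A fragment joins the current
     * row when its y is within yTolerance of that row's first fragment.
     */
    public static String toTsv(List<Fragment> fragments, double yTolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        // Assumption: y grows downward, so ascending y means top to bottom.
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        List<List<Fragment>> rows = new ArrayList<>();
        for (Fragment f : sorted) {
            List<Fragment> last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (last != null && Math.abs(f.y - last.get(0).y) <= yTolerance) {
                last.add(f);
            } else {
                List<Fragment> row = new ArrayList<>();
                row.add(f);
                rows.add(row);
            }
        }

        StringBuilder out = new StringBuilder();
        for (List<Fragment> row : rows) {
            // Left to right within a row.
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    out.append('\t');
                }
                out.append(row.get(i).text);
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample data: a 2x2 table with slightly uneven baselines.
        List<Fragment> page = new ArrayList<>();
        page.add(new Fragment("Qty", 200, 100));
        page.add(new Fragment("Item", 50, 100.5));
        page.add(new Fragment("Widget", 50, 120));
        page.add(new Fragment("3", 200, 119.6));
        System.out.print(toTsv(page, 2.0));
    }
}
```

Running main prints two tab-separated lines, "Item" and "Qty" on the first and "Widget" and "3" on the second. A fixed y tolerance is the simplest possible choice; real tables with wrapped cells or merged columns would need more than this.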
On Mon, Mar 3, 2014 at 7:15 PM, Divya Muttineni <di...@gmail.com> wrote:
> I am trying to convert the tabular data from pdf file to text(.txt) file.
> In one of the article I came across
> org.apache.pdfbox.pdfviewer.PDFPageDrawer.
>
> Can you please help me how to extend this and override the strokepath()
> method.
>
>
> Thank you,
> Divya
>
Re: Regarding pdf data extraction
Posted by John Hewson <jo...@jahewson.com>.
Take a look at Tabula http://tabula.nerdpower.org which uses PDFBox.
-- John
> On 3 Mar 2014, at 16:15, Divya Muttineni <di...@gmail.com> wrote:
>
> I am trying to convert the tabular data from pdf file to text(.txt) file.
> In one of the article I came across
> org.apache.pdfbox.pdfviewer.PDFPageDrawer.
>
> Can you please help me how to extend this and override the strokepath()
> method.
>
>
> Thank you,
> Divya