Posted to dev@pdfbox.apache.org by Jake Shin <ja...@hotmail.com> on 2010/02/10 04:30:47 UTC

RE: Question

Hi,

 

I was wondering whether PDFBox can help me read a PDF file and put its contents into the MS Excel format that I want.  I also have to do VLOOKUPs on 4-5 columns in Excel after I put the values in.  Is this possible?

 

Kindly,

 
 		 	   		  

Re: Question

Posted by Ted Dunning <te...@gmail.com>.
The text stripper utilities have a structure that would allow you to write a
little bit of code that could do this if you externally define the
positional ranges for the different columns in your PDF.

This requires moderate programming skills to succeed.  Is that in-scope for
you?
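
A rough sketch of the column-bucketing step, to give an idea of the amount of code involved. It deliberately leaves PDFBox out: the Fragment class, the column start values and the sample data are all made up for illustration. In a real program the x/y positions would come from the TextPosition objects that a PDFTextStripper subclass receives. It also assumes y grows down the page and that text on the same line shares (roughly) the same y.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ColumnSplitter {

    /** Hypothetical stand-in for one positioned piece of text from a PDF page. */
    public static final class Fragment {
        final float x;
        final float y;
        final String text;

        public Fragment(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Returns the index of the column whose range contains x, where column i
     * covers [columnStarts[i], columnStarts[i + 1]) and the last column is
     * open-ended. Returns -1 if x lies left of the first column.
     * columnStarts must be sorted in ascending order.
     */
    public static int columnFor(float x, float[] columnStarts) {
        int col = -1;
        for (int i = 0; i < columnStarts.length; i++) {
            if (x >= columnStarts[i]) {
                col = i;
            }
        }
        return col;
    }

    /** Groups fragments into rows (by rounded y) and cells (by x range). */
    public static List<String[]> toRows(List<Fragment> fragments, float[] columnStarts) {
        // TreeMap keeps the rows ordered by y, i.e. top to bottom under the
        // assumption that y grows down the page.
        Map<Integer, String[]> rows = new TreeMap<>();
        for (Fragment f : fragments) {
            int col = columnFor(f.x, columnStarts);
            if (col < 0) {
                continue; // outside every defined column
            }
            int rowKey = Math.round(f.y);
            String[] cells = rows.get(rowKey);
            if (cells == null) {
                cells = new String[columnStarts.length];
                Arrays.fill(cells, "");
                rows.put(rowKey, cells);
            }
            // Fragments landing in the same cell are joined in input order.
            cells[col] = cells[col].isEmpty() ? f.text : cells[col] + " " + f.text;
        }
        return new ArrayList<>(rows.values());
    }

    /** Formats one row as a quoted CSV line, which Excel can open directly. */
    public static String toCsvLine(String[] cells) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(cells[i].replace("\"", "\"\"")).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        float[] columnStarts = {50f, 200f, 350f}; // made-up column boundaries
        List<Fragment> fragments = Arrays.asList(
                new Fragment(52f, 100f, "Widget"),
                new Fragment(205f, 100f, "12"),
                new Fragment(352f, 100.2f, "3.50"),
                new Fragment(51f, 115f, "Gadget"),
                new Fragment(204f, 115f, "7"),
                new Fragment(351f, 115f, "9.25"));
        for (String[] row : toRows(fragments, columnStarts)) {
            System.out.println(toCsvLine(row));
        }
    }
}
```

Rounding y to the nearest unit is the crudest possible way to decide what counts as one row. Real documents will probably need a tolerance tuned to their line spacing.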

On Wed, Feb 10, 2010 at 6:35 AM, Martinez, Mel - 1004 - MITLL <
m.martinez@ll.mit.edu> wrote:

> Unfortunately, the pdf->html conversion does not convert pdf tables to html
> tables.
>
> The pdf->html utility simply takes the output of the PDF text extraction
> process and wraps it in some simple html tags so that you can view the
> extracted text in a web browser.  It does not preserve table structures
> other than as raw extracted text (assuming the table is embedded as text and
> not as an image).  It also does not preserve images - though those can be
> extracted separately and in theory re-combined with the html by hand.



-- 
Ted Dunning, CTO
DeepDyve

RE: Question

Posted by "Martinez, Mel - 1004 - MITLL" <m....@ll.mit.edu>.
Unfortunately, the pdf->html conversion does not convert pdf tables to html
tables.

The pdf->html utility simply takes the output of the PDF text extraction
process and wraps it in some simple html tags so that you can view the
extracted text in a web browser.  It does not preserve table structures
other than as raw extracted text (assuming the table is embedded as text and
not as an image).  It also does not preserve images - though those can be
extracted separately and in theory re-combined with the html by hand.

-----Original Message-----
From: Daniel Wilson [mailto:williamstonconsulting@gmail.com] 
Sent: Wednesday, February 10, 2010 8:53 AM
To: dev@pdfbox.apache.org
Subject: Re: Question

Jake, I'm really not sure PDFBox will help with that.

But ... maybe.

You could try the PDFBox utility that converts PDFs to HTML.  If your PDFs
are coming out as tables in HTML ... then ... it would not be a long step to
derive a utility that outputs the XML format that Excel 2007 supports.

As far as vlookups, I really don't know.

Daniel Wilson


Re: Question

Posted by Daniel Wilson <wi...@gmail.com>.
Jake, I'm really not sure PDFBox will help with that.

But ... maybe.

You could try the PDFBox utility that converts PDFs to HTML.  If your PDFs
are coming out as tables in HTML ... then ... it would not be a long step to
derive a utility that outputs the XML format that Excel 2007 supports.
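
To give a feel for the size of such a utility, here is a minimal sketch. It assumes the format in question is the older "XML Spreadsheet 2003" flavor, which Excel 2007 can open, and not the zipped .xlsx format. The class and method names are made up for illustration, every cell is written as a string, and the rows are assumed to have been extracted already.

```java
import java.util.Arrays;
import java.util.List;

public class SpreadsheetXmlWriter {

    /** Escapes the characters that are special in XML text and attributes. */
    public static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds a one-sheet workbook in XML Spreadsheet 2003 form. */
    public static String toWorkbookXml(String sheetName, List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        // Processing instruction that tells Windows to open the file in Excel.
        sb.append("<?mso-application progid=\"Excel.Sheet\"?>\n");
        sb.append("<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"\n");
        sb.append("          xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\">\n");
        sb.append(" <Worksheet ss:Name=\"").append(escape(sheetName)).append("\">\n");
        sb.append("  <Table>\n");
        for (String[] row : rows) {
            sb.append("   <Row>\n");
            for (String cell : row) {
                // Simplification: every value is stored as a String cell.
                sb.append("    <Cell><Data ss:Type=\"String\">")
                  .append(escape(cell))
                  .append("</Data></Cell>\n");
            }
            sb.append("   </Row>\n");
        }
        sb.append("  </Table>\n");
        sb.append(" </Worksheet>\n");
        sb.append("</Workbook>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Item", "Qty"},
                new String[] {"Nuts & bolts", "12"});
        System.out.print(toWorkbookXml("Extracted", rows));
    }
}
```

The output should be saved as UTF-8 with an .xml extension. Numeric cells would need ss:Type="Number" for Excel to treat them as numbers, which this sketch does not attempt.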

As far as vlookups go, I really don't know.

Daniel Wilson
