Posted to users@pdfbox.apache.org by nd...@bidmc.harvard.edu on 2010/01/06 18:10:34 UTC

Help on usage

Hi,

I am trying to use PDFBox to extract text from PDF files and am having a hard time. I was wondering if anyone has some time to help me.

(1)     Downloaded the pre-built standalone binary (pdfbox-0.8.0-incubating-standalone.jar)
(2)     Extracted the files using:  jar -xf  pdfbox-0.8.0-incubating-standalone.jar

But I could not find the Windows binaries of the command line utilities as indicated on the web site: "They are available as windows binaries and as standard Java applications." (http://pdfbox.apache.org/commandlineutilities/index.html)

Thanks,
Nguyen Dao


Re: Help on usage

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

ndao@bidmc.harvard.edu wrote:
> Thanks, Adam, for the response, but I am still not clear on how to get the ExtractText function to work.
> 
> The "Command line utility" section of the web site indicates:
> 
> <<
> In order to run them as java applications you need to add the following jars to your classpath
> 
>      %PDFBOX_HOME%\external\log4j-1.2.9.jar
>      %PDFBOX_HOME%\lib\PDFBox-0.X.X.jar
> >>
> 
> But after downloading and extracting the jar file
> (jar -xf pdfbox-0.8.0-incubating-standalone.jar), I could not find the "lib" directory, nor "log4j-1.2.9.jar" in the "external" directory.
I'm afraid the website is outdated.

> Does anyone have more detailed instructions on how to get the ExtractText feature to work from the command line?
You'll need to include at least the following jars from the "external" directory:

- pdfbox, fontbox and jempbox
- icu4j if you are using right-to-left text
- both lucene jars if you want to index some PDFs using Lucene
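
As a rough sketch of the advice above, the jars in "external" can be joined into a classpath and passed to java. This is not an official script: the helper name build_classpath, the ./pdfbox directory, the file names input.pdf and output.txt, and the main class org.apache.pdfbox.ExtractText are all assumptions, so check them against what your download actually contains. The sketch uses the Unix ':' separator; on Windows the classpath separator is ';'.

```shell
#!/bin/sh
# Join every .jar found in a directory into one classpath string.
# Hypothetical helper, not part of PDFBox.
build_classpath() {
    dir=$1
    cp=""
    for jar in "$dir"/*.jar; do
        # Skip the unexpanded pattern when the directory has no jars.
        [ -e "$jar" ] || continue
        # Append with ':' only when cp already holds an entry.
        cp="${cp:+$cp:}$jar"
    done
    printf '%s\n' "$cp"
}

# Assumed usage, with the archive unpacked into ./pdfbox (hypothetical path)
# and the main class name assumed for this release:
#   java -cp "$(build_classpath ./pdfbox/external)" \
#        org.apache.pdfbox.ExtractText input.pdf output.txt
```

Picking up every jar in the directory is simpler than listing pdfbox, fontbox and jempbox by name, at the cost of also loading the optional icu4j and lucene jars.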

> 
> Thanks,
> Nguyen 

BR
Andreas Lehmkühler



RE: Help on usage

Posted by nd...@bidmc.harvard.edu.
Thanks, Adam, for the response, but I am still not clear on how to get the ExtractText function to work.

The "Command line utility" section of the web site indicates:

<<
In order to run them as java applications you need to add the following jars to your classpath

     %PDFBOX_HOME%\external\log4j-1.2.9.jar
     %PDFBOX_HOME%\lib\PDFBox-0.X.X.jar
>>

But after downloading and extracting the jar file
(jar -xf pdfbox-0.8.0-incubating-standalone.jar), I could not find the "lib" directory, nor "log4j-1.2.9.jar" in the "external" directory.

Does anyone have more detailed instructions on how to get the ExtractText feature to work from the command line?

Thanks,
Nguyen 

-----Original Message-----
From: Adam@swmc.com [mailto:Adam@swmc.com] 
Sent: Wednesday, January 06, 2010 12:12 PM
To: users@pdfbox.apache.org
Subject: Re: Help on usage

Here's how to run a jar file (in any O/S)
java -jar programName.jar

--Adam





Re: Help on usage

Posted by Ad...@swmc.com.
Here's how to run a jar file (in any O/S)
java -jar programName.jar

--Adam
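
Note that java -jar only works when the jar's manifest names a Main-Class; otherwise the class has to be given explicitly together with a classpath. Below is a small sketch for checking this, assuming the manifest has already been copied out of the jar into a plain file. The helper name main_class_of and the file name manifest.txt are made up for illustration, and the unzip command in the comment assumes that tool is installed.

```shell
#!/bin/sh
# Print the Main-Class entry of a manifest file, or nothing if there is none.
# Hypothetical helper; expects the path to an extracted MANIFEST.MF.
main_class_of() {
    # Strip the "Main-Class:" prefix and print only matching lines,
    # then drop carriage returns left by CRLF line endings.
    sed -n 's/^Main-Class:[[:space:]]*//p' "$1" | tr -d '\r'
}

# Assumed usage:
#   unzip -p programName.jar META-INF/MANIFEST.MF > manifest.txt
#   main_class_of manifest.txt
# Empty output means "java -jar programName.jar" has no entry point to run.
```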



From: ndao@bidmc.harvard.edu
To: users@pdfbox.apache.org
Date: 01/06/2010 09:11
Subject: Help on usage



Hi,

I am trying to use PDFBox to extract text from PDF files and am having a hard time. I was wondering if anyone has some time to help me.

(1)     Downloaded the pre-built standalone binary (pdfbox-0.8.0-incubating-standalone.jar)
(2)     Extracted the files using:  jar -xf pdfbox-0.8.0-incubating-standalone.jar

But I could not find the Windows binaries of the command line utilities as indicated on the web site: "They are available as windows binaries and as standard Java applications." (http://pdfbox.apache.org/commandlineutilities/index.html)

Thanks,
Nguyen Dao





This email and any content within or attached hereto from  Sun West Mortgage Company, Inc.  is confidential and/or legally privileged. The information is intended only for the use of the individual or entity named on this email. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or the taking of any action in reliance on the contents of this email information is strictly prohibited, and that the documents should be returned to this office immediately by email. Receipt by anyone other than the intended recipient is not a waiver of any privilege. Please do not include your social security number, account number, or any other personal or financial information in the content of the email. Should you have any questions, please call  (800) 453 7884.