Posted to java-user@lucene.apache.org by "Natarajan.T" <na...@crimsonlogic.co.in> on 2004/07/23 17:02:12 UTC
PDFBox problem.
FYI,
I am using PDFBox.jar to convert PDF to text.
The problem is that at runtime it prints a lot of object messages.
How can I avoid this? How should I proceed?
import java.io.InputStream;
import java.io.BufferedWriter;
import java.io.IOException;

import org.pdfbox.util.PDFTextStripper;
import org.pdfbox.pdfparser.PDFParser;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.pdmodel.PDDocumentInformation;

/**
 * @author natarajant
 *
 * TODO To change the template for this generated type comment go to
 * Window - Preferences - Java - Code Generation - Code and Comments
 */
public class PDFConverter extends DocumentConverter {

    public PDFConverter() {
    }

    /**
     * This method will construct the Lucene document object from the
     * given information by extracting the text from PDF file.
     *
     * @param reader and writer - InputStream and BufferedWriter
     * @return true or false i.e. extract the text or not
     */
    public boolean extractText(InputStream reader, BufferedWriter writer)
            throws IOException {
        PDFParser parser = null;
        PDDocument pdDoc = null;
        PDFTextStripper stripper = null;
        String pdftext = "";
        String pdftitle = "";
        try {
            parser = new PDFParser(reader);
            parser.parse();
            pdDoc = parser.getPDDocument();

            stripper = new PDFTextStripper();
            pdftext = stripper.getText(pdDoc);

            writer.write(pdftext + " ");

            PDDocumentInformation info = pdDoc.getDocumentInformation();
            pdftitle = info.getTitle();
        } catch (Exception err) {
            System.out.println(err.getMessage());
        }
        writer.close();
        return true;
    }
}
Re: PDFBox problem.
Posted by Zilverline info <in...@zilverline.org>.
Natarajan.T wrote:
> } catch(Exception err) {
>     System.out.println(err.getMessage());

Change this to

    return false;

> }
> writer.close();
> return true;
> }

Also add a finally block to the try statement that closes all open resources:

    finally { /* close all open resources */ }
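The two suggestions above (report failure from the catch block, release resources in a finally block) could be combined roughly as follows. This is only a sketch of the pattern, not the PDFBox API: TextSource is a hypothetical stand-in for the parsed document, and SafeExtractor and closeQuietly are names invented for this illustration.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-in for an opened document (for example a parsed PDF). */
interface TextSource extends Closeable {
    String getText() throws IOException;
}

class SafeExtractor {
    /**
     * Writes the source's text to the writer.
     * Returns true on success and false if reading or writing failed.
     * Both resources are closed on every path.
     */
    static boolean extractText(TextSource source, BufferedWriter writer) {
        try {
            writer.write(source.getText() + " ");
            // Flush here so that a failed write is reported as false
            // instead of being swallowed by the quiet close below.
            writer.flush();
            return true;
        } catch (IOException err) {
            System.err.println("Extraction failed: " + err.getMessage());
            return false;
        } finally {
            closeQuietly(source);
            closeQuietly(writer);
        }
    }

    /** Closes a resource and ignores any error raised while closing. */
    private static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // Nothing useful can be done at this point.
        }
    }
}
```

With this shape the caller can tell a failed extraction from a successful one, and neither the document nor the writer is left open when an exception occurs.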
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: PDFBox problem.
Posted by Ben Litchfield <be...@csh.rit.edu>.
I usually use -Dlog4j.configuration=log4j.xml when invoking java from
the command line, but I believe this depends on your environment.
For example:
    java -Dlog4j.configuration=log4j.xml org.pdfbox.ExtractText input.pdf
Ben
On Fri, 23 Jul 2004, Christiaan Fluit wrote:
> We invoke the following code in a static initializer that simply
> disables log4j's output entirely.
>
> static {
>     Properties props = new Properties();
>     props.put("log4j.threshold", "OFF");
>     org.apache.log4j.PropertyConfigurator.configure(props);
> }
>
> Of course, when you make use of log4j in your own code, you have to be
> more specific.
>
>
> Regards,
>
> Chris.
Re: PDFBox problem.
Posted by Christiaan Fluit <Ch...@aduna.biz>.
We invoke the following code in a static initializer that simply
disables log4j's output entirely.
static {
    Properties props = new Properties();
    props.put("log4j.threshold", "OFF");
    org.apache.log4j.PropertyConfigurator.configure(props);
}
Of course, when you make use of log4j in your own code, you have to be
more specific.
Regards,
Chris.
--
christiaan.fluit@aduna.biz
Aduna
Prinses Julianaplein 14-b
3817 CS Amersfoort
The Netherlands
+31 33 465 9987 phone
+31 33 465 9987 fax
http://aduna.biz
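As a rough illustration of the "more specific" case mentioned in this message, the properties could keep a normal root logger and switch off only one package. The sketch below only builds the java.util.Properties object. The class name QuietPdfBoxLogging, the appender name "stdout" and the WARN root level are choices made for this example, and the package name passed in (such as "org.pdfbox", taken from the imports earlier in the thread) is an assumption about how the library names its loggers.

```java
import java.util.Properties;

class QuietPdfBoxLogging {
    /** Builds log4j 1.x properties that keep normal logging but silence one package. */
    static Properties build(String noisyPackage) {
        Properties props = new Properties();
        // Root logger: WARN and above go to a console appender named "stdout".
        props.setProperty("log4j.rootLogger", "WARN, stdout");
        props.setProperty("log4j.appender.stdout", "org.apache.log4j.ConsoleAppender");
        props.setProperty("log4j.appender.stdout.layout", "org.apache.log4j.PatternLayout");
        props.setProperty("log4j.appender.stdout.layout.ConversionPattern", "%-5p %c - %m%n");
        // Turn off only the loggers under the given package (assumed logger naming).
        props.setProperty("log4j.logger." + noisyPackage, "OFF");
        return props;
    }
}
```

The result of QuietPdfBoxLogging.build("org.pdfbox") would then be handed to org.apache.log4j.PropertyConfigurator.configure(props) in the static initializer, in place of the single log4j.threshold entry.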