Posted to java-user@lucene.apache.org by "Natarajan.T" <na...@crimsonlogic.co.in> on 2004/07/23 17:02:12 UTC

PDFBox problem.

FYI,
 
I am using PDFBox.jar to convert PDF to text.
 
The problem is that at runtime it prints a lot of object messages.
 
How can I avoid this?
 
import java.io.InputStream;
import java.io.BufferedWriter;
import java.io.IOException;

import org.pdfbox.util.PDFTextStripper;
import org.pdfbox.pdfparser.PDFParser;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.pdmodel.PDDocumentInformation;

/**
 * @author natarajant
 *
 * TODO To change the template for this generated type comment go to
 * Window - Preferences - Java - Code Generation - Code and Comments
 */
public class PDFConverter extends DocumentConverter {

    public PDFConverter() {
    }

    /**
     * This method will construct the Lucene document object from the
     * given information by extracting the text from PDF file.
     *
     * @param  reader and writer - InputStream and BufferedWriter
     * @return true or false i.e. extract the text or not
     */
    public boolean extractText(InputStream reader, BufferedWriter writer) throws IOException {

        PDFParser parser = null;
        PDDocument pdDoc = null;
        PDFTextStripper stripper = null;
        String pdftext = "";
        String pdftitle = "";
        try {
            parser = new PDFParser(reader);
            parser.parse();
            pdDoc = parser.getPDDocument();

            stripper = new PDFTextStripper();
            pdftext = stripper.getText(pdDoc);

            writer.write(pdftext + " ");

            PDDocumentInformation info = pdDoc.getDocumentInformation();
            pdftitle = info.getTitle();

        } catch (Exception err) {

            System.out.println(err.getMessage());
        }
        writer.close();
        return true;
    }

}

Re: PDFBox problem.

Posted by Zilverline info <in...@zilverline.org>.

Natarajan.T wrote:

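>       } catch(Exception err) {
>
>                   System.out.println(err.getMessage());

Return false here instead, so that a failed extraction is not reported
as a success:

    return false;

>            }
>            writer.close();
>            return true;
>       }

Also add a finally block that closes all open resources:

    finally { /* close all open resources */ }

>}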
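
Put together, the two changes might look like the sketch below. This is not the PDFBox code itself: TextExtractor is a hypothetical stand-in for the PDFParser/PDFTextStripper calls so that the example compiles with the JDK alone, and SafeConverter and closeQuietly are made-up names for illustration.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;

/** Hypothetical stand-in for the PDFBox parse-and-strip step. */
interface TextExtractor {
    String extract(InputStream in) throws IOException;
}

final class SafeConverter {
    private final TextExtractor extractor;

    SafeConverter(TextExtractor extractor) {
        this.extractor = extractor;
    }

    /** Returns true only if the text was extracted and written out. */
    boolean extractText(InputStream in, BufferedWriter writer) {
        try {
            writer.write(extractor.extract(in));
            writer.flush(); // surface write errors before reporting success
            return true;
        } catch (IOException err) {
            System.err.println("Extraction failed: " + err.getMessage());
            return false; // report the failure to the caller
        } finally {
            // Runs on both the success and the failure path.
            closeQuietly(writer);
            closeQuietly(in);
        }
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // nothing sensible left to do at this point
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        SafeConverter converter = new SafeConverter(in -> "extracted text");
        boolean ok = converter.extractText(
                new ByteArrayInputStream(new byte[0]), new BufferedWriter(out));
        System.out.println(ok + ": " + out);
    }
}
```

In the real converter the PDDocument would presumably be one of the resources closed in the finally block as well.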



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: PDFBox problem.

Posted by Ben Litchfield <be...@csh.rit.edu>.


I usually pass -Dlog4j.configuration=log4j.xml when invoking java from
the command line, but I believe this depends on your environment.

For example:

java -Dlog4j.configuration=log4j.xml org.pdfbox.ExtractText input.pdf
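
If the command line is not under your control, the same system property can probably be set from code, as long as that happens before log4j initializes itself. A minimal sketch, assuming log4j 1.x default initialization; Log4jConfigLocation and useConfig are hypothetical names:

```java
final class Log4jConfigLocation {

    /**
     * Hypothetical helper: points log4j at a configuration file.
     * Must run before the first class that uses log4j is loaded,
     * because log4j reads the property during its own initialization.
     */
    static void useConfig(String resourceOrUrl) {
        // Do not override a value already given with -D on the command line.
        if (System.getProperty("log4j.configuration") == null) {
            System.setProperty("log4j.configuration", resourceOrUrl);
        }
    }

    public static void main(String[] args) {
        useConfig("log4j.xml");
        System.out.println(System.getProperty("log4j.configuration"));
        // Only after this point touch PDFBox or any other log4j-using class.
    }
}
```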

Ben



On Fri, 23 Jul 2004, Christiaan Fluit wrote:

> We invoke the following code in a static initializer that simply
> disables log4j's output entirely.
>
> 	static {
> 		Properties props = new Properties();
> 		props.put("log4j.threshold", "OFF");
> 		org.apache.log4j.PropertyConfigurator.configure(props);
> 	}
>
> Of course, when you make use of log4j in your own code, you have to be
> more specific.
>
>
> Regards,
>
> Chris.

Re: PDFBox problem.

Posted by Christiaan Fluit <Ch...@aduna.biz>.

We invoke the following code in a static initializer that simply 
disables log4j's output entirely.

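	// Requires: import java.util.Properties;
	static {
		// Raise log4j's global threshold to OFF so nothing is logged at all.
		Properties props = new Properties();
		props.setProperty("log4j.threshold", "OFF");
		org.apache.log4j.PropertyConfigurator.configure(props);
	}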

Of course, if you use log4j in your own code, you have to be more
selective, because this switches off your own log output as well.
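
A more selective setup could look roughly like the sketch below. It assumes that PDFBox names its loggers after its classes, so that they all sit under "org.pdfbox"; I have not checked that. PdfBoxLogSettings and quietPdfBox are made-up names, and the class only builds the properties so that it compiles without log4j on the classpath.

```java
import java.util.Properties;

final class PdfBoxLogSettings {

    /** Builds log4j 1.x properties that silence only the org.pdfbox loggers. */
    static Properties quietPdfBox() {
        Properties props = new Properties();
        // Keep normal logging for the rest of the application.
        props.setProperty("log4j.rootLogger", "INFO, console");
        props.setProperty("log4j.appender.console",
                "org.apache.log4j.ConsoleAppender");
        props.setProperty("log4j.appender.console.layout",
                "org.apache.log4j.PatternLayout");
        props.setProperty("log4j.appender.console.layout.ConversionPattern",
                "%d %-5p %c - %m%n");
        // Assumption: PDFBox loggers live under the org.pdfbox package name.
        props.setProperty("log4j.logger.org.pdfbox", "OFF");
        return props;
    }

    public static void main(String[] args) {
        // With log4j 1.x on the classpath, pass the result to
        // org.apache.log4j.PropertyConfigurator.configure(...) instead.
        quietPdfBox().list(System.out);
    }
}
```

The global log4j.threshold key is deliberately left out here, since it would cap the output of every logger and not just the PDFBox ones.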


Regards,

Chris.

-- 
christiaan.fluit@aduna.biz

Aduna
Prinses Julianaplein 14-b
3817 CS Amersfoort
The Netherlands

+31 33 465 9987 phone
+31 33 465 9987 fax

http://aduna.biz

