Posted to java-user@lucene.apache.org by Daniel Cortes <dc...@fib.upc.edu> on 2005/01/05 13:06:36 UTC
PDFBox deprecated methods
I've been using PDFBox to index a directory. I've downloaded
the latest version of PDFBox (0.6.7.a) and noticed that the method
I use to extract text, PDFTextStripper.getText(), is marked as deprecated:
stripper.getText(new PDDocument(cosDoc));
I know a lot of people use this method the same way I do. What are the
alternatives?
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: PDFBox deprecated methods
Posted by be...@csh.rit.edu.
Daniel,
Yes, getText( PDDocument ) is the method you should be using.
You no longer need to use a COSDocument object. Please note the following
methods that go along with the deprecation of getText( COSDocument ):
PDFParser.getPDDocument() - gets a PDDocument instead of a COSDocument after
parsing
PDDocument.load() - a convenience method that does all the PDFParser work and
returns a PDDocument
LucenePDFDocument.getDocument() - goes straight from a File/URL to a Lucene
document object
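
To illustrate why only one of the two getText overloads triggers the deprecation warning, here is a minimal sketch. It does not use PDFBox itself: LowLevelDoc, HighLevelDoc and Stripper are hypothetical stand-ins for COSDocument, PDDocument and PDFTextStripper, and the delegation inside the deprecated overload is an assumption made for illustration, not PDFBox's confirmed implementation.

```java
public class DeprecatedOverloadSketch {

    // Hypothetical stand-in for a low-level document (think COSDocument).
    static final class LowLevelDoc {
        final String rawText;
        LowLevelDoc(String rawText) { this.rawText = rawText; }
    }

    // Hypothetical stand-in for the high-level wrapper (think PDDocument).
    static final class HighLevelDoc {
        final LowLevelDoc lowLevel;
        HighLevelDoc(LowLevelDoc lowLevel) { this.lowLevel = lowLevel; }
    }

    // Hypothetical stand-in for a text stripper with two getText overloads.
    static final class Stripper {
        /** Old entry point: only this overload carries the deprecation. */
        @Deprecated
        String getText(LowLevelDoc doc) {
            // Assumption: the old overload wraps its argument and delegates.
            return getText(new HighLevelDoc(doc));
        }

        /** Current entry point: not deprecated. */
        String getText(HighLevelDoc doc) {
            // Toy "extraction": return the stored text without padding.
            return doc.lowLevel.rawText.trim();
        }
    }

    public static void main(String[] args) {
        LowLevelDoc cosDoc = new LowLevelDoc("  Hello PDF  ");
        Stripper stripper = new Stripper();
        // Wrapping on the caller side selects the non-deprecated overload,
        // so the compiler has nothing to warn about.
        String text = stripper.getText(new HighLevelDoc(cosDoc));
        System.out.println(text); // prints "Hello PDF"
    }
}
```

Because the compiler resolves overloads by argument type, passing the wrapped object picks the current method even though both methods share the name getText.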
Ben
Quoting Daniel Cortes <dc...@fib.upc.edu>:
> OK, I'll answer my own question:
> the deprecated method is getText(COSDocument).
> If you call stripper.getText(new PDDocument(cosDoc)), there isn't any problem.
>
> Sorry for the question.
Re: PDFBox deprecated methods
Posted by Daniel Cortes <dc...@fib.upc.edu>.
OK, I'll answer my own question:
the deprecated method is getText(COSDocument).
If you call stripper.getText(new PDDocument(cosDoc)), there isn't any problem.
Sorry for the question.