Posted to java-user@lucene.apache.org by Daniel Cortes <dc...@fib.upc.edu> on 2005/01/05 13:06:36 UTC

PDFBox deprecated methods

I've been using PDFBox to index a directory. I downloaded the latest
version of PDFBox (0.6.7.a) and saw that the method I use to extract
text, PDFTextStripper.getText(), is deprecated. This is how I call it:
stripper.getText(new PDDocument(cosDoc));
I know a lot of people use this method the same way I do. What are the
alternative options?



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: PDFBox deprecated methods

Posted by be...@csh.rit.edu.
Daniel,

Yes, getText( PDDocument ) is the method you should be using.

You no longer need to use a COSDocument object. Note the following
methods that go along with the deprecation of getText( COSDocument ):

PDFParser.getPDDocument() - gets a PDDocument instead of a COSDocument after
parsing
PDDocument.load() - a convenience method that does all the PDFParser work and
returns a PDDocument
LucenePDFDocument.getDocument() - goes straight from a File/URL to a Lucene
document object
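
As a rough sketch of how PDDocument.load() and getText( PDDocument ) fit
together, something like the following should work. It assumes the
org.pdfbox package layout and a load() overload that accepts an
InputStream; the class name ExtractText and the extract() helper are
made up for illustration, and the available overloads may differ
between PDFBox versions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

// Hypothetical helper class, not part of PDFBox.
public class ExtractText {

    // Returns the plain text of the PDF stored at the given path.
    static String extract(String path) throws IOException {
        InputStream in = new FileInputStream(path);
        PDDocument document = null;
        try {
            // load() runs the parser internally, so no COSDocument is handled here.
            document = PDDocument.load(in);
            PDFTextStripper stripper = new PDFTextStripper();
            // getText( PDDocument ) is the overload that is not deprecated.
            return stripper.getText(document);
        } finally {
            // Release the resources held by the document and the stream.
            if (document != null) {
                document.close();
            }
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extract(args[0]));
    }
}
```

The extracted string can then be added to a Lucene document as a field,
or LucenePDFDocument.getDocument() can be used to skip this step.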


Ben


Quoting Daniel Cortes <dc...@fib.upc.edu>:

> OK, I'll answer my own question.
> The deprecated method is getText(COSDocument).
> If you call stripper.getText(new PDDocument(cosDoc)), there isn't any problem.
> 
> 
> Sorry for the question.


Re: PDFBox deprecated methods

Posted by Daniel Cortes <dc...@fib.upc.edu>.
OK, I'll answer my own question.
The deprecated method is getText(COSDocument).
If you call stripper.getText(new PDDocument(cosDoc)), there isn't any problem.


Sorry for the question.

