Posted to users@pdfbox.apache.org by "Romina O. Leon" <ro...@gmail.com> on 2016/01/12 18:35:42 UTC

Migration to PDFBox 2.0.0

Hi! Great library! Thank you so much :)

I'm migrating my application from PDFBox 1.8.10 to 2.0.0, and I'm trying to
get the text (String) of a page. On your website you say:

Parsing the Page Content

Getting the content for a page has been simplified.

Prior to PDFBox 2.0 parsing the page content was done using

PDStream contents = page.getContents();
PDFStreamParser parser = new PDFStreamParser(contents.getStream());
parser.parse();
List<Object> tokens = parser.getTokens();

But the getContents() method of the PDPage class returns an InputStream,
which can't be cast to a PDStream.
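Since getContents() in 2.0 hands back a plain java.io.InputStream, it can simply be read to the end without going through a PDStream. The sketch below uses only the JDK. ContentStreamDump, readFully and asLatin1 are hypothetical names, and a hand-written byte array stands in for page.getContents(). Note that these bytes are PDF content-stream operators (e.g. BT ... Tj ET), not the page's text. For the text as a String, PDFBox provides PDFTextStripper (package org.apache.pdfbox.text in 2.0).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ContentStreamDump {

    // Reads an InputStream (e.g. the one returned by PDPage.getContents()
    // in PDFBox 2.0) until end of stream and returns all bytes read.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Content streams may contain arbitrary bytes (inside strings or inline
    // images); ISO-8859-1 maps each byte to exactly one char, so none are lost.
    static String asLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for page.getContents(): a tiny content stream.
        byte[] sample = "BT /F1 12 Tf 72 712 Td (Hello) Tj ET"
                .getBytes(StandardCharsets.ISO_8859_1);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println(asLatin1(readFully(in)));
        }
    }
}
```

Running main prints the sample stream unchanged; with a real page you would pass the stream from page.getContents() instead and close it when done.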

And with the example below:

With PDFBox 2.0 the code is reduced to

PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
List<Object> tokens = parser.getTokens();

I still can't get the page content!

I would appreciate your help!
Thanks!

-- 
Romina Alejandra Osorio León
Phone:        (0412) 0905791
E-mail:         rominaleon.7@gmail.com
<https://ve.linkedin.com/in/rominaoleon>

Re: Migration to PDFBox 2.0.0

Posted by Tilman Hausherr <TH...@t-online.de>.
On 12.01.2016 at 18:35, Romina O. Leon wrote:
> Prior to PDFBox 2.0 parsing the page content was done using
>
> PDStream contents = page.getContents();
> PDFStreamParser parser = new PDFStreamParser(contents.getStream());
> parser.parse();
> List<Object> tokens = parser.getTokens();
>
> But the getContents() method of the PDPage class returns an InputStream,
> which can't be cast to a PDStream.
>
> And with the example below:
>
> With PDFBox 2.0 the code is reduced to
>
> PDFStreamParser parser = new PDFStreamParser(page);
> parser.parse();
> List<Object> tokens = parser.getTokens();
>
> I still can't get the page content!

What do you mean by getting the page contents? You mention getContents();
it returns an InputStream that you can read. Getting the tokens is
shown in the example you quote. Why do you insist on a PDStream?
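In the 2.0 snippet, parser.getTokens() returns a List<Object> in which the operands (COS objects) come before the Operator (org.apache.pdfbox.contentstream.operator.Operator) they belong to. The toy below imitates that ordering with the JDK only, to show how such a list is typically walked. It is not PDFBox's parser and handles only a tiny subset of the syntax; ToyContentTokens, Literal, Op, tokenize and shownText are hypothetical names. In real files the Tj operand is font-encoded bytes, so it is not necessarily readable text.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyContentTokens {

    // Hypothetical stand-ins for the objects in PDFBox's token list
    // (there: COSString operands and Operator objects).
    static final class Literal {
        final String text;
        Literal(String text) { this.text = text; }
    }

    static final class Op {
        final String name;
        Op(String name) { this.name = name; }
    }

    // Toy tokenizer for numbers, names, literal strings (no escapes or
    // nested parentheses) and operators. Arrays, dictionaries and inline
    // images are NOT handled.
    static List<Object> tokenize(String content) {
        List<Object> tokens = new ArrayList<>();
        int i = 0;
        while (i < content.length()) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(') {
                int end = content.indexOf(')', i);
                if (end < 0) {
                    end = content.length(); // unterminated string: take the rest
                }
                tokens.add(new Literal(content.substring(i + 1, end)));
                i = end + 1;
            } else {
                int start = i;
                while (i < content.length()
                        && !Character.isWhitespace(content.charAt(i))
                        && content.charAt(i) != '(') {
                    i++;
                }
                String word = content.substring(start, i);
                if (word.startsWith("/")) {
                    tokens.add(word);                          // name operand
                } else if (word.matches("[+-]?(\\d+(\\.\\d*)?|\\.\\d+)")) {
                    tokens.add(Double.parseDouble(word));      // numeric operand
                } else {
                    tokens.add(new Op(word));                  // operator
                }
            }
        }
        return tokens;
    }

    // Operands precede their operator, so the string drawn by "Tj" is the
    // Literal token immediately before each Tj operator.
    static String shownText(List<Object> tokens) {
        StringBuilder sb = new StringBuilder();
        for (int k = 1; k < tokens.size(); k++) {
            Object t = tokens.get(k);
            Object prev = tokens.get(k - 1);
            if (t instanceof Op && ((Op) t).name.equals("Tj") && prev instanceof Literal) {
                sb.append(((Literal) prev).text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Object> tokens = tokenize("BT /F1 12 Tf 72 712 Td (Hello) Tj ET");
        System.out.println(tokens.size());     // 10
        System.out.println(shownText(tokens)); // Hello
    }
}
```

With PDFBox itself, the same loop would test each token with instanceof Operator and compare getName() to "Tj", then look at the preceding COSString.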

Tilman


