Posted to users@pdfbox.apache.org by Евгений Король <ff...@gmail.com> on 2019/02/15 05:54:47 UTC

About Stream

Hello. I am trying this code:
PDDocument documentpdf = PDDocument.load(new File("file.pdf"));
PDPage page = documentpdf.getPage(0);
Iterator<PDStream> contentStreams = page.getContentStreams();
while (contentStreams.hasNext())
{
    PDStream next = contentStreams.next();
    COSStream cosObject = next.getCOSObject();
    // Raw content stream of the page, returned as one string
    String content = cosObject.toTextString();
}
So I get the stream from the PDF (toTextString() returns the contents of the
stream as a PDF "text string").
What does this content mean? I am trying to parse it for some image settings.
Where can I read about the format of this content? Maybe in the PDF specification?

AW: Re: About Stream

Posted by Tilman Hausherr <TH...@t-online.de>.
Annex A, page 643.

Tilman



Re: About Stream

Posted by Евгений Король <ff...@gmail.com>.
I could not find "operator summary" in this PDF. Where is it?


AW: About Stream

Posted by Tilman Hausherr <TH...@t-online.de>.
Yes, click on "operator summary" in the PDF 32000 specification.

There are probably easier ways to do whatever you want; see the examples
subproject in the source code download.

Tilman
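
To make the role of the operator summary concrete: a content stream is written in postfix notation, so the operands come first and the operator that consumes them comes last. The sketch below is a deliberately simplified illustration of that structure and is not how PDFBox parses streams. The class and method names are made up for this example, and it only understands numbers, names and operators separated by whitespace (no strings, arrays, dictionaries, inline images or comments). For real files, PDFBox's own parser should be used instead.

```java
import java.util.ArrayList;
import java.util.List;

public class ContentStreamSketch {

    /**
     * Splits a very simple content stream into "operator [operands]" entries.
     * Hypothetical helper for illustration only.
     */
    static List<String> instructions(String content) {
        List<String> result = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for empty input
            }
            boolean isNumber = token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
            boolean isName = token.startsWith("/");
            if (isNumber || isName) {
                // Operands are collected until an operator shows up
                operands.add(token);
            } else {
                // Anything else is treated as an operator that consumes the pending operands
                result.add(token + " " + operands);
                operands = new ArrayList<>();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String stream = "q 0 319 -118 0 4866 1373.73 cm /R21 Do Q";
        for (String instruction : instructions(stream)) {
            System.out.println(instruction);
        }
    }
}
```

Each printed entry pairs one operator (q, cm, Do, Q) with the operands written before it, and the operator name is what gets looked up in the operator summary.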



AW: Re: About Stream

Posted by Tilman Hausherr <TH...@t-online.de>.
Yes, there could be others. The numbers are combined with the existing
current transformation matrix (CTM). That is the concatenation that the PDF
specification mentions.

Tilman
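
The effect of that concatenation can be sketched with java.awt.geom.AffineTransform, whose six-argument constructor takes the values in the same a b c d e f order as the cm operands. The outer matrix below ("0.1 0 0 0.1 0 0 cm") is purely an assumption for illustration; the thread does not show what precedes the image operators in the actual file. The class and method names are also made up for this example.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class CtmConcatenationSketch {

    /** Returns where the image origin (0,0) ends up after an outer cm followed by the image's cm. */
    static Point2D imageOrigin(AffineTransform outer, AffineTransform image) {
        AffineTransform ctm = new AffineTransform(outer); // CTM in effect before the image's cm
        ctm.concatenate(image); // image matrix is applied to points first, then the outer CTM
        return ctm.transform(new Point2D.Double(0, 0), null);
    }

    public static void main(String[] args) {
        // Hypothetical earlier operator, NOT taken from the thread: 0.1 0 0 0.1 0 0 cm
        AffineTransform outer = new AffineTransform(0.1, 0, 0, 0.1, 0, 0);
        // Operator from the thread: 0 319 -118 0 4866 1373.73 cm
        AffineTransform image = new AffineTransform(0, 319, -118, 0, 4866, 1373.73);

        Point2D origin = imageOrigin(outer, image);
        System.out.println(origin.getX() + ", " + origin.getY());
    }
}
```

Under that assumed outer scale, the origin lands at roughly (486.6, 137.37), which would fit on an ordinary page even though 4866 on its own does not. The point is only that every cm still in effect has to be multiplied in before the numbers are compared with the page size.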



Re: About Stream

Posted by Евгений Король <ff...@gmail.com>.
Hello. Why did I get this matrix in one of my PDFs?
q
  0 319 -118 0 4866 1373.73 cm
  /R20 Do
Q
q
  0 319 -118 0 4866 1373.73 cm
  /R21 Do
Q
q
  0 319 -118 0 4866 1373.73 cm
  /R20 Do
Q
Numbers like 4866 and 1373 give me coordinates larger than the page dimensions.
Maybe this is not the only matrix that is multiplied when I compute the
coordinates of the image?


Re: About Stream

Posted by Tilman Hausherr <TH...@t-online.de>.
You can use the PDFStreamParser class, see the RemoveAllText.java example.

0 319 -118 0 4866 1373.73 cm

is a transform ("Concatenate matrix to current transformation matrix"). 
That is explained in the PDF specification in "8.3.2 Coordinate Spaces".
Yes, it influences the size and the position (and more, here: rotation) 
of the image.

Tilman
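
As a worked reading of those six numbers: an image is painted into a unit square, so for a matrix a b c d e f the lengths of the vectors (a, b) and (c, d) give the drawn width and height, the angle of (a, b) gives the rotation, and (e, f) is where the image origin is placed. The sketch below assumes the matrix only rotates, scales and translates (no skew or mirroring), and its class and method names are invented for this example. The resulting values are in the coordinate system that was in effect before this cm, not necessarily in page units.

```java
public class CmMatrixSketch {

    /**
     * Returns {width, height, rotationDegrees, x, y} for the cm operands a b c d e f.
     * Hypothetical helper, valid only for rotate/scale/translate matrices.
     */
    static double[] describe(double a, double b, double c, double d, double e, double f) {
        double width = Math.hypot(a, b);   // length of the transformed unit x-axis
        double height = Math.hypot(c, d);  // length of the transformed unit y-axis
        double rotation = Math.toDegrees(Math.atan2(b, a)); // counter-clockwise angle of the x-axis
        return new double[] {width, height, rotation, e, f};
    }

    public static void main(String[] args) {
        // Operator from the thread: 0 319 -118 0 4866 1373.73 cm
        double[] r = describe(0, 319, -118, 0, 4866, 1373.73);
        System.out.printf("width=%.2f height=%.2f rotation=%.1f x=%.2f y=%.2f%n",
                r[0], r[1], r[2], r[3], r[4]);
    }
}
```

For the matrix in this thread that gives a width of 319, a height of 118 and a rotation of 90 degrees, with the origin at (4866, 1373.73).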




---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: About Stream

Posted by Евгений Король <ff...@gmail.com>.
Hello. I found how to parse the operators, but I need to parse the content.
q
  0 319 -118 0 4866 1373.73 cm
  /R21 Do
Q
I need to know what the numbers on the second line mean. Are they perhaps the
width and height of the image and its position on the page?


Re: About Stream

Posted by Евгений Король <ff...@gmail.com>.
Thanks. It's what I need.


Re: About Stream

Posted by Matteo Gamboz <ga...@medialab.sissa.it>.
On Fri, 15 Feb 2019 06:54:47 +0100,
Евгений Король wrote:

>  I try to parse it for some image settings

Maybe the example PrintImageLocations.java can be useful
https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/util/PrintImageLocations.java?view=markup