Posted to dev@pdfbox.apache.org by Sébastien Dailly <se...@chimrod.com> on 2013/05/14 17:54:56 UTC

PDColorSpace should be a dictionary?

Hello,

Is there a reason why PDColorSpace does not inherit from COSDictionary?

According to the PDF Reference, section 3.7.2 (Resource Dictionaries), a
ColorSpace is a dictionary, and this causes me trouble with the following
resource:

> 127 0 obj
> <<
> /ColorSpace 289 0 R
> /ExtGState 290 0 R
> /Shading 291 0 R
> /XObject <<
> >>

which leads to the following ColorSpace:

> 289 0 obj
> <<
> /CS0 [/DeviceN [/Cyan /Magenta]
>  /DeviceCMYK 97 0 R 98 0 R]
> >>

(in PDF 1.4)

PDFBox 1.7 converts it to a PDDeviceN object and does not let me
walk inside the referenced objects. Is there a way to convert it
into a dictionary?

(I'm waiting for permission to share the PDF document.)

Thanks,

-- 
Sébastien Dailly

Re: PDColorSpace should be a dictionary?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,

> Hello Maruan
> 
>> as we are currently reworking the documentation as well as planning
>> for PDFBox 2.0 I would like to understand why you think that PDFBox
>> object hierarchy doesn't match closer with the PDF structure. Maybe
>> there is room for improvement.
> 
> Well, I think it is a question of representation.
> 
> I usually consult the Adobe PDF Reference when I work with a document. PDFBox helps me decompress the streams and do the operator processing (matrix products are painful to do by hand), but when I encounter a document which does not work as expected, I mainly work with vim and the Adobe documentation.
> 
> I also work with printers, and it is easier to talk about the PDF structure than about any API.
> 
> So I first picture a PDF document in terms of its internal structure, and then look for the corresponding objects in the PDFBox API.
> 
> PDFBox has made great improvements, and the PDFStreamEngine is one of them, but I think it should not differ too much from the data structure it represents, because the PDF structure is well known and the API is easier to understand if you already know the document structure. But this is a design choice: I think there is no right answer, just a choice to commit to…
> 
> -- 
> Sébastien Dailly

Thanks for the feedback. It gives me a rough idea of what you are looking for. If you have specific requirements or suggestions while working with PDFBox, feel free to drop us a note.

BR
Maruan

Re: PDColorSpace should be a dictionary?

Posted by Sébastien Dailly <se...@chimrod.com>.
Hello Maruan

>
> as we are currently reworking the documentation as well as planning
> for PDFBox 2.0 I would like to understand why you think that PDFBox
> object hierarchy doesn't match closer with the PDF structure. Maybe
> there is room for improvement.

Well, I think it is a question of representation.

I usually consult the Adobe PDF Reference when I work with a
document. PDFBox helps me decompress the streams and do the operator
processing (matrix products are painful to do by hand), but when
I encounter a document which does not work as expected, I mainly work
with vim and the Adobe documentation.

I also work with printers, and it is easier to talk about the PDF
structure than about any API.

So I first picture a PDF document in terms of its internal structure,
and then look for the corresponding objects in the PDFBox API.

PDFBox has made great improvements, and the PDFStreamEngine is one of
them, but I think it should not differ too much from the data structure
it represents, because the PDF structure is well known and the API is
easier to understand if you already know the document structure. But
this is a design choice: I think there is no right answer, just a
choice to commit to…

-- 
Sébastien Dailly

Re: PDColorSpace should be a dictionary?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi Sébastien,

As we are currently reworking the documentation as well as planning for PDFBox 2.0, I would like to understand why you think that the PDFBox object hierarchy doesn't match the PDF structure more closely. Maybe there is room for improvement.

Maruan Sahyoun

Re: PDColorSpace should be a dictionary?

Posted by Sébastien Dailly <se...@chimrod.com>.
On 2013-05-15 15:29, Andreas Lehmkuehler wrote:

Hello,

>> PDFBox 1.7 converts it to a PDDeviceN object and does not let me
>> walk inside the referenced objects. Is there a way to convert it
>> into a dictionary?
>
> What exactly are you looking for which can't be accessed using the
> current implementation of the PDDeviceN class? There should be
> getters/setters for all values of the given dictionary.
>

I was manually exploring a PDF in order to find images above a
600 DPI limit. The images were found in mask objects, which required me
to look inside objects I'm not familiar with (ExtGState and so on).

I finally solved my problem, but I was surprised that the PDFBox object
hierarchy does not match the PDF structure more closely, and I sent this
mail to the mailing list to find out whether there is a reason for that.

This was not a bug report, just a request for information.

Thanks,

-- 
Sébastien

Re: PDColorSpace should be a dictionary?

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,


On 14.05.2013 17:54, Sébastien Dailly wrote:
> Hello,
>
> Is there a reason why PDColorSpace does not inherit from COSDictionary?
>
> According to the PDF Reference, section 3.7.2 (Resource Dictionaries), a ColorSpace
> is a dictionary, and this causes me trouble with the following resource:
>
>> 127 0 obj
>> <<
>> /ColorSpace 289 0 R
>> /ExtGState 290 0 R
>> /Shading 291 0 R
>> /XObject <<
>> >>
>
> which leads to the following ColorSpace:
>
>> 289 0 obj
>> <<
>> /CS0 [/DeviceN [/Cyan /Magenta]
>>  /DeviceCMYK 97 0 R 98 0 R]
>> >>
>
> (in PDF 1.4)
>
> PDFBox 1.7 converts it to a PDDeviceN object and does not let me walk inside
> the referenced objects. Is there a way to convert it into a dictionary?
What exactly are you looking for which can't be accessed using the current
implementation of the PDDeviceN class? There should be getters/setters for all
values of the given dictionary.

> (I'm waiting for permission to share the PDF document.)
>
> Thanks,
>

BR
Andreas Lehmkühler

Re: PDColorSpace should be a dictionary?

Posted by Sébastien Dailly <se...@chimrod.com>.
Hello Maruan,

Thanks for your answer and for showing how to cast the object to a
dictionary.

-- 
Sébastien Dailly

Re: PDColorSpace should be a dictionary?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,

On 15.05.2013 at 15:26, Andreas Lehmkuehler <an...@lehmi.de> wrote:

> Hi,
> 
> On 14.05.2013 18:17, Maruan Sahyoun wrote:
>> Hi Sébastien,
>> 
>> to get from the PDModel objects (high level api) to the COS Model objects (low level API) you'd
>> use getCOSObject() which always returns a COSBase but you could either cast to COSDictionary or
>> use instanceof to inspect if it's really one. COSObjectable is used as a marker to mark that
>> a PDModel object has a representation in the COS model.
> That's correct, the PDModel is the high level api and there should be no need
> to access the underlying data directly.
> 
>> See http://pdfbox.apache.org/userguide/index.html for a quick intro into PD Model and COS Model
>> 
>> You are right, it could have been implemented differently but currently that's how it works.


This comment referred to the fact that, although it's clear that PDColorSpace is a dictionary, getCOSObject() returns COSBase instead of COSDictionary, which might have been clearer and easier to handle.

BR
Maruan

Re: PDColorSpace should be a dictionary?

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 14.05.2013 18:17, Maruan Sahyoun wrote:
> Hi Sébastien,
>
> to get from the PDModel objects (high level api) to the COS Model objects (low level API) you'd
> use getCOSObject() which always returns a COSBase but you could either cast to COSDictionary or
> use instanceof to inspect if it's really one. COSObjectable is used as a marker to mark that
> a PDModel object has a representation in the COS model.
That's correct, the PDModel is the high-level API and there should be no need
to access the underlying data directly.

> See http://pdfbox.apache.org/userguide/index.html for a quick intro into PD Model and COS Model
>
> You are right, it could have been implemented differently but currently that's how it works.
The PDModel provides an abstraction layer, so that one does not have to deal with the
low-level part of the PDF spec. All (supported) values should be available through
getters/setters.

BR
Andreas Lehmkühler


Re: PDColorSpace should be a dictionary?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi Sébastien,

To get from the PDModel objects (high-level API) to the COS model objects (low-level API), you'd use getCOSObject(), which always returns a COSBase; you can either cast it to COSDictionary or use instanceof to check whether it really is one. COSObjectable is used as a marker indicating that a PDModel object has a representation in the COS model.
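
The check-before-cast idea can be sketched roughly as follows. This is only an illustration: Base, Name, Ref, Array and Dict are simplified, hypothetical stand-ins written for this example and are not the PDFBox classes, so the snippet runs without the library. It rebuilds object 289 from the original question, where the /ColorSpace resource is a dictionary but its /CS0 entry is written as an array, which is one reason to inspect the type rather than cast blindly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-in sketch of the "check before you cast" pattern; these are NOT PDFBox classes. */
public class CosWalkSketch {

    // Hypothetical miniature of a low-level object model.
    static class Base { }

    static class Name extends Base {
        final String value;
        Name(String value) { this.value = value; }
    }

    // Stand-in for an indirect reference such as "97 0 R".
    static class Ref extends Base {
        final int objectNumber;
        Ref(int objectNumber) { this.objectNumber = objectNumber; }
    }

    static class Array extends Base {
        final List<Base> items = new ArrayList<>();
        Array add(Base item) { items.add(item); return this; }
    }

    static class Dict extends Base {
        final Map<String, Base> entries = new LinkedHashMap<>();
        Dict put(String key, Base value) { entries.put(key, value); return this; }
    }

    /** Inspects the runtime type with instanceof instead of casting blindly. */
    static String describe(Base base) {
        if (base instanceof Dict) {
            return "dictionary with keys " + ((Dict) base).entries.keySet();
        }
        if (base instanceof Array) {
            return "array of " + ((Array) base).items.size() + " items";
        }
        if (base instanceof Name) {
            return "name /" + ((Name) base).value;
        }
        return "other object";
    }

    /** Rebuilds << /CS0 [/DeviceN [/Cyan /Magenta] /DeviceCMYK 97 0 R 98 0 R] >> from the question. */
    static Dict sampleColorSpaces() {
        Array colorants = new Array().add(new Name("Cyan")).add(new Name("Magenta"));
        Array cs0 = new Array()
                .add(new Name("DeviceN"))
                .add(colorants)
                .add(new Name("DeviceCMYK"))
                .add(new Ref(97))
                .add(new Ref(98));
        return new Dict().put("CS0", cs0);
    }

    public static void main(String[] args) {
        Dict colorSpaces = sampleColorSpaces();
        System.out.println(describe(colorSpaces));                    // the resource entry
        System.out.println(describe(colorSpaces.entries.get("CS0"))); // the colour space itself
    }
}
```

With the real library, the same branches would presumably test the value returned by getCOSObject() against COSDictionary, COSArray and COSName.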

See http://pdfbox.apache.org/userguide/index.html for a quick intro into PD Model and COS Model

You are right, it could have been implemented differently but currently that's how it works.

BR
Maruan Sahyoun
