Posted to users@pdfbox.apache.org by Adrian <ad...@gmail.com> on 2020/09/17 10:06:11 UTC

Extract Document Properties from PDF/A files

Hi everyone,
I'm trying to read the document properties ([Author, CreationDate, Creator, Keywords, ModDate, Producer, Subject, Title]) from a PDF/A file, but I only get null values.
If I open the file with Acrobat Reader DC, I get the message "This file claims compliance with PDF/A standard and has been opened read-only to prevent modification".
After clicking the Enable Editing button, the file is no longer PDF/A compliant, and once I save the file I can finally read the document properties with PDFBox.

1) Is there a specific way to read the document properties of a PDF/A file?
2) If not, is there a way to remove the PDF/A compliance?

Regards,
Adrian

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Extract Document Properties from PDF/A files

Posted by Marc Kaufman <ka...@cs.stanford.edu>.
Acrobat will sync the values in Metadata with Document Properties when 
saving the file. Perhaps only the Metadata was present in your test file.

Marc

On 9/22/2020 11:37 PM, Adrian wrote:
> Hi Tilman,
> thanks for your help.
> I still can't read document properties but I've found a copy of those properties inside AdobePDFSchema metadata.
>
> Have a nice day,
> Adrian



Re: Extract Document Properties from PDF/A files

Posted by Tilman Hausherr <TH...@t-online.de>.
On 23.09.2020 at 08:37, Adrian wrote:
> Hi Tilman,
> thanks for your help.
> I still can't read document properties but I've found a copy of those properties inside AdobePDFSchema metadata.


It isn't required that they be in the document properties. (IIRC, in
PDF/A-1b, when a value is in both, the two must be identical.)

Tilman
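
A minimal sketch of the "identical when in both" rule mentioned above, in plain Java with no PDFBox dependency. The class and method names are hypothetical. Both maps are assumed to be keyed by the Info dictionary name (Title, Author, Producer, ...), with the XMP values already mapped to those keys. Only a plain string comparison is done, so this suits the text entries; CreationDate and ModDate use different date formats in the Info dictionary and in XMP and would need parsing before they can be compared.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of PDFBox. */
class InfoXmpConsistency {

    /**
     * Returns the keys that are present in both maps but carry different values.
     * Keys that exist in only one of the two maps are not reported.
     */
    static List<String> findMismatches(Map<String, String> info,
                                       Map<String, String> xmpByInfoKey) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, String> entry : info.entrySet()) {
            String infoValue = entry.getValue();
            String xmpValue = xmpByInfoKey.get(entry.getKey());
            // Only a value that exists in both places can be inconsistent.
            if (infoValue != null && xmpValue != null && !infoValue.equals(xmpValue)) {
                mismatches.add(entry.getKey());
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        Map<String, String> info = new HashMap<>();
        info.put("Title", "Report");
        info.put("Producer", "Tool A");

        Map<String, String> xmp = new HashMap<>();
        xmp.put("Title", "Report");
        xmp.put("Producer", "Tool B");

        System.out.println(findMismatches(info, xmp));
    }
}
```

An empty result means that every entry found in both places agrees; it says nothing about entries that exist in only one of them.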



>
> Have a nice day,
> Adrian




Re: Extract Document Properties from PDF/A files

Posted by Adrian <ad...@gmail.com>.
Hi Tilman,
thanks for your help. 
I still can't read document properties but I've found a copy of those properties inside AdobePDFSchema metadata.

Have a nice day,
Adrian
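
For readers who want to look at the Adobe PDF schema values directly, here is a rough sketch that pulls a pdf: property (for example Producer or Keywords) out of a raw XMP packet using only the JDK's XML parser. The class and method names are hypothetical, and it assumes the XMP packet has already been obtained as a string (in PDFBox the XMP stream is reachable through the document catalog's metadata). It checks both forms XMP allows for a simple property: an attribute on rdf:Description or a child element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper, not part of PDFBox. */
class XmpPdfSchemaReader {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String PDF_NS = "http://ns.adobe.com/pdf/1.3/"; // Adobe PDF schema

    /** Returns the value of pdf:name (e.g. "Producer"), or null if it is absent. */
    static String readPdfProperty(String xmpPacket, String name) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // XMP packets carry no DOCTYPE, so refuse one to avoid entity tricks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xmpPacket.getBytes(StandardCharsets.UTF_8)));

            NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
            for (int i = 0; i < descriptions.getLength(); i++) {
                Element description = (Element) descriptions.item(i);
                // Form 1: the property is an attribute of rdf:Description.
                if (description.hasAttributeNS(PDF_NS, name)) {
                    return description.getAttributeNS(PDF_NS, name);
                }
                // Form 2: the property is a child element.
                NodeList children = description.getElementsByTagNameNS(PDF_NS, name);
                if (children.getLength() > 0) {
                    return children.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XMP packet", e);
        }
    }

    public static void main(String[] args) {
        String xmp = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
                + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
                + "<rdf:Description rdf:about=\"\" xmlns:pdf=\"" + PDF_NS + "\">"
                + "<pdf:Producer>Example Producer</pdf:Producer>"
                + "</rdf:Description></rdf:RDF></x:xmpmeta>";
        System.out.println(readPdfProperty(xmp, "Producer"));
    }
}
```

The Adobe PDF schema only covers part of the list (Keywords and Producer among the eight properties); the others, such as the title or the creation date, live in other XMP schemas and would need their own namespace lookups.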

On 2020/09/17 17:10:57, Tilman Hausherr <TH...@t-online.de> wrote: 
> Hi,
> 
> What code did you use to get the document properties?
> 
> The saved file may have properties saved from Adobe itself.
> 
> re 1: yes there are two, see ExtractMetadata.java and 
> PrintDocumentMetaData.java in the example subproject of the source code 
> download.
> 
> Tilman



Re: Extract Document Properties from PDF/A files

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,

What code did you use to get the document properties?

The saved file may have properties saved from Adobe itself.

re 1: yes there are two, see ExtractMetadata.java and 
PrintDocumentMetaData.java in the example subproject of the source code 
download.

Tilman
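
Since there are two places to look, one possible way to combine them is sketched below in plain Java: take each property from the document information dictionary when it is there, and fall back to the matching XMP property otherwise. This is only an illustration under stated assumptions, not what the two example classes do. The class name, the method name and the use of plain maps are hypothetical; with PDFBox the inputs would have to be filled from `PDDocument.getDocumentInformation()` and from the XMP stream of the document catalog. The key mapping is the Info-to-XMP correspondence used by PDF/A-1.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of PDFBox. */
class DocumentPropertyLookup {

    // Info dictionary key -> corresponding XMP property.
    static final Map<String, String> INFO_TO_XMP = new LinkedHashMap<>();
    static {
        INFO_TO_XMP.put("Title", "dc:title");
        INFO_TO_XMP.put("Author", "dc:creator");
        INFO_TO_XMP.put("Subject", "dc:description");
        INFO_TO_XMP.put("Keywords", "pdf:Keywords");
        INFO_TO_XMP.put("Creator", "xmp:CreatorTool");
        INFO_TO_XMP.put("Producer", "pdf:Producer");
        INFO_TO_XMP.put("CreationDate", "xmp:CreateDate");
        INFO_TO_XMP.put("ModDate", "xmp:ModifyDate");
    }

    /**
     * Resolves each property, preferring the Info dictionary value and falling
     * back to the XMP value. Properties missing from both sources are omitted.
     */
    static Map<String, String> resolve(Map<String, String> info, Map<String, String> xmp) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> mapping : INFO_TO_XMP.entrySet()) {
            String value = info.get(mapping.getKey());
            if (value == null) {
                value = xmp.get(mapping.getValue());
            }
            if (value != null) {
                result.put(mapping.getKey(), value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> info = new HashMap<>();   // e.g. an empty Info dictionary
        Map<String, String> xmp = new HashMap<>();
        xmp.put("dc:title", "Report");
        xmp.put("pdf:Producer", "Example Producer");
        System.out.println(resolve(info, xmp));
    }
}
```

With an empty Info dictionary, as in the file described in this thread, every resolved value comes from XMP.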

On 17.09.2020 at 12:06, Adrian wrote:
> Hi everyone,
> I'm trying to read the document properties ([Author, CreationDate, Creator, Keywords, ModDate, Producer, Subject, Title]) from a PDF/A file, but I only get null values.
> If I open the file with Acrobat Reader DC, I get the message "This file claims compliance with PDF/A standard and has been opened read-only to prevent modification".
> After clicking the Enable Editing button, the file is no longer PDF/A compliant, and once I save the file I can finally read the document properties with PDFBox.
>
> 1) Is there a specific way to read the document properties of a PDF/A file?
> 2) If not, is there a way to remove the PDF/A compliance?
>
> Regards,
> Adrian

