Posted to users@pdfbox.apache.org by Colette Joubarne <cj...@privacyanalytics.ca> on 2014/06/13 14:21:06 UTC

Unable to mark document as tagged

I have a tagged PDF document whose catalog contains the following:

            /Type/Catalog/Pages 2 0 R/Lang(en-CA) /StructTreeRoot 10 0 R/MarkInfo<</Marked true

I read in the contents, replace some of the text and create a new doc. I copy the document information from the original doc and set marked to true.

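            // Start from an empty document and copy only the document information.
            newDoc = new PDDocument();
            newDoc.setDocumentInformation(PTConstants.pdfDoc.getDocumentInformation());

            // Add a MarkInfo dictionary with Marked set to true to the new catalog.
            PDMarkInfo markinfo = new PDMarkInfo();
            markinfo.setMarked(true);
            newDoc.getDocumentCatalog().setMarkInfo(markinfo);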

and when I check that it was set, it returns true:

      PDMarkInfo markInfo = PTConstants.pdfDoc.getDocumentCatalog().getMarkInfo();
      if ((markInfo != null) && (markInfo.isMarked())) System.out.println("true");

But, while the resulting document displays correctly, its catalog indicates that it is not tagged:

/Type /Catalog
/Version /1.4
/Pages 2 0 R
/MarkInfo 3 0 R

Any idea what is going on?

Colette

Re: Unable to mark document as tagged

Posted by Duff Johnson <du...@pdfa.org>.
> this information alone doesn’t make a document a tagged PDF! You might not have the structure information needed within your PDF. 

Indeed…

Not every use of tagged PDF is for accessibility… but many are. If that is your situation, I’d suggest taking a look at the PDF Association’s Matterhorn Protocol. It’s the definitive set of test conditions for ISO 14289 (PDF/UA) conformance:

http://www.pdfa.org/publication/the-matterhorn-protocol-1/

Duff Johnson

PDF Association, Vice Chairman, & 
  Chairman, North American Chapter


RE: Unable to mark document as tagged

Posted by Colette Joubarne <cj...@privacyanalytics.ca>.
Maruan and Duff,

This is my first experience using a help forum like this, and the response was great.
I appreciate the help.

I will look into the documentation and hopefully be able to figure out what I am doing wrong.

Colette


Re: Unable to mark document as tagged

Posted by Duff Johnson <du...@pdfa.org>.
Colette,

It might be a good idea to take a look at section 14.8 of ISO 32000-1, which defines tagged PDF.

You can download it for free:

http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf

Duff.



Re: Unable to mark document as tagged

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Colette,

You are not corrupting the PDF document, but the structure information needed for tagged PDF is missing.

Maruan Sahyoun


RE: Unable to mark document as tagged

Posted by Colette Joubarne <cj...@privacyanalytics.ca>.
Maruan,

I use the parser to tokenize, and then loop through the tokens. If a token is a TJ or Tj operator, I grab the text and, in certain cases, replace some of it (letter by letter, maintaining the existing structure), then add these tokens to a new token list. If it is not a TJ or Tj operator, I just copy the token to the new token list. I then write the token list to the doc and save.

If I am corrupting the structure, how is it that the document displays correctly?

Colette
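
The token loop described above can be sketched roughly as follows. This is a toy model in plain Java, not PDFBox code: tokens are modelled as plain strings, the class and method names are made up for illustration, and only the simple Tj case is handled (a TJ operand is an array and would need its own handling).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of the rewrite loop; real code would work on parsed tokens.
final class TokenRewriteSketch {
    // Copies every token to a new list. When a Tj operator is reached,
    // the operand just before it (the string to show) gets one character swapped.
    static List<String> rewrite(List<String> tokens, char from, char to) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.equals("Tj") && !out.isEmpty()) {
                int last = out.size() - 1;
                out.set(last, out.get(last).replace(from, to));
            }
            out.add(token); // operators and all other operands are copied unchanged
        }
        return out;
    }

    public static void main(String[] args) {
        // A marked-content sequence (BDC ... EMC) wrapped around one text-showing operation.
        List<String> page = Arrays.asList(
                "/P", "<</MCID 0>>", "BDC", "BT", "(James)", "Tj", "ET", "EMC");
        System.out.println(rewrite(page, 'J', 'X'));
    }
}
```

A loop like this only touches page content. The marked-content operators pass through unchanged, which is consistent with the page still displaying correctly, but the structure tree that those marks belong to is referenced from the catalog (/StructTreeRoot), not from the content stream, so rewriting tokens into a fresh document does not carry it over.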



Re: Unable to mark document as tagged

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi Colette,

The modified version does not contain the structure information needed for tagged PDFs. How do you create the modified version from the first one?

BR
Maruan



RE: Unable to mark document as tagged

Posted by Colette Joubarne <cj...@privacyanalytics.ca>.
Maruan,

I am copying the entire structure from a tagged document and just replacing some of the text, so I would think that the structure is unchanged. Then again who knows what I might have messed up.

James.pdf is the original file:
https://dl.dropboxusercontent.com/u/7689859/James.pdf

James-mod.pdf is the modified file:
https://dl.dropboxusercontent.com/u/7689859/James-mod.pdf

Colette



Re: Unable to mark document as tagged

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi Colette,

This information alone doesn’t make a document a tagged PDF! You might not have the structure information needed within your PDF. Would you have a working and a non-working sample which you could upload to a public location, as attachments are not allowed on the mailing list?

BR
Maruan
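
To make the point concrete, here is a minimal sketch that compares the two catalogs quoted in this thread. It is plain Java that scans the catalog text, not a PDF parser and not PDFBox API; the class and method names are assumptions for illustration only. It checks only whether both entries are present, which is necessary for tagged PDF but by no means sufficient.

```java
import java.util.regex.Pattern;

// Hypothetical helper that scans catalog text; a real check would walk the parsed objects.
final class TaggedCatalogSketch {
    // /StructTreeRoot followed by an indirect reference such as "10 0 R".
    private static final Pattern STRUCT_TREE_ROOT =
            Pattern.compile("/StructTreeRoot\\s*\\d+\\s+\\d+\\s+R");

    // True when the catalog has a /MarkInfo entry (direct or indirect).
    static boolean hasMarkInfo(String catalog) {
        return catalog.contains("/MarkInfo");
    }

    // True when the catalog points at a structure tree root.
    static boolean hasStructTreeRoot(String catalog) {
        return STRUCT_TREE_ROOT.matcher(catalog).find();
    }

    // Both entries must be present; /MarkInfo on its own is not enough.
    static boolean hasTaggingEntries(String catalog) {
        return hasMarkInfo(catalog) && hasStructTreeRoot(catalog);
    }

    public static void main(String[] args) {
        // Catalog fragments as quoted in the thread (the first one is truncated there too).
        String original = "/Type/Catalog/Pages 2 0 R/Lang(en-CA) /StructTreeRoot 10 0 R/MarkInfo<</Marked true";
        String rewritten = "/Type /Catalog /Version /1.4 /Pages 2 0 R /MarkInfo 3 0 R";
        System.out.println("original:  " + hasTaggingEntries(original));
        System.out.println("rewritten: " + hasTaggingEntries(rewritten));
    }
}
```

The rewritten catalog has /MarkInfo but no /StructTreeRoot, and that is the difference between the two files that matters here.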



RE: Unable to mark document as tagged

Posted by Colette Joubarne <cj...@privacyanalytics.ca>.
Maruan,

Yes, you are right. However, why is it that when I look at the properties in Adobe Reader, it indicates that the document is not tagged?

3 0 obj
<<
/Marked true
>>

Colette


Re: Unable to mark document as tagged

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Dear Colette,

/MarkInfo 3 0 R indicates that the information you are looking for is stored as an indirect reference and should be available in 3 0 obj. Could you verify that?

With kind regards

Maruan
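
As a rough illustration of what an indirect reference means, the sketch below resolves "3 0 R" against a toy object table. It is plain Java, not PDFBox code; the class name, the map-based object table and the sample object body are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy model, not a PDF parser and not part of PDFBox.
final class IndirectRefSketch {
    // An indirect reference looks like "<object number> <generation number> R".
    private static final Pattern REF = Pattern.compile("(\\d+)\\s+(\\d+)\\s+R");

    // Returns the referenced object body for an indirect reference,
    // or the value itself when it is already a direct object.
    static String resolve(String value, Map<String, String> objects) {
        Matcher m = REF.matcher(value.trim());
        if (m.matches()) {
            return objects.get(m.group(1) + " " + m.group(2)); // null if the object is missing
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> objects = new HashMap<>();
        objects.put("3 0", "<< /Marked true >>"); // stands in for "3 0 obj ... endobj"

        System.out.println(resolve("3 0 R", objects));              // indirect form
        System.out.println(resolve("<< /Marked true >>", objects)); // direct form
    }
}
```

Either way the catalog ends up with the same MarkInfo dictionary, so a /MarkInfo entry written as "3 0 R" is not in itself a sign that the flag was lost.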
