Posted to dev@pdfbox.apache.org by "William Fausser (Created) (JIRA)" <ji...@apache.org> on 2012/01/06 17:32:39 UTC

[jira] [Created] (PDFBOX-1204) OCR generated PDF/A has problems with preflight validation

OCR generated PDF/A  has problems with preflight validation
-----------------------------------------------------------

                 Key: PDFBOX-1204
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1204
             Project: PDFBox
          Issue Type: Bug
          Components: Preflight
    Affects Versions: 1.7.0
            Reporter: William Fausser
         Attachments: boyd.pdf

/home/fausser/boyd.pdf is not valid, error(s):
2.1.2:Invalid Graphis object, The Info entry of a OutputIntent dictionary is missing
3.3.1: Glyph error, CID 95 is missing from the Composite Font format "HiddenHorzOCR"
7.2:Error on MetaData, ModificationDate present in the document catalog dictionary doesn't match with XMP information

Passes as a valid PDF/A with commercial validators Adobe Acrobat 10.x and Callas

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PDFBOX-1204) OCR generated PDF/A has problems with preflight validation

Posted by "Eric Leleu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187802#comment-13187802 ] 

Eric Leleu commented on PDFBOX-1204:
------------------------------------

Hi,

I started to work on this issue last week. 
My first impression is that the font HiddenHorzOCR isn't well formed because the extracted FontFile doesn't have any glyph definition. 
That seems to confirm the error.
Maybe I'm wrong. I will try to work on this issue this week.

I haven't looked at the Metadata error yet.

BR,
Eric
                

        

[jira] [Updated] (PDFBOX-1204) OCR generated PDF/A has problems with preflight validation

Posted by "William Fausser (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Fausser updated PDFBOX-1204:
------------------------------------

    Attachment: boyd.pdf
    

        

[jira] [Assigned] (PDFBOX-1204) OCR generated PDF/A has problems with preflight validation

Posted by "Eric Leleu (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Leleu reassigned PDFBOX-1204:
----------------------------------

    Assignee: Eric Leleu
    

        

[jira] [Commented] (PDFBOX-1204) OCR generated PDF/A has problems with preflight validation

Posted by "William Fausser (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184937#comment-13184937 ] 

William Fausser commented on PDFBOX-1204:
-----------------------------------------


Hi Eric,
With issues 110 and 1200 getting fixed, I reran my test on boyd.pdf and still get the preflight validation errors below:

/home/fausser/boyd.pdf is not valid, error(s) :
3.3.1 : Glyph error, CID 95 is missing from the Composite Font format "HiddenHorzOCR"
7.2 : Error on MetaData, ModificationDate present in the document catalog dictionary doesn't match with XMP information

Regards,
Bill

                

        

[jira] [Commented] (PDFBOX-1204) OCR generated PDF/A has problems with preflight validation

Posted by "William Fausser (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195781#comment-13195781 ] 

William Fausser commented on PDFBOX-1204:
-----------------------------------------

Eric,
You are welcome.
Thank You again :)

Regards,
Bill
                

        

[jira] [Resolved] (PDFBOX-1204) OCR generated PDF/A has problems with preflight validation

Posted by "Eric Leleu (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Leleu resolved PDFBOX-1204.
--------------------------------

    Resolution: Fixed

Hi,

Thank you for the link.
It helped me a lot in finding the fix.

As you have said, the problem is the partial font embedded by the OCR process.
When a glyph was used in text, preflight threw an exception if the glyph was missing or if its width was inconsistent. This exception should not be thrown when the text rendering mode is set to 3 (invisible text), so I added this condition.
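
The condition described above can be sketched roughly as follows. This is only an illustration, not the actual preflight code: the class name, the method name, and the idea of passing the embedded CIDs as a set are all assumptions made for the example. The only fact taken from the PDF specification is that text rendering mode 3 means the text is neither filled nor stroked.

```java
import java.util.Set;

/** Hypothetical sketch of the rendering-mode condition; not the PDFBox preflight implementation. */
final class GlyphCheckSketch {

    /** Text rendering mode 3: neither fill nor stroke, i.e. invisible text. */
    static final int RENDERING_MODE_INVISIBLE = 3;

    /**
     * Decides whether a missing-glyph error should be reported.
     *
     * @param cid           the character identifier used in the text
     * @param embeddedCids  CIDs that have a glyph definition in the embedded font (assumed input)
     * @param renderingMode the current text rendering mode
     */
    static boolean shouldReportMissingGlyph(int cid, Set<Integer> embeddedCids, int renderingMode) {
        if (renderingMode == RENDERING_MODE_INVISIBLE) {
            // Invisible text, such as a hidden OCR layer, is never painted,
            // so a missing glyph is tolerated here.
            return false;
        }
        // For visible text the glyph must exist in the embedded font.
        return !embeddedCids.contains(cid);
    }
}
```

With this sketch, CID 95 of a partially embedded OCR font would not be reported while the rendering mode is 3, but would still be reported in any visible mode. A width-consistency check could be guarded by the same condition.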

I fixed the XMP error too.

The boyd.pdf file is now considered a valid PDF/A-1b document.


BR,
Eric
                

        

[jira] [Commented] (PDFBOX-1204) OCR generated PDF/A has problems with preflight validation

Posted by "William Fausser (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193230#comment-13193230 ] 

William Fausser commented on PDFBOX-1204:
-----------------------------------------

Hi Eric,
I was looking at this document:
http://www.aiim.org/documents/standards/PDF-A/ISO19005AppNotes.pdf

I was wondering whether, for an OCR-type document in which only a partial font set is used, this paragraph would make sense:
"Developers should be aware that although the PDF/A-1 specification implies that all glyphs in the embedded data must have a matching entry in the Widths table, only those glyphs actually used need to have Width entries."

So I'm trying to understand whether the above applies to the 3.3.1 error message returned.
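
To make the quoted rule concrete, here is a minimal sketch of a check that only looks at the glyphs actually used. It is not taken from preflight; the class name, the method name, and the representation of the Widths table as a map from CID to width are assumptions for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: only glyphs that are actually used need a Widths entry. */
final class UsedGlyphWidthsSketch {

    /**
     * Returns the used CIDs that have no entry in the Widths table.
     * Glyphs that exist in the font but are never used are ignored.
     *
     * @param usedCids    CIDs referenced by the page content (assumed input)
     * @param widthsTable declared widths keyed by CID (assumed representation)
     */
    static Set<Integer> usedCidsWithoutWidth(Set<Integer> usedCids, Map<Integer, Float> widthsTable) {
        Set<Integer> missing = new TreeSet<>();
        for (int cid : usedCids) {
            if (!widthsTable.containsKey(cid)) {
                missing.add(cid);
            }
        }
        return missing;
    }
}
```

Under this reading, a partial font set would only be a problem if one of the used CIDs, such as CID 95 here, had no width entry.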

Regards,
Bill
                

        

[jira] [Issue Comment Edited] (PDFBOX-1204) OCR generated PDF/A has problems with preflight validation

Posted by "William Fausser (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184937#comment-13184937 ] 

William Fausser edited comment on PDFBOX-1204 at 1/17/12 4:22 PM:
------------------------------------------------------------------


Hi Eric,
With issues 1110 and 1200 getting fixed, I reran my test on boyd.pdf and still get the preflight validation errors below:

/home/fausser/boyd.pdf is not valid, error(s) :
3.3.1 : Glyph error, CID 95 is missing from the Composite Font format "HiddenHorzOCR"
7.2 : Error on MetaData, ModificationDate present in the document catalog dictionary doesn't match with XMP information

OCR-generated PDFs are a big part of how PDF/A files get produced, and if these bugs get cleared up, I think the preflight product will become usable as a PDF/A validator.

Regards,
Bill

                

                  