Posted to dev@pdfbox.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2009/12/09 10:44:18 UTC

[jira] Created: (PDFBOX-581) Avoid warnings for graphics operations when extracting text

Avoid warnings for graphics operations when extracting text
-----------------------------------------------------------

                 Key: PDFBOX-581
                 URL: https://issues.apache.org/jira/browse/PDFBOX-581
             Project: PDFBox
          Issue Type: Improvement
          Components: Text extraction
            Reporter: Jukka Zitting
            Assignee: Jukka Zitting
            Priority: Minor


PDFStreamEngine logs a warning for every PDF operator it encounters for which no OperatorProcessor has been explicitly configured. This is a bit annoying for things like text extraction, where many graphics operators can simply be ignored.

To solve this we can either disable the warnings entirely or add an explicit "Ignore" operator processor that simply ignores the selected operators. I'm inclined to implement the latter solution as I think it's a good idea to log warnings for truly unexpected operators.
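
To illustrate the second option, here is a minimal, self-contained sketch of a no-op "Ignore" handler. All type names (Handler, IgnoreHandler, SketchEngine) are hypothetical stand-ins invented for this example; they are not PDFBox's actual classes or signatures. Operators registered with the no-op handler are skipped silently, while operators with no registration at all still produce a warning.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; not PDFBox's actual API.
interface Handler {
    void process(String operator, List<String> operands);
}

// The "Ignore" idea: a handler that deliberately does nothing.
class IgnoreHandler implements Handler {
    @Override
    public void process(String operator, List<String> operands) {
        // Intentionally empty: the operator is known but irrelevant here.
    }
}

// Simplified stand-in for an engine that dispatches operators to handlers.
class SketchEngine {
    private final Map<String, Handler> handlers = new HashMap<>();
    // Warnings are collected in a list so the behaviour is easy to inspect.
    final List<String> warnings = new ArrayList<>();

    void register(String operator, Handler handler) {
        handlers.put(operator, handler);
    }

    void dispatch(String operator, List<String> operands) {
        Handler handler = handlers.get(operator);
        if (handler == null) {
            // Truly unexpected operator: keep the warning.
            warnings.add("unsupported operator: " + operator);
            return;
        }
        handler.process(operator, operands);
    }
}
```

For text extraction, one would register IgnoreHandler for graphics operators such as "re" (rectangle) and leave everything else unregistered, so that only unknown operators end up in the warnings.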

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (PDFBOX-581) Avoid warnings for graphics operations when extracting text

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved PDFBOX-581.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.0

Instead of adding an extra "Ignore" processor class, I fixed this in revision 889465 simply by making PDFStreamEngine ignore all operators that are listed in the properties file without a corresponding OperatorProcessor class name.
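
The behaviour described above can be sketched roughly as follows. This is not the code from revision 889465; it is a self-contained approximation using java.util.Properties, and the class name OperatorTable, the processor class name example.ShowText, and the warning text are all assumptions made for the example. An operator listed with an empty value is treated as "known, ignore silently"; an operator that is not listed at all still triggers a warning.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical stand-in for the engine's operator lookup; not PDFBox's code.
class OperatorTable {
    private final Map<String, String> processors = new HashMap<>();
    private final Set<String> ignored = new HashSet<>();
    // Warnings are collected in a list so the behaviour is easy to inspect.
    final List<String> warnings = new ArrayList<>();

    OperatorTable(Properties config) {
        for (String operator : config.stringPropertyNames()) {
            String className = config.getProperty(operator).trim();
            if (className.isEmpty()) {
                // Listed without a class name: ignore without warning.
                ignored.add(operator);
            } else {
                processors.put(operator, className);
            }
        }
    }

    // Returns the configured class name, or null if no processor should run.
    String lookup(String operator) {
        String className = processors.get(operator);
        if (className == null && !ignored.contains(operator)) {
            warnings.add("unsupported operator: " + operator);
        }
        return className;
    }

    // Convenience factory that parses properties-file syntax from a string.
    static OperatorTable parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new OperatorTable(properties);
    }
}
```

With a configuration such as "Tj=example.ShowText" plus a bare "re=" line, looking up "re" returns null without a warning, whereas looking up an unlisted operator such as "sh" returns null and records one.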





[jira] Commented: (PDFBOX-581) Avoid warnings for graphics operations when extracting text

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788088#action_12788088 ] 

Andreas Lehmkühler commented on PDFBOX-581:
-------------------------------------------

IMHO the Ignore operator is a good idea. Since the TextStripper has its own property file defining the operator mapping, we should use the Ignore operator there and leave the PageDrawer property file alone. I think it's quite useful to know which unsupported operators are encountered while rendering a PDF, so that it's easy to decide whether something is missing because of an issue or because it is not supported.

