Posted to dev@pdfbox.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2009/12/09 10:44:18 UTC
[jira] Created: (PDFBOX-581) Avoid warnings for graphics operations
when extracting text
Avoid warnings for graphics operations when extracting text
-----------------------------------------------------------
Key: PDFBOX-581
URL: https://issues.apache.org/jira/browse/PDFBOX-581
Project: PDFBox
Issue Type: Improvement
Components: Text extraction
Reporter: Jukka Zitting
Assignee: Jukka Zitting
Priority: Minor
PDFStreamEngine logs a warning for every encountered PDF operator for which no OperatorProcessor has been explicitly configured. This is annoying for tasks like text extraction, where many graphics operators can simply be ignored.
To solve this we can either disable the warnings entirely or add an explicit "Ignore" operator processor that simply ignores the selected operators. I'm inclined to implement the latter solution as I think it's a good idea to log warnings for truly unexpected operators.
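As a rough illustration of the second option, an "Ignore" processor could be a no-op registered for the operators that text extraction does not care about, while unregistered operators still produce a warning. The sketch below is not PDFBox code: the Processor interface, the IGNORE constant and the IgnoreProcessorSketch class are hypothetical names used only for this example, and warnings are collected in a list instead of being logged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IgnoreProcessorSketch {

    /** Hypothetical stand-in for an operator processor; not the PDFBox class. */
    public interface Processor {
        void process(String operator, List<Object> operands);
    }

    /** The proposed "Ignore" processor: accepts the operator and does nothing. */
    public static final Processor IGNORE = (operator, operands) -> { };

    private final Map<String, Processor> processors = new HashMap<>();

    /** Collected warnings; a real engine would send these to its logger. */
    public final List<String> warnings = new ArrayList<>();

    public void register(String operator, Processor processor) {
        processors.put(operator, processor);
    }

    public void handle(String operator, List<Object> operands) {
        Processor processor = processors.get(operator);
        if (processor == null) {
            // Truly unexpected operator: still worth a warning.
            warnings.add("unsupported operator: " + operator);
        } else {
            // Explicitly configured, possibly with the no-op IGNORE processor.
            processor.process(operator, operands);
        }
    }
}
```

With this shape, registering IGNORE for a graphics operator such as "re" silences its warning, and an operator that was never registered is still reported.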
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PDFBOX-581) Avoid warnings for graphics
operations when extracting text
Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved PDFBOX-581.
----------------------------------
Resolution: Fixed
Fix Version/s: 1.0.0
Instead of adding an extra "Ignore" processor class, I fixed this in revision 889465 simply by making PDFStreamEngine ignore all operators that are listed in the properties file without a corresponding OperatorProcessor class name.
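The idea can be sketched roughly as follows, assuming a properties file in which each key is an operator and the value is either a processor class name or empty. This is not the code from revision 889465: IgnoredOperatorsSketch and the class name example.ShowText are hypothetical, processors are only recorded by name and never instantiated, and warnings are collected in a list instead of being logged.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

public class IgnoredOperatorsSketch {

    /** Operators listed with a processor class name. */
    public final Set<String> handled = new HashSet<>();

    /** Operators listed without a class name: silently ignored. */
    public final Set<String> ignored = new HashSet<>();

    /** Collected warnings; a real engine would send these to its logger. */
    public final List<String> warnings = new ArrayList<>();

    public IgnoredOperatorsSketch(String mappingText) {
        Properties mapping = new Properties();
        try {
            mapping.load(new StringReader(mappingText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String operator : mapping.stringPropertyNames()) {
            String className = mapping.getProperty(operator).trim();
            if (className.isEmpty()) {
                ignored.add(operator);   // e.g. a line such as "re="
            } else {
                handled.add(operator);   // e.g. "Tj=example.ShowText"
            }
        }
    }

    public void handle(String operator) {
        if (handled.contains(operator)) {
            return; // a real engine would invoke the configured processor here
        }
        if (ignored.contains(operator)) {
            return; // listed without a class name: no processing, no warning
        }
        warnings.add("unsupported operator: " + operator);
    }
}
```

For example, a mapping of "Tj=example.ShowText" and "re=" would process Tj, skip re quietly, and warn only about an operator that does not appear in the file at all.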
[jira] Commented: (PDFBOX-581) Avoid warnings for graphics
operations when extracting text
Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788088#action_12788088 ]
Andreas Lehmkühler commented on PDFBOX-581:
-------------------------------------------
IMHO the Ignore operator is a good idea. As the TextStripper has its own property file to define the operator mapping, we should use the Ignore operator there and leave the PageDrawer property file alone. I think it's quite useful to know which unsupported operators are encountered while rendering a PDF, so that it's easy to decide whether something is missing because of a bug or because it is not supported.