Posted to dev@pdfbox.apache.org by "Niraj Bhawnani (JIRA)" <ji...@apache.org> on 2010/07/14 03:27:51 UTC
[jira] Created: (PDFBOX-774) convertToImage causes JVM crash on certain PDFs
convertToImage causes JVM crash on certain PDFs
-----------------------------------------------
Key: PDFBOX-774
URL: https://issues.apache.org/jira/browse/PDFBOX-774
Project: PDFBox
Issue Type: Bug
Affects Versions: 1.2.1, 1.2.0
Reporter: Niraj Bhawnani
I'm evaluating PDFBox, and as part of the process I tried it on several PDFs. One issue I found is that converting certain PDFs to images crashes the JVM with this message (Ubuntu Lucid Lynx 64-bit):
{noformat}
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fe5b6be1a37, pid=2133, tid=140628023412496
#
# JRE version: 6.0_20-b02
# Java VM: Java HotSpot(TM) 64-Bit Server VM (16.3-b01 mixed mode linux-amd64 )
# Problematic frame:
# C [libfontmanager.so+0x27a37]
#
# An error report file with more information is saved as:
# /home/xxxxxx/hs_err_pid2133.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
{noformat}
Of course, this looks like an issue with Java, but it would be nice if PDFBox could somehow work around it. I tested this on two separate 64-bit Linux boxes as well as a 32-bit Windows box, and got pretty much the same error on both platforms.
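Because the crash happens in native code (libfontmanager.so), no Java try/catch can intercept it. One generic mitigation, sketched here as an assumption and not as anything PDFBox itself does, is to run the risky work in a child JVM so that a SIGSEGV takes down only the child. The names IsolatedRun, run, javaBinary and FAILED are made up for illustration; only JDK classes are used (Redirect.DISCARD needs Java 9 or later).

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class IsolatedRun {
    /** Sentinel returned when the child could not be started, timed out, or the wait was interrupted. */
    public static final int FAILED = Integer.MIN_VALUE;

    /**
     * Runs a command in a separate OS process and returns its exit code.
     * A native crash in the child ends only the child process.
     */
    public static int run(List<String> command, long timeoutSeconds) {
        try {
            Process child = new ProcessBuilder(command)
                    .redirectErrorStream(true)                       // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // avoid blocking on a full pipe
                    .start();
            if (!child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                child.destroyForcibly();                             // give up on a hung child
                return FAILED;
            }
            return child.exitValue();
        } catch (IOException e) {
            return FAILED;                                           // command could not be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();                      // preserve the interrupt flag
            return FAILED;
        }
    }

    /** Path of the java launcher that belongs to the currently running JVM. */
    public static String javaBinary() {
        return System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
    }
}
```

A caller would pass javaBinary(), a classpath, and a worker main class that performs the conversion (hypothetical, not shown), and treat any non-zero result as "this PDF could not be rendered".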
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PDFBOX-774) convertToImage causes JVM crash on certain PDFs
Posted by "Niraj Bhawnani (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Niraj Bhawnani updated PDFBOX-774:
----------------------------------
Attachment: IC_bp_strategy_presentation_march_2010_slides.pdf
Attached an example PDF that triggers the crash. I found it through a Google search for "pdf presentation slides".
Re: Text extraction : do we need those files ?
Posted by Bernard Segonnes <bs...@free.fr>.
Thanks for the answer.
PDFBOX-586 was filed by me :-)
So, as I expect to have customers in Asian and right-to-left-language countries, I
will keep those files :-(
(I sometimes get an Out Of Memory exception that I should catch, as my app runs on
mobile devices/phones.) I will optimize elsewhere.
Quoting Jukka Zitting <ju...@gmail.com>:
> Hi,
>
> On Mon, Aug 9, 2010 at 3:41 PM, Bernard Segonnes <bs...@free.fr> wrote:
> > I have ported PDFBox 1.1.0 to Android (text extraction only). The binary is
> > too big and too slow (probably due to memory constraints...): around 5 MB
> > (9 MB once installed on a mobile device: too much).
>
> See PDFBOX-586 [1] for some related progress.
>
> > Are the files in:
> > 1) cmap required? (78-EUC_H Adobe-CNS-5 GBK-EUC-V UniKS-UTF8-H
> > ...) I would be pleased to remove all those files :-)
>
> These are only needed for processing PDF documents that use CJK
> (Chinese, Japanese, Korean) fonts. These CMaps are needed to translate
> from the internal font-specific character identification codes to
> Unicode.
>
> > 2) pdf_*.xml: are they required for text extraction? (pdf_he_IL.xml
> > pdf_zh_Hant.xml ...)
>
> These are part of the ICU4J library. You only need ICU4J for handling
> Arabic and other right-to-left languages.
>
> [1] https://issues.apache.org/jira/browse/PDFBOX-586
>
> BR,
>
> Jukka Zitting
>
Bernard SEGONNES
-------------------------------------
bsegonnes@free.fr
http://bsegonnes.free.fr
Re: Text extraction : do we need those files ?
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Mon, Aug 9, 2010 at 3:41 PM, Bernard Segonnes <bs...@free.fr> wrote:
> I have ported PDFBox 1.1.0 to Android (text extraction only). The binary is
> too big and too slow (probably due to memory constraints...): around 5 MB (9 MB
> once installed on a mobile device: too much).
See PDFBOX-586 [1] for some related progress.
> Are the files in:
> 1) cmap required? (78-EUC_H Adobe-CNS-5 GBK-EUC-V UniKS-UTF8-H
> ...) I would be pleased to remove all those files :-)
These are only needed for processing PDF documents that use CJK
(Chinese, Japanese, Korean) fonts. These CMaps are needed to translate
from the internal font-specific character identification codes to
Unicode.
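To illustrate the kind of lookup described above, here is a toy sketch of a code-to-Unicode table built from entries in the `<code> <UTF-16BE hex>` form used by bfchar sections of ToUnicode CMaps. It is an assumption-laden illustration, not PDFBox's CMap parser, and the bundled CMap files use a richer format that it does not attempt to read. ToyCMap and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyCMap {
    // One entry looks like: <0001> <4E2D>
    private static final Pattern ENTRY =
            Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>");

    private final Map<Integer, String> codeToUnicode = new HashMap<>();

    /** Builds the table from the text between "beginbfchar" and "endbfchar". */
    public ToyCMap(String bfcharBody) {
        Matcher m = ENTRY.matcher(bfcharBody);
        while (m.find()) {
            int code = Integer.parseInt(m.group(1), 16);
            codeToUnicode.put(code, decodeUtf16Be(m.group(2)));
        }
    }

    /** Decodes a hex string as big-endian UTF-16 code units (4 hex digits each). */
    private static String decodeUtf16Be(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return sb.toString();
    }

    /** Returns the Unicode text for a font-specific code, or null if unmapped. */
    public String toUnicode(int code) {
        return codeToUnicode.get(code);
    }
}
```

Without such a table, the character codes in a CJK content stream have no defined meaning as text, which is why dropping the cmap resources breaks extraction for those documents only.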
> 2) pdf_*.xml: are they required for text extraction? (pdf_he_IL.xml
> pdf_zh_Hant.xml ...)
These are part of the ICU4J library. You only need ICU4J for handling
Arabic and other right-to-left languages.
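As a rough way to see whether a given piece of extracted text would need right-to-left handling at all, the JDK's own java.text.Bidi can be asked directly. This is only a sketch under the assumption that such a check is useful when deciding whether to ship the extra library; it is not how PDFBox decides to use ICU4J, and RtlCheck is a made-up name.

```java
import java.text.Bidi;

public class RtlCheck {
    /** Returns true if the text contains characters that call for bidirectional processing. */
    public static boolean needsBidi(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }
}
```

Plain Latin text returns false here, while Hebrew or Arabic text returns true.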
[1] https://issues.apache.org/jira/browse/PDFBOX-586
BR,
Jukka Zitting
Text extraction : do we need those files ?
Posted by Bernard Segonnes <bs...@free.fr>.
Hi,
I have ported PDFBox 1.1.0 to Android (text extraction only). The binary is
too big and too slow (probably due to memory constraints...): around 5 MB (9 MB
once installed on a mobile device: too much).
So I'm looking for files I can delete; I only need to extract text.
Are the files in:
1) cmap required? (78-EUC_H Adobe-CNS-5 GBK-EUC-V UniKS-UTF8-H
...) I would be pleased to remove all those files :-)
2) pdf_*.xml: are they required for text extraction? (pdf_he_IL.xml
pdf_zh_Hant.xml ...)
3) Are there other resource files I can remove?
Thanks for the help.
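For the question of which files are worth removing, it can help to measure first. The sketch below totals the compressed size of jar entries per directory using only java.util.zip; it assumes nothing about the layout of the PDFBox jar, and JarSizes with its two methods is a hypothetical helper.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class JarSizes {
    /** Reads a jar and returns entry name -> compressed size in bytes (unknown sizes count as 0). */
    public static Map<String, Long> entrySizes(String jarPath) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory()) {
                    sizes.put(entry.getName(), Math.max(entry.getCompressedSize(), 0L));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    /** Sums sizes per containing directory; entries at the jar root are grouped under "". */
    public static Map<String, Long> totalsByDirectory(Map<String, Long> entrySizes) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> e : entrySizes.entrySet()) {
            int slash = e.getKey().lastIndexOf('/');
            String dir = slash < 0 ? "" : e.getKey().substring(0, slash);
            totals.merge(dir, e.getValue(), Long::sum);
        }
        return totals;
    }
}
```

Calling totalsByDirectory(entrySizes(pathToJar)) and sorting the result by value shows which resource directories dominate the binary before anything is deleted.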
[jira] Resolved: (PDFBOX-774) convertToImage causes JVM crash on certain PDFs
Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved PDFBOX-774.
----------------------------------
Assignee: Jukka Zitting
Resolution: Duplicate
The fix to PDFBOX-780 works around this issue.