Posted to dev@jackrabbit.apache.org by "fabrizio giustina (JIRA)" <ji...@apache.org> on 2007/02/23 21:48:05 UTC

[jira] Created: (JCR-764) PdfTextFilter may leave parsed document open in case of errors

PdfTextFilter may leave parsed document open in case of errors
--------------------------------------------------------------

                 Key: JCR-764
                 URL: https://issues.apache.org/jira/browse/JCR-764
             Project: Jackrabbit
          Issue Type: Bug
    Affects Versions: 1.2.2
            Reporter: fabrizio giustina
            Priority: Trivial
         Attachments: textfilter_close.diff

If a PDF document contains errors, Jackrabbit may fail to properly close the parsed document. PDFBox then writes a stack trace to standard output at finalization to warn against this.

This is the resulting log:

WARN org.apache.jackrabbit.core.query.LazyReader LazyReader.java(read:82) 20.02.2007 15:42:50 exception initializing reader org.apache.jackrabbit.core.query.PdfTextFilter$1: java.io.IOException: Error: Expected hex number, actual=' 2'
java.lang.Throwable: Warning: You did not close the PDF Document
   at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
   at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
   at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
   at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
   at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)


This may happen because the parse() method in

parser = new PDFParser(new BufferedInputStream(in));
parser.parse();

immediately creates a document, but it can then throw an exception while processing the file.
PdfTextFilter should check whether the parser still holds a document and close it.
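
The suggested cleanup can be sketched as follows. This is a minimal illustration and not the attached patch: StubParser and StubDocument are hypothetical stand-ins for the parser and document classes, written so that the example runs without PDFBox. The stub parser allocates its document before it reads any input, which is the behaviour described above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ParseCleanupSketch {

    /** Hypothetical stand-in for a parsed document that must be closed explicitly. */
    static class StubDocument {
        private boolean closed;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Hypothetical stand-in for a parser that creates its document before parsing. */
    static class StubParser {
        private final InputStream in;
        private StubDocument document;

        StubParser(InputStream in) { this.in = in; }

        void parse() throws IOException {
            document = new StubDocument();   // the document exists from here on
            if (in.read() != '%') {          // simulate a malformed file
                throw new IOException("malformed input");
            }
        }

        StubDocument getDocument() { return document; }
    }

    /**
     * Parses the input. On failure, closes any document the parser
     * still holds and rethrows the original exception.
     */
    static StubDocument parseOrClose(StubParser parser) throws IOException {
        try {
            parser.parse();
            return parser.getDocument();     // caller is responsible for closing
        } catch (IOException e) {
            StubDocument doc = parser.getDocument();
            if (doc != null) {
                doc.close();                 // do not leave it to the finalizer
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        StubParser bad = new StubParser(new ByteArrayInputStream("x".getBytes()));
        try {
            parseOrClose(bad);
        } catch (IOException e) {
            System.out.println("closed after failure: " + bad.getDocument().isClosed());
        }
    }
}
```

On the success path the document stays open and the caller is expected to close it once the text has been read.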



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-764) PdfTextFilter may leave parsed document open in case of errors

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-764:
------------------------------

          Component/s: indexing
        Fix Version/s: 1.2.3
             Assignee: Jukka Zitting
             Priority: Minor  (was: Trivial)
    Affects Version/s: 1.0
                       1.0.1
                       1.1
                       1.1.1
                       1.2.1

> PdfTextFilter may leave parsed document open in case of errors
> --------------------------------------------------------------
>
>                 Key: JCR-764
>                 URL: https://issues.apache.org/jira/browse/JCR-764
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: indexing
>    Affects Versions: 1.0, 1.0.1, 1.1, 1.1.1, 1.2.1, 1.2.2
>            Reporter: fabrizio giustina
>         Assigned To: Jukka Zitting
>            Priority: Minor
>             Fix For: 1.2.3
>
>         Attachments: textfilter_close.diff



[jira] Updated: (JCR-764) PdfTextFilter may leave parsed document open in case of errors

Posted by "fabrizio giustina (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fabrizio giustina updated JCR-764:
----------------------------------

    Attachment: textfilter_close.diff

Simple patch that adds additional cleanup on exceptions.




[jira] Resolved: (JCR-764) PdfTextFilter may leave parsed document open in case of errors

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved JCR-764.
-------------------------------

    Resolution: Fixed

Patch applied to svn trunk in revision 511509 with some additional comments.

The new PdfTextExtractor class already covered this case, but would have failed in case an IOException had been thrown by the cleanup process. In revision 511510 I added a catch for such cleanup exceptions.
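
To illustrate why a catch around the cleanup matters, here is a minimal sketch. It is not the code from revision 511510, which is not shown in this thread; FailingResource and extract() are hypothetical names. In Java, an exception thrown from a finally block replaces both the return value and any exception already in flight, so the close() call is wrapped in its own try/catch.

```java
import java.io.Closeable;
import java.io.IOException;

class QuietCleanupSketch {

    /** Hypothetical resource whose close() itself fails. */
    static class FailingResource implements Closeable {
        boolean closeAttempted;

        @Override
        public void close() throws IOException {
            closeAttempted = true;
            throw new IOException("cleanup failed");
        }
    }

    /**
     * Simulates text extraction and always attempts cleanup. A failure
     * during cleanup is swallowed so that it cannot replace the result
     * or the original exception.
     */
    static String extract(Closeable resource, boolean parseFails) throws IOException {
        try {
            if (parseFails) {
                throw new IOException("parse failed");
            }
            return "extracted text";
        } finally {
            try {
                resource.close();
            } catch (IOException e) {
                // Ignored on purpose: the cleanup error is less useful
                // to the caller than the result or the parse error.
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extract(new FailingResource(), false));
    }
}
```

Without the inner try/catch, a caller would see "cleanup failed" in both cases: the extracted text would be lost on success, and the parse error would be masked on failure.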

Thanks for the background work on this! To me it seems like this is really a PDFBox bug in that it fails to do proper cleanup in case an exception gets thrown. I'll see if I can formulate a good bug report and a patch for PDFBox to avoid such workarounds in Jackrabbit.

