
Posted to dev@pdfbox.apache.org by "Michael McCandless (Created) (JIRA)" <ji...@apache.org> on 2011/11/05 15:44:51 UTC

[jira] [Created] (PDFBOX-1159) Speed up LZWFilter decoding

Speed up LZWFilter decoding
---------------------------

                 Key: PDFBOX-1159
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1159
             Project: PDFBox
          Issue Type: Improvement
            Reporter: Michael McCandless
            Priority: Minor


I noticed that the LZW decoder performance can be improved: it's
allocating a new byte[] for every byte it visits in the stream.  This
is actually an O(N^2) cost, but N is typically fairly small.
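
This hypothetical sketch (not the PDFBox code) makes the quadratic cost concrete. It counts the bytes copied when a sequence grows one byte at a time, assuming N is the length of the byte sequence accumulated for the current code. Copy-on-append copies N(N-1)/2 bytes in total. A buffer that doubles its capacity when full copies fewer than 2N.

```java
import java.util.Arrays;

/** Hypothetical illustration of append costs; not PDFBox code. */
final class AppendCost {

    /** Allocates a fresh array for every appended byte; returns total bytes copied. */
    static long naiveAppend(int n) {
        byte[] current = new byte[0];
        long copied = 0;
        for (int i = 0; i < n; i++) {
            byte[] next = Arrays.copyOf(current, current.length + 1); // copies every existing byte
            copied += current.length;
            next[current.length] = (byte) i;
            current = next;
        }
        return copied;
    }

    /** Private growable buffer that doubles when full; returns total bytes copied. */
    static long growableAppend(int n) {
        byte[] buf = new byte[8];
        int len = 0;
        long copied = 0;
        for (int i = 0; i < n; i++) {
            if (len == buf.length) {
                copied += len; // copy only on growth, amortized O(1) per byte
                buf = Arrays.copyOf(buf, buf.length * 2);
            }
            buf[len++] = (byte) i;
        }
        return copied;
    }

    public static void main(String[] args) {
        for (int n : new int[] {16, 256, 4096}) {
            System.out.printf("n=%d naive=%d growable=%d%n", n, naiveAppend(n), growableAppend(n));
        }
    }
}
```

For N = 256, `naiveAppend` reports 32640 copied bytes, while `growableAppend` reports 248.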

I changed LZWDictionary to use its own private growable byte[] to
accumulate each added byte.  I also changed it to not pre-enroll all
initial (0-255) codes, but instead to add each one lazily, on demand,
when that code is first used.
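
This minimal sketch shows lazy enrollment of the initial codes inside a simple LZW decoder. It assumes the codes have already been unpacked from the bit stream into an int[]. Real PDF LZW streams use variable 9-12 bit codes and the EarlyChange parameter, and both are omitted here. The class name LazyLzwDecoder is hypothetical and is not PDFBox's LZWDictionary. Codes 256 and 257 are the PDF clear-table and end-of-data markers, and new entries start at 258.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not PDFBox's LZWDictionary. Single-byte codes 0-255 are
 * created only when first used. Input codes are assumed to be pre-unpacked,
 * and the 4096-entry table limit is not enforced.
 */
final class LazyLzwDecoder {
    static final int CLEAR_TABLE = 256;
    static final int EOD = 257;
    static final int FIRST_FREE = 258;

    private final byte[][] singles = new byte[256][]; // filled lazily on first use
    private final List<byte[]> added = new ArrayList<>(); // entries for codes 258, 259, ...

    private byte[] lookup(int code) {
        if (code < 256) {
            if (singles[code] == null) {
                singles[code] = new byte[] {(byte) code}; // lazy enrollment of an initial code
            }
            return singles[code];
        }
        int index = code - FIRST_FREE;
        return index >= 0 && index < added.size() ? added.get(index) : null;
    }

    private int nextCode() {
        return FIRST_FREE + added.size();
    }

    private static byte[] concat(byte[] prefix, byte last) {
        byte[] out = Arrays.copyOf(prefix, prefix.length + 1);
        out[prefix.length] = last;
        return out;
    }

    byte[] decode(int[] codes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] previous = null;
        for (int code : codes) {
            if (code == EOD) {
                break;
            }
            if (code == CLEAR_TABLE) {
                added.clear();
                previous = null;
                continue;
            }
            byte[] entry = lookup(code);
            if (entry == null) {
                if (previous == null || code != nextCode()) {
                    throw new IllegalArgumentException("invalid code " + code);
                }
                // Code used in the same step that defines it: previous + first byte of previous.
                entry = concat(previous, previous[0]);
            }
            if (previous != null) {
                added.add(concat(previous, entry[0])); // new entry: previous + first byte of current
            }
            out.write(entry, 0, entry.length);
            previous = entry;
        }
        return out.toByteArray();
    }
}
```

Decoding the codes 65, 66, 258, 260, 257 yields "ABABABA". The last data code, 260, is used in the same step that defines it.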

I also randomized the TestFilters test, and mixed in some
"more predictable" patterns, so we get better testing of the filters.
If the test fails it prints the seed used for the random numbers, so
we can reproduce the failure.
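
This hypothetical sketch shows the seeded, randomized testing pattern. It uses java.util.zip.Deflater/Inflater as a stand-in for a PDF filter and is not the TestFilters code. The inputs mix random bytes with "more predictable" runs of a repeated byte or a small alphabet. Any failure message includes the seed.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical seeded round-trip test; Deflater/Inflater stand in for a PDF filter. */
final class RandomizedRoundTrip {

    /** Random bytes mixed with predictable segments: single-byte runs and a 4-letter alphabet. */
    static byte[] randomInput(Random random) {
        int length = random.nextInt(10_000);
        byte[] data = new byte[length];
        int i = 0;
        while (i < length) {
            int runLength = Math.min(length - i, 1 + random.nextInt(200));
            switch (random.nextInt(3)) {
                case 0: // fully random bytes
                    for (int j = 0; j < runLength; j++) data[i + j] = (byte) random.nextInt(256);
                    break;
                case 1: // one repeated byte
                    Arrays.fill(data, i, i + runLength, (byte) random.nextInt(256));
                    break;
                default: // small alphabet, producing many repeated substrings
                    for (int j = 0; j < runLength; j++) data[i + j] = (byte) ('a' + random.nextInt(4));
            }
            i += runLength;
        }
        return data;
    }

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(chunk);
            out.write(chunk, 0, n);
            if (n == 0 && !inflater.finished() && inflater.needsInput()) {
                throw new DataFormatException("truncated input");
            }
        }
        inflater.end();
        return out.toByteArray();
    }

    /** Runs round trips from one seed; the seed is reported on failure so it can be replayed. */
    static void run(long seed, int iterations) throws DataFormatException {
        Random random = new Random(seed);
        for (int iter = 0; iter < iterations; iter++) {
            byte[] original = randomInput(random);
            byte[] decoded = decompress(compress(original));
            if (!Arrays.equals(original, decoded)) {
                throw new AssertionError("round trip failed at iteration " + iter + "; seed=" + seed);
            }
        }
    }

    public static void main(String[] args) throws DataFormatException {
        // Replay a failure with -Dseed=<value>; otherwise pick a fresh seed.
        long seed = Long.getLong("seed", System.nanoTime());
        run(seed, 100);
        System.out.println("passed with seed=" + seed);
    }
}
```

java.util.Random is deterministic for a given seed. Running with -Dseed=<value> therefore regenerates exactly the inputs of a failing run.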


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (PDFBOX-1159) Speed up LZWFilter decoding

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-1159.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.7.0
         Assignee: Andreas Lehmkühler

I applied the patch in revision 1340063 as proposed, except for the changes to the Encoding class, which have nothing to do with the LZW filter. I guess they were added by accident, weren't they?

Thanks for the contribution!
                
> Speed up LZWFilter decoding
> ---------------------------
>
>                 Key: PDFBOX-1159
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1159
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>             Fix For: 1.7.0
>
>         Attachments: PDFBOX-1159.patch
>
>
> I noticed that the LZW decoder performance can be improved: it's
> allocating a new byte[] for every byte it visits in the stream.  This
> is actually an O(N^2) cost, but N is typically fairly small.
> I changed LZWDictionary to use its own private growable byte[] to
> accumulate each added byte.  I also changed it to not pre-enroll all
> initial (0-255) codes, but instead add it (lazily) on demand if the
> code is used.
> I also randomized the TestFilters test, and mixed in some
> "more predictable" patterns, so we get better testing of the filters.
> If the test fails it prints the seed used for the random numbers, so
> we can reproduce the failure.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PDFBOX-1159) Speed up LZWFilter decoding

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PDFBOX-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278767#comment-13278767 ] 

Michael McCandless commented on PDFBOX-1159:
--------------------------------------------

Thanks Andreas!

On the Encoding.java change: you're right, it was unrelated to speeding up LZWFilter.  But I think it's still worth committing: it saves a redundant call to getName.

It was just a small code improvement I came across while digging into the LZWFilter...
                
> Speed up LZWFilter decoding
> ---------------------------
>
>                 Key: PDFBOX-1159
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1159
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>             Fix For: 1.7.0
>
>         Attachments: PDFBOX-1159.patch
>
>
> I noticed that the LZW decoder performance can be improved: it's
> allocating a new byte[] for every byte it visits in the stream.  This
> is actually an O(N^2) cost, but N is typically fairly small.
> I changed LZWDictionary to use its own private growable byte[] to
> accumulate each added byte.  I also changed it to not pre-enroll all
> initial (0-255) codes, but instead add it (lazily) on demand if the
> code is used.
> I also randomized the TestFilters test, and mixed in some
> "more predictable" patterns, so we get better testing of the filters.
> If the test fails it prints the seed used for the random numbers, so
> we can reproduce the failure.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PDFBOX-1159) Speed up LZWFilter decoding

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PDFBOX-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192153#comment-13192153 ] 

Michael McCandless commented on PDFBOX-1159:
--------------------------------------------

I think this patch is ready to be committed: it speeds up the LZW decoder and improves its test case (instead of using the same data every time, it randomizes the input, but prints the seed if there's a failure so we can reproduce it).
                
> Speed up LZWFilter decoding
> ---------------------------
>
>                 Key: PDFBOX-1159
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1159
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: PDFBOX-1159.patch
>
>
> I noticed that the LZW decoder performance can be improved: it's
> allocating a new byte[] for every byte it visits in the stream.  This
> is actually an O(N^2) cost, but N is typically fairly small.
> I changed LZWDictionary to use its own private growable byte[] to
> accumulate each added byte.  I also changed it to not pre-enroll all
> initial (0-255) codes, but instead add it (lazily) on demand if the
> code is used.
> I also randomized the TestFilters test, and mixed in some
> "more predictable" patterns, so we get better testing of the filters.
> If the test fails it prints the seed used for the random numbers, so
> we can reproduce the failure.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PDFBOX-1159) Speed up LZWFilter decoding

Posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated PDFBOX-1159:
---------------------------------------

    Attachment: PDFBOX-1159.patch

Patch.
                
> Speed up LZWFilter decoding
> ---------------------------
>
>                 Key: PDFBOX-1159
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1159
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: PDFBOX-1159.patch
>
>
> I noticed that the LZW decoder performance can be improved: it's
> allocating a new byte[] for every byte it visits in the stream.  This
> is actually an O(N^2) cost, but N is typically fairly small.
> I changed LZWDictionary to use its own private growable byte[] to
> accumulate each added byte.  I also changed it to not pre-enroll all
> initial (0-255) codes, but instead add it (lazily) on demand if the
> code is used.
> I also randomized the TestFilters test, and mixed in some
> "more predictable" patterns, so we get better testing of the filters.
> If the test fails it prints the seed used for the random numbers, so
> we can reproduce the failure.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira