Posted to dev@pdfbox.apache.org by "Mel Martinez (JIRA)" <ji...@apache.org> on 2010/01/14 21:41:54 UTC

[jira] Updated: (PDFBOX-600) PDFBox performance issue: PDFTextStripper performance tweak

     [ https://issues.apache.org/jira/browse/PDFBOX-600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mel Martinez updated PDFBOX-600:
--------------------------------

    Attachment: PDFTextStripper.java

Flips the order of the two comparisons in the within() method so that the test short-circuits sooner on left-to-right text.


> PDFBox performance issue:  PDFTextStripper performance tweak
> ------------------------------------------------------------
>
>                 Key: PDFBOX-600
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-600
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>         Environment: All
>            Reporter: Mel Martinez
>         Attachments: PDFTextStripper.java
>
>
> During text extraction, PDFTextStripper calculates text position proximities to determine whether text elements overlap vertically or horizontally.
> As part of this, it calls the PDFTextStripper.within(float first, float second, float variance) method.
> The current (0.8.0) version of this method uses the following test:   second > first - variance && second < first + variance
> This is correct, but on my test documents it is slower than the same test with the comparison order flipped:        second < first + variance && second > first - variance
> This is because, on left-to-right text, the first comparison of the second form fails sooner, so && short-circuits without evaluating the other comparison.   I believe left-to-right text should be treated as the default case.
> Please change the PDFTextStripper.within() method to use the second form of the test, i.e., to:
>     private boolean within( float first, float second, float variance )
>     {
>         return second < first + variance && second > first - variance;
>     }
> Thanks!
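
To make the short-circuit argument concrete, here is a minimal sketch that puts the two orderings side by side. The class name WithinOrderDemo and the comparison-counting helpers are hypothetical and exist only for illustration; they are not part of PDFBox. The sketch shows that both forms return the same result and differ only in how many comparisons run before && short-circuits.

```java
// Hypothetical demo class; not part of PDFBox.
class WithinOrderDemo {

    // Ordering quoted in the issue as the 0.8.0 version.
    static boolean withinOriginal(float first, float second, float variance) {
        return second > first - variance && second < first + variance;
    }

    // Ordering proposed in the issue.
    static boolean withinFlipped(float first, float second, float variance) {
        return second < first + variance && second > first - variance;
    }

    // Illustrative helper: comparisons evaluated by the original ordering.
    // If the first comparison is false, && skips the second one.
    static int comparisonsOriginal(float first, float second, float variance) {
        return (second > first - variance) ? 2 : 1;
    }

    // Illustrative helper: comparisons evaluated by the flipped ordering.
    static int comparisonsFlipped(float first, float second, float variance) {
        return (second < first + variance) ? 2 : 1;
    }

    public static void main(String[] args) {
        float first = 10f;
        float variance = 1f;
        // One value far above first, one inside the band, one far below.
        float[] seconds = { 50f, 10.5f, 0f };
        for (float second : seconds) {
            System.out.println("second=" + second
                    + " original=" + withinOriginal(first, second, variance)
                    + " (" + comparisonsOriginal(first, second, variance) + " comparisons)"
                    + " flipped=" + withinFlipped(first, second, variance)
                    + " (" + comparisonsFlipped(first, second, variance) + " comparisons)");
        }
    }
}
```

With first = 10 and variance = 1, a second value of 50 costs the flipped form one comparison and the original form two, while a second value of 0 reverses that. Which case dominates depends on how within() is called during extraction; the issue reports only that the flipped form was faster on the reporter's test documents.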

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.