Posted to dev@pdfbox.apache.org by "Uwe (JIRA)" <ji...@apache.org> on 2014/09/09 22:03:30 UTC
[jira] [Commented] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7
[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127473#comment-14127473 ]
Uwe commented on PDFBOX-1512:
-----------------------------
Hi,
I wrote a patch that implements quicksort; see quicksort.patch.
Note: The patch is written against the PDFBox 1.7 branch, but it should be easy to apply to the trunk as well. The patch adds:
* A Quicksort implementation and unit test
* A check whether we're running on JDK 6 or earlier: if so, use Collections.sort(); otherwise, use QuickSort.sort()
The custom quicksort implementation is probably a fair bit slower than the JDK's own sort (on my box the unit test TestTextStripper takes about 13s instead of 3s, roughly 4x), which is the reason for the JDK check. Another option would be to simply catch the exception and run quicksort only as a fallback in that case.
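To illustrate the fallback option, here is a minimal sketch. It is not the code from quicksort.patch: the class name FallbackSort and its methods are made up for this example, and the quicksort is a plain Lomuto-partition variant chosen because it terminates even when the comparator is inconsistent. It assumes the failure surfaces as the IllegalArgumentException quoted in the issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only (not part of PDFBox or the patch).
final class FallbackSort {

    /** Tries Collections.sort() first; falls back to quicksort if the comparator is rejected. */
    static <T> void sort(List<T> list, Comparator<? super T> cmp) {
        // Defensive copy, in case a failed sort leaves the list partially reordered.
        List<T> backup = new ArrayList<T>(list);
        try {
            Collections.sort(list, cmp);
        } catch (IllegalArgumentException e) {
            // "Comparison method violates its general contract!"
            for (int i = 0; i < backup.size(); i++) {
                list.set(i, backup.get(i));
            }
            quicksort(list, cmp, 0, list.size() - 1);
        }
    }

    /** Recursive quicksort with Lomuto partitioning; every loop is bounded by the indices. */
    static <T> void quicksort(List<T> a, Comparator<? super T> cmp, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        T pivot = a.get(hi);
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (cmp.compare(a.get(j), pivot) < 0) {
                Collections.swap(a, i, j);
                i++;
            }
        }
        Collections.swap(a, i, hi); // move the pivot to its final position
        quicksort(a, cmp, lo, i - 1);
        quicksort(a, cmp, i + 1, hi);
    }
}
```

With this approach the fast JDK sort is used whenever it succeeds, and the slower quicksort only runs for inputs that actually trigger the exception. The recursion depth can reach the list size in the worst case, so a real implementation might want an iterative variant.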
Uwe
> TextPositionComparator is not compatible with Java 7
> ----------------------------------------------------
>
> Key: PDFBOX-1512
> URL: https://issues.apache.org/jira/browse/PDFBOX-1512
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.7.1
> Environment: Java 7
> Reporter: Benjamin Papez
> Attachments: FOP-2252.pdf, TextPositionComparator.java, Topo.pdf, Topo.txt, TopoContained.pdf, TopoContained.txt, TopoOverlap.pdf, TopoOverlap.txt, WFI_PDFParser_TextPostionComparator.txt, illustration-of-inconsistent-sorting.png, immo-kurier_arsenal_93x62.pdf
>
>
> The TextPositionComparator causes the following exception when running on Java 7: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison method violates its general contract!
> I think the problem is with this check:
> if ( yDifference < .1 ||
> (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) ||
> (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom))
> as it violates the contract requirement:
> The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
> Finally, the implementor must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z.
> Java 7 is now stricter: its sort implementation can throw an IllegalArgumentException when it detects that the contract is violated.
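The transitivity problem described in the issue can be reproduced with a much simpler comparator. The sketch below is a hypothetical stand-in, not the real TextPositionComparator: points are plain {x, y} arrays, and only the "yDifference < .1" part of the quoted check is modelled. Points whose y values differ by less than 0.1 are treated as being on the same line and ordered by x; otherwise they are ordered by y.

```java
import java.util.Comparator;

// Hypothetical demo class; names and data are invented for illustration.
final class ToleranceDemo {

    /** Orders {x, y} points by x if they are on the "same line" (|dy| < 0.1), else by y. */
    static final Comparator<double[]> BY_LINE_THEN_X = new Comparator<double[]>() {
        public int compare(double[] p, double[] q) {
            if (Math.abs(p[1] - q[1]) < 0.1) {
                return Double.compare(p[0], q[0]);
            }
            return Double.compare(p[1], q[1]);
        }
    };

    /** Returns the signs of compare(a, b), compare(b, c) and compare(a, c). */
    static String describe() {
        double[] a = {3.0, 0.00};
        double[] b = {2.0, 0.08};
        double[] c = {1.0, 0.16};
        return Integer.signum(BY_LINE_THEN_X.compare(a, b)) + " "
                + Integer.signum(BY_LINE_THEN_X.compare(b, c)) + " "
                + Integer.signum(BY_LINE_THEN_X.compare(a, c));
    }
}
```

Here describe() returns "1 1 -1": a sorts after b and b sorts after c, yet a sorts before c, because a and c are no longer within the 0.1 tolerance of each other. This is the kind of inconsistency that the Java 7 sort can detect and reject.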