Posted to dev@pdfbox.apache.org by Qingchao Kong <kq...@gmail.com> on 2014/04/30 09:07:17 UTC

TextPositionComparator violates the contract of the Comparator interface

Hi, I am using PDFBox to extract text. Here is the code:

PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.addRegion("cropbox", rec);
stripper.setSortByPosition(true);

The line "stripper.setSortByPosition(true)" causes the following error:

Exception in thread "main" java.lang.IllegalArgumentException:
Comparison method violates its general contract!
at java.util.TimSort.mergeLo(TimSort.java:747)
at java.util.TimSort.mergeAt(TimSort.java:483)
at java.util.TimSort.mergeCollapse(TimSort.java:408)
at java.util.TimSort.sort(TimSort.java:214)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at java.util.Collections.sort(Collections.java:217)
at org.apache.pdfbox.util.PDFTextStripper.writePage(PDFTextStripper.java:565)
at org.apache.pdfbox.util.PDFTextStripperByArea.writePage(PDFTextStripperByArea.java:190)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:457)
at org.apache.pdfbox.util.PDFTextStripperByArea.extractRegions(PDFTextStripperByArea.java:153)

Is this a bug? Is there a fix I can use?

See my question on Stack Overflow:
http://stackoverflow.com/questions/23377520/how-to-define-regions-in-pdftextstripperbyarea

Re: TextPositionComparator violates the contract of the Comparator interface

Posted by Tilman Hausherr <TH...@t-online.de>.
Known problem :-(
https://issues.apache.org/jira/browse/PDFBOX-1512

Tilman
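
For readers wondering what "violates its general contract" means: the Comparator contract requires, among other things, that if compare(a, b) == 0 and compare(b, c) == 0, then compare(a, c) == 0 as well. The sketch below shows one common way to break that rule, namely treating values within a tolerance as equal. It is only an illustration and is not the PDFBox code; the class name, the tolerance value and the sample coordinates are all made up here. The actual cause in TextPositionComparator is discussed in the JIRA issue linked above.

```java
import java.util.Comparator;

public class ToleranceComparatorDemo {
    // Hypothetical tolerance, chosen only for this illustration.
    static final float TOLERANCE = 0.5f;

    // Treats two coordinates as "equal" when they differ by less than TOLERANCE.
    // This looks harmless but makes equality non-transitive.
    static final Comparator<Float> BY_Y_WITH_TOLERANCE = new Comparator<Float>() {
        @Override
        public int compare(Float a, Float b) {
            if (Math.abs(a - b) < TOLERANCE) {
                return 0;
            }
            return a < b ? -1 : 1;
        }
    };

    public static void main(String[] args) {
        Float a = 1.0f;
        Float b = 1.4f;
        Float c = 1.8f;
        System.out.println(BY_Y_WITH_TOLERANCE.compare(a, b)); // 0: a "equals" b
        System.out.println(BY_Y_WITH_TOLERANCE.compare(b, c)); // 0: b "equals" c
        System.out.println(BY_Y_WITH_TOLERANCE.compare(a, c)); // -1: yet a < c
    }
}
```

TimSort, which Collections.sort uses since Java 7, can detect such inconsistencies during a merge and then throws the IllegalArgumentException shown in the stack trace. A commonly cited stopgap is to start the JVM with -Djava.util.Arrays.useLegacyMergeSort=true, which switches back to the older merge sort that does not perform this check. That only hides the exception; the resulting order may still be inconsistent until the comparator itself is fixed.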
