Posted to dev@pdfbox.apache.org by "Miro Mannino (JIRA)" <ji...@apache.org> on 2017/05/21 06:49:04 UTC

[jira] [Created] (PDFBOX-3799) Problem in TextPosition's hashCode

Miro Mannino created PDFBOX-3799:
------------------------------------

Summary: Problem in TextPosition's hashCode
Key: PDFBOX-3799
URL: https://issues.apache.org/jira/browse/PDFBOX-3799
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 2.0.6
Reporter: Miro Mannino

This is another side effect related to TextPosition's hashCode.

I rely on the hashCode because I want to know the color of each letter. To do this, in processTextPosition I save the current graphics state in a map, using the current TextPosition as the key. Then, in writeString, I iterate over all the text positions and look up the color of each one through this map.

Of course, it would be easier if this information were stored in the TextPosition itself, but that is just a feature request.

I have found that, between processTextPosition and writeString, the same TextPosition sometimes ends up with a different Unicode value. In processTextPosition it is just "x" (char 120), but in writeString the same TextPosition's Unicode is "x" followed by '̄' (char 772, U+0304 COMBINING MACRON). Everything else about the TextPosition stays the same: same coordinates, same System.identityHashCode. The only thing that changes is the Unicode value, and that produces a different hashCode.
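The failure can be reproduced without PDFBox. The sketch below uses a hypothetical `Glyph` class as a stand-in for TextPosition, assuming (as described above) that its hashCode depends on the Unicode string; the coordinates and the "red" color are made-up values. Once the key's Unicode changes, the HashMap computes a different hash and can no longer find the entry, even though it is still stored in the map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for TextPosition: equals/hashCode depend on a field
// (unicode) that is mutated after the object has been used as a map key.
final class Glyph {
    final float x;
    final float y;
    String unicode;

    Glyph(float x, float y, String unicode) {
        this.x = x;
        this.y = y;
        this.unicode = unicode;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Glyph)) return false;
        Glyph g = (Glyph) o;
        return Float.compare(x, g.x) == 0 && Float.compare(y, g.y) == 0
                && Objects.equals(unicode, g.unicode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y, unicode);
    }
}

class HashCodeDemo {
    // Returns the value found for the key after its unicode has changed.
    static String lookupAfterMerge() {
        Map<Glyph, String> colors = new HashMap<>();
        Glyph g = new Glyph(10f, 20f, "x");   // "x" (char 120)
        colors.put(g, "red");                 // stored under the hash computed from "x"
        g.unicode = "x\u0304";                // later: "x" + combining macron (char 772)
        // The lookup uses the new hash, which no longer matches the stored one,
        // so the entry is still in the map but unreachable through get().
        return colors.get(g);
    }
}
```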

This causes problems. As a workaround, I now use System.identityHashCode instead of TextPosition's own hashCode implementation.
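One way to apply this workaround without hand-writing an identity-based wrapper is java.util.IdentityHashMap, which compares keys with `==` and hashes them with System.identityHashCode, ignoring any overridden equals/hashCode. The sketch below uses a hypothetical `MutableGlyph` class as a stand-in for TextPosition; it is an illustration under that assumption, not PDFBox code.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for TextPosition whose value-based hashCode
// changes when its unicode string is mutated.
final class MutableGlyph {
    String unicode;

    MutableGlyph(String unicode) {
        this.unicode = unicode;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableGlyph && ((MutableGlyph) o).unicode.equals(unicode);
    }

    @Override
    public int hashCode() {
        return unicode.hashCode();
    }
}

class IdentityWorkaround {
    static String lookupAfterMerge() {
        // Keys are matched by reference identity, so mutating the key's
        // fields does not affect where the entry is found.
        Map<MutableGlyph, String> colors = new IdentityHashMap<>();
        MutableGlyph g = new MutableGlyph("x");
        colors.put(g, "red");
        g.unicode = "x\u0304";  // value-based hashCode changes, identity does not
        return colors.get(g);   // still finds "red"
    }
}
```

In a PDFTextStripper subclass, the map would presumably be declared with TextPosition as the key type and filled in processTextPosition. This only works if writeString receives the very same TextPosition instances, which is consistent with the unchanged System.identityHashCode values reported above; two distinct but equal positions would be kept as separate entries.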

