Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2011/09/14 20:56:09 UTC
[jira] [Resolved] (PDFBOX-1080) Improve TextPosition.isDiacritic and ICU4JImpl normalizeDiac performance
[ https://issues.apache.org/jira/browse/PDFBOX-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-1080.
----------------------------------------
Resolution: Fixed
Assignee: Andreas Lehmkühler
I added the proposed improvements in revision 1170768.
Thanks for the contribution.
> Improve TextPosition.isDiacritic and ICU4JImpl normalizeDiac performance
> ------------------------------------------------------------------------
>
> Key: PDFBOX-1080
> URL: https://issues.apache.org/jira/browse/PDFBOX-1080
> Project: PDFBox
> Issue Type: Improvement
> Components: Text extraction
> Affects Versions: 1.6.0
> Reporter: Lars Torunski
> Assignee: Andreas Lehmkühler
> Priority: Minor
> Fix For: 1.7.0
>
>
> Character.getType(cText.charAt(0)), together with its index range check, is invoked three times where a single call would suffice.
> Current 1.6.0 implementation:
>     public boolean isDiacritic()
>     {
>         String cText = this.getCharacter();
>         return (cText.length() == 1 && (Character.getType(cText.charAt(0)) == Character.NON_SPACING_MARK
>                 || Character.getType(cText.charAt(0)) == Character.MODIFIER_SYMBOL
>                 || Character.getType(cText.charAt(0)) == Character.MODIFIER_LETTER));
>     }
> Please use something like this:
>     public boolean isDiacritic()
>     {
>         final String cText = this.getCharacter();
>         if (cText.length() != 1) return false;
>         final int type = Character.getType(cText.charAt(0));
>         return (type == Character.NON_SPACING_MARK
>                 || type == Character.MODIFIER_SYMBOL
>                 || type == Character.MODIFIER_LETTER);
>     }
> Please also check the ICU4JImpl.normalizeDiac method.
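
For readers who want to try the proposal quoted above outside PDFBox, the following is a minimal standalone sketch of the same single-lookup pattern. The class name DiacriticCheck and the String parameter are assumptions made for illustration; in PDFBox the text comes from TextPosition.getCharacter(), and this is not the code committed in the revision mentioned above.

```java
public class DiacriticCheck {

    // Hypothetical stand-in for TextPosition.isDiacritic(): the text is passed
    // in directly instead of being read via getCharacter().
    static boolean isDiacritic(String cText) {
        // Only single-char strings can qualify, so bail out early.
        if (cText.length() != 1) {
            return false;
        }
        // Look up the Unicode general category once and reuse the result.
        final int type = Character.getType(cText.charAt(0));
        return type == Character.NON_SPACING_MARK
                || type == Character.MODIFIER_SYMBOL
                || type == Character.MODIFIER_LETTER;
    }

    public static void main(String[] args) {
        System.out.println(isDiacritic("\u0301")); // combining acute accent
        System.out.println(isDiacritic("^"));      // circumflex accent
        System.out.println(isDiacritic("a"));      // plain lowercase letter
        System.out.println(isDiacritic("ab"));     // more than one char
    }
}
```

The early return keeps the length check separate from the category comparison, so charAt(0) and Character.getType each run at most once per call.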