Posted to dev@poi.apache.org by bu...@apache.org on 2008/05/15 09:21:34 UTC
DO NOT REPLY [Bug 45001] New: Range.insertBefore() and
Range.delete() fail with Unicode characters
https://issues.apache.org/bugzilla/show_bug.cgi?id=45001
Summary: Range.insertBefore() and Range.delete() fail with
Unicode characters
Product: POI
Version: 3.0-dev
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: HWPF
AssignedTo: dev@poi.apache.org
ReportedBy: nhira@cognocys.com
Created an attachment (id=21966)
--> (https://issues.apache.org/bugzilla/attachment.cgi?id=21966)
patch to Range, FileInformationBlock, and TextPiece to address problem
described above (see bug text for limitations of patch)
When OpenOffice.org creates MS Word 97 formatted *.doc files, it uses Unicode.
When Range.insertBefore() and Range.delete() are used with these multi-byte
representations, a couple of different problems occur:
1. The indices are not calculated correctly so delete() seems to delete
arbitrary characters or fail with IndexOutOfBoundsExceptions
2. For the same reason, insertBefore() seems to insert text at an arbitrary
position and subsequent operations fail with IndexOutOfBoundsExceptions
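The index problem in items 1 and 2 can be illustrated without POI. This is a minimal sketch, assuming (as the report implies) that a Unicode text piece stores two bytes per character; OffsetDemo and deleteChars are hypothetical names and not part of the HWPF API. Using a character index directly as a byte offset removes the wrong span:

```java
import java.nio.charset.StandardCharsets;

class OffsetDemo {
    /** Removes characters [start, end) from an encoded buffer, scaling indices by bytes per character. */
    static byte[] deleteChars(byte[] buf, int start, int end, int bytesPerChar) {
        int from = start * bytesPerChar;
        int to = end * bytesPerChar;
        byte[] out = new byte[buf.length - (to - from)];
        System.arraycopy(buf, 0, out, 0, from);
        System.arraycopy(buf, to, out, from, buf.length - to);
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello ${company}!";
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16LE);
        int start = text.indexOf("${company}");
        int end = start + "${company}".length();

        // Character indices scaled by 2 for a 16-bit buffer: the placeholder is removed.
        String scaled = new String(deleteChars(utf16, start, end, 2), StandardCharsets.UTF_16LE);
        // Character indices used as byte offsets: an unrelated span is removed.
        String unscaled = new String(deleteChars(utf16, start, end, 1), StandardCharsets.UTF_16LE);

        System.out.println(scaled);
        System.out.println(unscaled);
    }
}
```

With larger indices the unscaled or double-scaled offsets can also run past the end of the buffer, which is consistent with the IndexOutOfBoundsExceptions mentioned above.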
There is a marginally related problem with these operations; they do not update
FileInformationBlock.CCPText, and this throws OpenOffice.org for a loop. It
stops reading character text prematurely and renders document headers and
footers incorrectly.
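A simplified model of why a stale character count matters, assuming the header text follows the main text in one stream and a reader uses the stored count to find the boundary. CcpTextDemo and split are hypothetical names for illustration only; this is not how POI or OpenOffice.org is implemented:

```java
class CcpTextDemo {
    /** Splits a combined text stream into main text and whatever follows it. */
    static String[] split(String stream, int ccpText) {
        return new String[] { stream.substring(0, ccpText), stream.substring(ccpText) };
    }

    public static void main(String[] args) {
        String mainText = "Dear ${name},";
        String headerText = "ACME";
        int ccpText = mainText.length();
        StringBuilder stream = new StringBuilder(mainText + headerText);

        // Insert text into the main part without touching the stored count.
        String inserted = "Jane";
        stream.insert(5, inserted);

        String[] stale = split(stream.toString(), ccpText);
        String[] updated = split(stream.toString(), ccpText + inserted.length());

        System.out.println("stale count:   main=" + stale[0] + " header=" + stale[1]);
        System.out.println("updated count: main=" + updated[0] + " header=" + updated[1]);
    }
}
```

In this model the stale count cuts the main text short and pushes its tail into the header, which resembles the symptoms described above.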
(see attachment for a partial patch to address both problems; note that the
patch does not address overloaded versions of insertBefore(), nor does it
address insertAfter())
--
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete()
fail with Unicode characters
--- Comment #3 from N. Hira <nh...@cognocys.com> 2008-06-13 22:42:41 PST ---
Created an attachment (id=22126)
--> (https://issues.apache.org/bugzilla/attachment.cgi?id=22126)
Sample document used to test Range.insertBefore() when Range uses Unicode (use
with test case from previous attachment)
DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete()
fail with Unicode characters
N. Hira <nh...@cognocys.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEEDINFO |ASSIGNED
--- Comment #4 from N. Hira <nh...@cognocys.com> 2008-06-13 22:49:19 PST ---
Sorry for the delay. Also have a replaceText() that can be cleaned up and
would make a great addition to the API for mail-merge-type uses...
    /**
     * Replace (one instance of) a piece of text with another...
     *
     * @param pPlaceHolder       The text to be replaced (e.g., "${company}")
     * @param pValue             The replacement text (e.g., "Cognocys, Inc.")
     * @param pDocument          The <code>HWPFDocument</code> in which the
     *                           placeholder was found
     * @param pStartOffset       The offset or index where the
     *                           <code>CharacterRun</code> begins
     * @param pPlaceHolderIndex  The offset or index of the placeholder,
     *                           relative to the <code>CharacterRun</code>
     *                           where <code>pPlaceHolder</code> was found
     *
     * @throws DocumentFillerException
     */
    protected void replaceText(String pPlaceHolder, String pValue,
            int pStartOffset, int pPlaceHolderIndex, HWPFDocument pDocument)
            throws DocumentFillerException {

        int absPlaceHolderIndex = pStartOffset + pPlaceHolderIndex;
        Range subRange = new Range(
                absPlaceHolderIndex,
                (absPlaceHolderIndex + pPlaceHolder.length()), pDocument);
        if (subRange.usesUnicode()) {
            absPlaceHolderIndex = pStartOffset + (pPlaceHolderIndex * 2);
            subRange = new Range(
                    absPlaceHolderIndex,
                    (absPlaceHolderIndex + (pPlaceHolder.length() * 2)),
                    pDocument);
        }

        subRange.insertBefore(pValue);

        // re-create the sub-range so we can delete it
        subRange = new Range(
                (absPlaceHolderIndex + pValue.length()),
                (absPlaceHolderIndex + pPlaceHolder.length() +
                        pValue.length()),
                pDocument);
        if (subRange.usesUnicode()) {
            subRange = new Range(
                    (absPlaceHolderIndex + (pValue.length() * 2)),
                    (absPlaceHolderIndex + (pPlaceHolder.length() * 2) +
                            (pValue.length() * 2)), pDocument);
        }

        subRange.delete();
    }
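
The insert-then-delete order used above can be sketched on a plain StringBuilder, leaving out POI and the Unicode scaling. ReplaceDemo and replaceAt are hypothetical names. The point is only that once the value is inserted, the placeholder starts at index + value.length(), which is why the sub-range has to be re-created before the delete:

```java
class ReplaceDemo {
    /** Replaces the placeholder found at index by inserting the value first, then deleting the placeholder. */
    static void replaceAt(StringBuilder text, int index, String placeHolder, String value) {
        text.insert(index, value);
        // The placeholder has been pushed right by the length of the inserted value.
        int start = index + value.length();
        text.delete(start, start + placeHolder.length());
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("Welcome to ${company}!");
        String placeHolder = "${company}";
        replaceAt(text, text.indexOf(placeHolder), placeHolder, "Cognocys, Inc.");
        System.out.println(text);
    }
}
```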
DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete()
fail with Unicode characters
Nick Burch <ni...@torchbox.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
--- Comment #8 from Nick Burch <ni...@torchbox.com> 2008-06-19 04:47:52 PST ---
Thanks for this patch+test, applied to svn
DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete()
fail with Unicode characters
Nick Burch <ni...@torchbox.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |NEEDINFO
--- Comment #5 from Nick Burch <ni...@torchbox.com> 2008-06-16 05:50:33 PST ---
Thanks for the test case, added to svn
Any chance you could do a unit test for your new replaceText method too? I've
added that to svn too.
DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete()
fail with Unicode characters
N. Hira <nh...@cognocys.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Blocks| |45252
DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete()
fail with Unicode characters
N. Hira <nh...@cognocys.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEEDINFO |ASSIGNED
--- Comment #7 from N. Hira <nh...@cognocys.com> 2008-06-18 16:14:44 PST ---
The attachment includes the test case and a patch to Range to simplify
replaceText()...
(Thanks, Nick.)
DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete()
fail with Unicode characters
Nick Burch <ni...@torchbox.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
Resolution| |FIXED
--- Comment #11 from Nick Burch <ni...@torchbox.com> 2008-06-28 11:53:55 PST ---
Thanks for the latest patch + test, applied to svn trunk
DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete()
fail with Unicode characters
--- Comment #2 from N. Hira <nh...@cognocys.com> 2008-06-13 22:38:46 PST ---
Created an attachment (id=22125)
--> (https://issues.apache.org/bugzilla/attachment.cgi?id=22125)
JUnit to test Range.insertBefore() when Range uses Unicode (use with sample
document from next attachment)
DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete()
fail with Unicode characters
--- Comment #10 from N. Hira <nh...@cognocys.com> 2008-06-22 21:03:55 PST ---
Created an attachment (id=22156)
--> (https://issues.apache.org/bugzilla/attachment.cgi?id=22156)
Patch for TextPiece, with unit test and illustrative document showing problem
with delete()
DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete()
fail with Unicode characters
Nick Burch <ni...@torchbox.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |NEEDINFO
--- Comment #1 from Nick Burch <ni...@torchbox.com> 2008-05-20 09:57:27 PST ---
Thanks for this patch, applied to trunk.
Any chance you could also do us a little unit test, so we can be sure this
doesn't get broken again in the future? I'm leaving the bug open for now, until
we've got one.
DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete()
fail with Unicode characters
--- Comment #6 from N. Hira <nh...@cognocys.com> 2008-06-18 16:12:49 PST ---
Created an attachment (id=22139)
--> (https://issues.apache.org/bugzilla/attachment.cgi?id=22139)
Zip file that contains a patch, a test case, and an MS Word document to support
the test case
DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete()
fail with Unicode characters
N. Hira <nh...@cognocys.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
--- Comment #9 from N. Hira <nh...@cognocys.com> 2008-06-22 20:57:39 PST ---
Follow up...
(Please let me know if I should create a new bug for this kind of thing in
future.)
I've discovered that the original patch to TextPiece does not function as
expected in that a delete() on a Unicode TextPiece results in the TextPiece
being adjusted to an incorrect length.
For every N characters deleted, the new length should be (previousLength - N),
but the current code sets it to (previousLength - (N/2)) when the TextPiece
uses Unicode.
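The arithmetic above, written out as a small sketch. PieceLengthDemo and both method names are hypothetical; the second method only mirrors the behaviour as described in this comment and is not copied from TextPiece:

```java
class PieceLengthDemo {
    /** Expected length in characters after deleting n characters, regardless of encoding. */
    static int lengthAfterDelete(int previousLength, int n) {
        return previousLength - n;
    }

    /** The reported behaviour: n is halved when the piece uses Unicode. */
    static int reportedLengthAfterDelete(int previousLength, int n, boolean unicode) {
        return previousLength - (unicode ? n / 2 : n);
    }

    public static void main(String[] args) {
        StringBuilder piece = new StringBuilder("Remove ${delete} this");
        int previousLength = piece.length();
        int n = "${delete}".length();
        piece.delete(7, 7 + n);

        // The real character count matches previousLength - n.
        System.out.println(piece.length() == lengthAfterDelete(previousLength, n));
        // The halved adjustment leaves the recorded length too large for a Unicode piece.
        System.out.println(reportedLengthAfterDelete(previousLength, n, true));
    }
}
```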
I've attached a Unit Test, an illustrative document, and another patch to (the
current version of) TextPiece.
The Unit Test also illustrates how one can delete all instances of some text
from a Range.
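The attached unit test is not reproduced in this thread. As a rough stand-in, deleting every instance of some text can be sketched on a StringBuilder; DeleteAllDemo and deleteAll are hypothetical names, and a Range-based version would additionally need the offset handling discussed above:

```java
class DeleteAllDemo {
    /** Deletes every occurrence of target from text and returns how many were removed. */
    static int deleteAll(StringBuilder text, String target) {
        if (target.isEmpty()) {
            return 0; // nothing to search for
        }
        int count = 0;
        int idx = text.indexOf(target);
        while (idx >= 0) {
            text.delete(idx, idx + target.length());
            count++;
            // Continue searching from the point of the deletion.
            idx = text.indexOf(target, idx);
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("a${x}b${x}c${x}");
        int removed = deleteAll(text, "${x}");
        System.out.println(removed + " removed, left with: " + text);
    }
}
```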