Posted to dev@poi.apache.org by bu...@apache.org on 2008/05/15 09:21:34 UTC

DO NOT REPLY [Bug 45001] New: Range.insertBefore() and Range.delete() fail with Unicode characters

https://issues.apache.org/bugzilla/show_bug.cgi?id=45001

           Summary: Range.insertBefore() and Range.delete() fail with
                    Unicode characters
           Product: POI
           Version: 3.0-dev
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HWPF
        AssignedTo: dev@poi.apache.org
        ReportedBy: nhira@cognocys.com


Created an attachment (id=21966)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=21966)
patch to Range, FileInformationBlock, and TextPiece to address problem
described above (see bug text for limitations of patch)

When OpenOffice.org creates MS Word 97 formatted *.doc files, it uses Unicode. 
When Range.insertBefore() and Range.delete() are used with these multi-byte
representations, a couple of different problems occur:
1.  The indices are not calculated correctly, so delete() either deletes
seemingly arbitrary characters or fails with an IndexOutOfBoundsException
2.  For the same reason, insertBefore() inserts text at a seemingly arbitrary
position, and subsequent operations fail with IndexOutOfBoundsExceptions
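
To illustrate the index problem described above, here is a minimal,
self-contained sketch (not HWPF code). It assumes an 8-bit piece stores one
byte per character and a Unicode piece stores two (UTF-16LE); ISO-8859-1
stands in for the 8-bit encoding, and the class and method names are
hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class OffsetDemo {
    // Byte offset of a character index inside a text piece.
    // Assumption: 1 byte per character for 8-bit text, 2 for Unicode.
    static int byteOffset(int charIndex, boolean unicode) {
        return unicode ? charIndex * 2 : charIndex;
    }

    // Decode charCount characters starting at charIndex from the raw bytes.
    static String slice(byte[] piece, int charIndex, int charCount, boolean unicode) {
        int width = unicode ? 2 : 1;
        return new String(piece, charIndex * width, charCount * width,
                unicode ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "Dear ${company},";
        String placeHolder = "${company}";
        int charIndex = text.indexOf(placeHolder);

        byte[] twoByte = text.getBytes(StandardCharsets.UTF_16LE);

        // Correct: scale the character index to a byte offset first.
        System.out.println(slice(twoByte, charIndex, placeHolder.length(), true));

        // Incorrect: use the character index and length directly as byte
        // positions. The result is not the placeholder.
        System.out.println(new String(twoByte, charIndex, placeHolder.length(),
                StandardCharsets.UTF_16LE));
    }
}
```

The second print statement shows why delete() and insertBefore() appear to
act on arbitrary text when character indices are used as byte offsets.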

There is a loosely related problem with these operations: they do not update
FileInformationBlock.CCPText, which confuses OpenOffice.org.  It stops reading
character text prematurely and renders document headers and footers
incorrectly.

(see attachment for a partial patch to address both problems; note that the
patch does not address overloaded versions of insertBefore(), nor does it
address insertAfter())


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete() fail with Unicode characters

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45001





--- Comment #3 from N. Hira <nh...@cognocys.com>  2008-06-13 22:42:41 PST ---
Created an attachment (id=22126)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=22126)
Sample document used to test Range.insertBefore() when Range uses Unicode (use
with test case from previous attachment)




DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete() fail with Unicode characters

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45001


N. Hira <nh...@cognocys.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED




--- Comment #4 from N. Hira <nh...@cognocys.com>  2008-06-13 22:49:19 PST ---
Sorry for the delay.  Also have a replaceText() that can be cleaned up and
would make a great addition to the API for mail-merge-type uses...

/**
 * Replace (one instance of) a piece of text with another...
 *
 * @param pPlaceHolder       The text to be replaced (e.g., "${company}")
 * @param pValue             The replacement text (e.g., "Cognocys, Inc.")
 * @param pStartOffset       The offset or index where the
 *                           <code>CharacterRun</code> begins
 * @param pPlaceHolderIndex  The offset or index of the placeholder, relative
 *                           to the <code>CharacterRun</code> where
 *                           <code>pPlaceHolder</code> was found
 * @param pDocument          The <code>HWPFDocument</code> in which the
 *                           placeholder was found
 *
 * @throws DocumentFillerException
 */
protected void replaceText(String pPlaceHolder, String pValue,
        int pStartOffset, int pPlaceHolderIndex, HWPFDocument pDocument)
        throws DocumentFillerException {

    int absPlaceHolderIndex = pStartOffset + pPlaceHolderIndex;
    Range subRange = new Range(
            absPlaceHolderIndex,
            (absPlaceHolderIndex + pPlaceHolder.length()), pDocument);
    if (subRange.usesUnicode()) {
        // Unicode range: double the placeholder index and length
        absPlaceHolderIndex = pStartOffset + (pPlaceHolderIndex * 2);
        subRange = new Range(
                absPlaceHolderIndex,
                (absPlaceHolderIndex + (pPlaceHolder.length() * 2)),
                pDocument);
    }

    subRange.insertBefore(pValue);

    // re-create the sub-range so we can delete it; the placeholder now
    // starts after the inserted value
    subRange = new Range(
            (absPlaceHolderIndex + pValue.length()),
            (absPlaceHolderIndex + pPlaceHolder.length() + pValue.length()),
            pDocument);
    if (subRange.usesUnicode()) {
        subRange = new Range(
                (absPlaceHolderIndex + (pValue.length() * 2)),
                (absPlaceHolderIndex + (pPlaceHolder.length() * 2) +
                        (pValue.length() * 2)), pDocument);
    }

    subRange.delete();
}
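
The insert-then-delete arithmetic used by replaceText() above can be sketched
on a plain StringBuilder, leaving HWPF and the Unicode byte doubling out of
the picture. This is only an illustration under that simplification; the
class and method names are hypothetical.

```java
final class ReplaceSketch {
    // Replace the placeholder that starts at character index 'index' by
    // inserting the value in front of it and then deleting the placeholder,
    // which has shifted right by value.length() characters.
    static void replaceAt(StringBuilder text, int index,
                          String placeHolder, String value) {
        text.insert(index, value);
        int shifted = index + value.length();
        text.delete(shifted, shifted + placeHolder.length());
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("Welcome to ${company}!");
        String placeHolder = "${company}";
        replaceAt(text, text.indexOf(placeHolder), placeHolder, "Cognocys, Inc.");
        System.out.println(text); // prints: Welcome to Cognocys, Inc.!
    }
}
```

Inserting first and deleting second is why the second sub-range has to be
re-created at an offset that includes the length of the inserted value.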




DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete() fail with Unicode characters

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45001


Nick Burch <ni...@torchbox.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED




--- Comment #8 from Nick Burch <ni...@torchbox.com>  2008-06-19 04:47:52 PST ---
Thanks for this patch+test, applied to svn




DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete() fail with Unicode characters

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45001


Nick Burch <ni...@torchbox.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO




--- Comment #5 from Nick Burch <ni...@torchbox.com>  2008-06-16 05:50:33 PST ---
Thanks for the test case, added to svn

Any chance you could do a unit test for your new replaceText method too? I've
added that method to svn as well.




DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete() fail with Unicode characters

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45001


N. Hira <nh...@cognocys.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |45252






DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete() fail with Unicode characters

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45001


N. Hira <nh...@cognocys.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED




--- Comment #7 from N. Hira <nh...@cognocys.com>  2008-06-18 16:14:44 PST ---
The attachment includes the test case and a patch to Range to simplify
replaceText()...

(Thanks, Nick.)




DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete() fail with Unicode characters

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45001


Nick Burch <ni...@torchbox.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED




--- Comment #11 from Nick Burch <ni...@torchbox.com>  2008-06-28 11:53:55 PST ---
Thanks for the latest patch + test, applied to svn trunk




DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete() fail with Unicode characters

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45001





--- Comment #2 from N. Hira <nh...@cognocys.com>  2008-06-13 22:38:46 PST ---
Created an attachment (id=22125)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=22125)
JUnit to test Range.insertBefore() when Range uses Unicode (use with sample
document from next attachment)




DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete() fail with Unicode characters

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45001





--- Comment #10 from N. Hira <nh...@cognocys.com>  2008-06-22 21:03:55 PST ---
Created an attachment (id=22156)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=22156)
Patch for TextPiece, with unit test and illustrative document showing problem
with delete()




DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete() fail with Unicode characters

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45001


Nick Burch <ni...@torchbox.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO




--- Comment #1 from Nick Burch <ni...@torchbox.com>  2008-05-20 09:57:27 PST ---
Thanks for this patch, applied to trunk

Any chance you could also do us a little unit test, so we can be sure this
doesn't get broken again in the future? I'm leaving the bug open for now, until
we've got one




DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete() fail with Unicode characters

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45001





--- Comment #6 from N. Hira <nh...@cognocys.com>  2008-06-18 16:12:49 PST ---
Created an attachment (id=22139)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=22139)
Zip file that contains a patch, a test case, and an MS Word document to support
the test case




DO NOT REPLY [Bug 45001] Range.insertBefore() and Range.delete() fail with Unicode characters

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45001


N. Hira <nh...@cognocys.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |




--- Comment #9 from N. Hira <nh...@cognocys.com>  2008-06-22 20:57:39 PST ---
Follow up...

(Please let me know if I should create a new bug for this kind of thing in
future.)

I've discovered that the original patch to TextPiece does not function as
expected in that a delete() on a Unicode TextPiece results in the TextPiece
being adjusted to an incorrect length.  

For every N characters deleted, the new length should be (previousLength - N),
but the current code sets it to (previousLength - (N/2)) when the TextPiece
uses Unicode.
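
As a worked example of the arithmetic above, the following sketch models only
the two length formulas from this comment. It is not the TextPiece source;
the class and method names are hypothetical and all lengths are counted in
characters.

```java
final class PieceLengthDemo {
    // Length as described for the current (incorrect) behaviour: a Unicode
    // piece shrinks by only N/2 when N characters are deleted.
    static int currentLength(int previousLength, int deleted, boolean unicode) {
        return previousLength - (unicode ? deleted / 2 : deleted);
    }

    // Expected length: N deleted characters always shrink the piece by N.
    static int expectedLength(int previousLength, int deleted) {
        return previousLength - deleted;
    }

    public static void main(String[] args) {
        int previousLength = 20;
        int deleted = 6;
        System.out.println(currentLength(previousLength, deleted, true)); // 17
        System.out.println(expectedLength(previousLength, deleted));      // 14
    }
}
```

With 20 characters and 6 deleted, the piece is left 3 characters too long,
which is the kind of mismatch the attached patch is meant to remove.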

I've attached a Unit Test, an illustrative document, and another patch to (the
current version of) TextPiece.

The Unit Test also illustrates how one can delete all instances of some text
from a Range.
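
The attached unit test is not reproduced in this message. As a rough idea of
what deleting all instances of some text involves, here is a sketch on a
plain StringBuilder rather than an HWPF Range; the class and method names are
hypothetical, and the real test may be structured differently.

```java
final class DeleteAllSketch {
    // Delete every occurrence of target and return how many were removed.
    // The search restarts from the beginning after each deletion, so an
    // occurrence formed by joining the text around a deletion is removed too.
    static int deleteAll(StringBuilder text, String target) {
        if (target.isEmpty()) {
            return 0; // nothing sensible to delete
        }
        int count = 0;
        int at = text.indexOf(target);
        while (at >= 0) {
            text.delete(at, at + target.length());
            count++;
            at = text.indexOf(target);
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("a${x}b${x}c");
        int removed = deleteAll(text, "${x}");
        System.out.println(removed + " removed, left: " + text);
    }
}
```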

