Posted to dev@poi.apache.org by bu...@apache.org on 2017/11/20 13:53:54 UTC

[Bug 61787] New: Text extraction omitting text incorrectly

https://bz.apache.org/bugzilla/show_bug.cgi?id=61787

            Bug ID: 61787
           Summary: Text extraction omitting text incorrectly
           Product: POI
           Version: unspecified
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XWPF
          Assignee: dev@poi.apache.org
          Reporter: jmarkmurphy@apache.org
  Target Milestone: ---

Text extraction omits a run's text whenever the run carries an rsidDel
attribute. This is incorrect: rsid* attributes are only revision session
identifiers, so a run can have rsidDel set while its text is still valid.
Instead of keying on the revision session id attributes, text extraction
should key on the specific revision tags to decide which text to omit. The
appropriate tag to omit is <delText>.
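
To illustrate the distinction, here is a minimal sketch that uses only the
JDK's DOM parser rather than POI itself. It is not the extractor's actual
code: the class name RsidDelSketch, the method visibleText and the sample
paragraph are made up for illustration. It collects the text of <w:t>
elements only, so a run that merely carries w:rsidDel is kept, while text
stored in <w:delText> is left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not POI code: keys on element names, not rsid* attributes.
final class RsidDelSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Made-up sample: the first run has w:rsidDel but holds live text;
    // the second run sits inside <w:del> and stores its text in <w:delText>.
    static final String SAMPLE =
            "<w:p xmlns:w=\"" + W_NS + "\">"
            + "<w:r w:rsidDel=\"00A1B2C3\"><w:t xml:space=\"preserve\">Kept </w:t></w:r>"
            + "<w:del w:id=\"1\" w:author=\"a\"><w:r><w:delText>removed </w:delText></w:r></w:del>"
            + "<w:r><w:t>text</w:t></w:r>"
            + "</w:p>";

    // Concatenates the content of every <w:t>; <w:delText> has a different
    // local name and is therefore never matched.
    static String visibleText(String paragraphXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(paragraphXml)));
            NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < texts.getLength(); i++) {
                out.append(texts.item(i).getTextContent());
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse paragraph XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(visibleText(SAMPLE)); // prints "Kept text"
    }
}
```

An extractor that skipped every run with rsidDel would drop "Kept " from
this sample, which is the behaviour reported here.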

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 61787] Text extraction omitting text incorrectly


Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |58067


Referenced Bugs:

https://bz.apache.org/bugzilla/show_bug.cgi?id=58067
[Bug 58067] getText() of XWPFParagraph returns deleted text if in "review" mode


[Bug 61787] Text extraction omitting text incorrectly


Simon Gaeremynck <ga...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gaeremyncks@gmail.com

--- Comment #2 from Simon Gaeremynck <ga...@gmail.com> ---
Created attachment 35540
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35540&action=edit
A .docx with rsidDel attributes



[Bug 61787] Text extraction omitting text incorrectly


Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Dominik Stadler <do...@gmx.at> ---
Adjusted this with r1819405 as follows:
* Check for delText instead of rsidDel to exclude deleted content
* Also include runs from tracked-change insertions so that inserted text is
  extracted correctly

Hopefully this makes extraction work better across the various ways documents
can contain text content.
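
The two adjustments can be sketched roughly as follows. This is an
illustration built on the JDK's DOM parser, not the code committed in
r1819405; the class TrackChangesSketch, its methods and the sample XML are
hypothetical. It walks the direct children of a paragraph, takes runs found
directly under <w:p> as well as runs nested in <w:ins>, and within each run
reads only <w:t>, so <w:delText> never contributes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of the traversal; POI's real implementation differs.
final class TrackChangesSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Made-up paragraph with one inserted and one deleted run.
    static final String SAMPLE =
            "<w:p xmlns:w=\"" + W_NS + "\">"
            + "<w:r><w:t xml:space=\"preserve\">Hello </w:t></w:r>"
            + "<w:ins w:id=\"2\" w:author=\"a\"><w:r>"
            + "<w:t xml:space=\"preserve\">brave </w:t></w:r></w:ins>"
            + "<w:del w:id=\"3\" w:author=\"a\"><w:r>"
            + "<w:delText xml:space=\"preserve\">cruel </w:delText></w:r></w:del>"
            + "<w:r><w:t>world</w:t></w:r>"
            + "</w:p>";

    // True if the node is an element in the WordprocessingML namespace
    // with the given local name.
    static boolean isW(Node n, String local) {
        return n.getNodeType() == Node.ELEMENT_NODE
                && W_NS.equals(n.getNamespaceURI())
                && local.equals(n.getLocalName());
    }

    // Appends the <w:t> children of a run; <w:delText> children are ignored.
    static void appendRun(Node run, StringBuilder out) {
        for (Node n = run.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isW(n, "t")) {
                out.append(n.getTextContent());
            }
        }
    }

    static String paragraphText(String paragraphXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(paragraphXml)));
            StringBuilder out = new StringBuilder();
            Node p = doc.getDocumentElement();
            for (Node c = p.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (isW(c, "r")) {
                    appendRun(c, out);              // ordinary run
                } else if (isW(c, "ins")) {
                    for (Node r = c.getFirstChild(); r != null; r = r.getNextSibling()) {
                        if (isW(r, "r")) {
                            appendRun(r, out);      // run inside an insertion
                        }
                    }
                }
                // <w:del> is not descended into; its runs hold only <w:delText>.
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse paragraph XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(paragraphText(SAMPLE)); // prints "Hello brave world"
    }
}
```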



[Bug 61787] Text extraction omitting text incorrectly


--- Comment #1 from Mark Murphy <jm...@apache.org> ---
This issue was introduced by Bug #58067
