Posted to dev@poi.apache.org by bu...@apache.org on 2017/08/31 18:52:04 UTC
[Bug 61475] New: Duplication of content in some XWPF
https://bz.apache.org/bugzilla/show_bug.cgi?id=61475
Bug ID: 61475
Summary: Duplication of content in some XWPF
Product: POI
Version: unspecified
Hardware: PC
Status: NEW
Severity: normal
Priority: P2
Component: XWPF
Assignee: dev@poi.apache.org
Reporter: tallison@mitre.org
Target Milestone: ---
Created attachment 35274
--> https://bz.apache.org/bugzilla/attachment.cgi?id=35274&action=edit
example docx
In regression tests for 3.17-rc2, I found some duplication of content in Tika,
and this is replicated with POI's XWPFWordExtractor.
XWPFDocument doc = XWPFTestDataSamples.openSampleDocument("dupe1.docx");
XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
String text = extractor.getText();
In the attached file, "When readers open..." should only appear once, but it
appears twice.
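One way to check for this kind of duplication is to count how often a phrase
occurs in the extracted text. The sketch below is only an illustration: the
class and method names are hypothetical, and the hard-coded string is a
stand-in for the text the extractor would return, not the real content of the
attached file.

```java
// Hypothetical helper, not part of POI or Tika.
class DuplicateCheckSketch {

    // Counts non-overlapping occurrences of phrase in text.
    static int countOccurrences(String text, String phrase) {
        if (phrase.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(phrase, from)) != -1) {
            count++;
            from += phrase.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for extracted text; a real check would pass the
        // string returned by the extractor instead.
        String extracted = "When readers open...\nOther text\nWhen readers open...";
        System.out.println(countOccurrences(extracted, "When readers open"));
    }
}
```

A count greater than one for a phrase known to occur once in the document
would flag the duplication described here.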
Full reports are here:
http://162.242.228.174/reports/poi-3.17-rc2-docx.tar.gz
Roughly 8,000 of ~170k docx files apparently have at least some duplicated
content. Some of the extra content can be explained by the phonetic/ruby
issue, but not the majority.
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 61475] Duplication of content in some XWPF
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61475
Tim Allison <ta...@mitre.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #2 from Tim Allison <ta...@mitre.org> ---
r1806839
[Bug 61475] Duplication of content in some XWPF
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61475
Tim Allison <ta...@mitre.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All
--- Comment #1 from Tim Allison <ta...@mitre.org> ---
My fault on 61740.
The appending of the picture text slipped into the loop instead of being
applied after it.
// Any picture text?
if (pictureText != null && pictureText.length() > 0) {
    text.append("\n").append(pictureText);
}
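To make the effect of the misplaced block concrete, here is a minimal,
hypothetical Java sketch of this class of bug. It is not the POI source: the
class name, the method names, and the list of run strings are assumptions made
for illustration. The only part taken from the comment above is that the
picture-text append should run once, after the loop, rather than inside it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the actual XWPF code.
class PictureTextAppendSketch {

    // Buggy shape: the append sits inside the loop, so the picture
    // text is added once per run.
    static String appendInsideLoop(List<String> runs, String pictureText) {
        StringBuilder text = new StringBuilder();
        for (String run : runs) {
            text.append(run);
            if (pictureText != null && pictureText.length() > 0) {
                text.append("\n").append(pictureText);
            }
        }
        return text.toString();
    }

    // Fixed shape: the append is applied once, after the loop.
    static String appendAfterLoop(List<String> runs, String pictureText) {
        StringBuilder text = new StringBuilder();
        for (String run : runs) {
            text.append(run);
        }
        if (pictureText != null && pictureText.length() > 0) {
            text.append("\n").append(pictureText);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<String> runs = Arrays.asList("Intro. ", "Body.");
        String pictureText = "When readers open...";
        System.out.println(appendInsideLoop(runs, pictureText));
        System.out.println("----");
        System.out.println(appendAfterLoop(runs, pictureText));
    }
}
```

With two runs, the first method emits the picture text twice and the second
emits it once, which matches the kind of duplication reported in this bug.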