You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2018/08/01 13:38:04 UTC
[Bug 62591] New: Trivial regression in trunk in isPlaceHolder in
newer sl
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591
Bug ID: 62591
Summary: Trivial regression in trunk in isPlaceHolder in newer
sl
Product: POI
Version: 4.0-dev
Hardware: PC
Status: NEW
Severity: trivial
Priority: P2
Component: HSLF
Assignee: dev@poi.apache.org
Reporter: tallison@mitre.org
Target Milestone: ---
In TIKA-712 and TIKA-1171, we tried to ignore placeholder text from master
slides. Three of our unit tests for this fail after upgrading to
4.0.0-SNAPSHOT. slide.isPlaceHolder() is now returning false for some shapes
for which it used to return true. In the few cases I've looked at, it appears
to only be a regression for date and slidenumber placeholders.
With 4.0.0 and the sl framework this code:
@Test
public void masterBoilerplateTest() throws Exception {
try (InputStream is =
getResourceAsStream("/test-documents/testPPT_masterText.ppt")) {
SlideShow slideShow = SlideShowFactory.create(is);
for (Object masterSheetObj : slideShow.getSlideMasters()) {
MasterSheet ms = (MasterSheet)masterSheetObj;
for (Object shape : ms.getShapes()) {
if (shape instanceof TextShape) {
TextShape ts = (TextShape)shape;
System.out.println(ts.getShapeName() + " : "
+ts.isPlaceholder() + " : >"+ts.getText().replaceAll("[\r\n]", "\n")+"<");
}
}
}
}
}
prints out this:
Title Placeholder 1 : true : >Click to edit Master title style<
Text Placeholder 2 : true : >Click to edit Master text styles
Second level
Third level
Fourth level
Fifth level<
Date Placeholder 3 : false : >*<
Footer Placeholder 4 : false : ><
Slide Number Placeholder 5 : false : >*<
TextBox 6 : false : >Text that I added to the master slide<
Title Placeholder 1 : true : >Click to edit Master title style<
Text Placeholder 2 : true : >Click to edit Master text styles
Second level
Third level
Fourth level
Fifth level<
Date Placeholder 3 : false : >*<
Footer Placeholder 4 : false : ><
Slide Number Placeholder 5 : false : >*<
With out older HSLF-specific code, this:
public void masterBoilerplateTest() throws Exception {
try (InputStream is =
getResourceAsStream("/test-documents/testPPT_masterText.ppt")) {
HSLFSlideShow slideShow = new HSLFSlideShow(is);
for (HSLFMasterSheet masterSheet : slideShow.getSlideMasters()) {
for (HSLFShape shape : masterSheet.getShapes()) {
if (shape instanceof TextShape) {
TextShape ts = (TextShape)shape;
System.out.println(shape.getShapeName()+ " : "
+HSLFMasterSheet.isPlaceholder(shape) + " :
>"+ts.getText().replaceAll("[\r\n]", "\n")+"<");
}
}
}
}
}
prints out this:
Rectangle : true : >Click to edit Master title style<
Rectangle : true : >Click to edit Master text styles
Second level
Third level
Fourth level
Fifth level<
Rectangle : true : >*<
Rectangle : true : ><
Rectangle : true : >*<
TextBox : false : >Text that I added to the master slide<
Rectangle : true : >Click to edit Master title style<
Rectangle : true : >Click to edit Master text styles
Second level
Third level
Fourth level
Fifth level<
Rectangle : true : >*<
Rectangle : true : ><
Rectangle : true : >*<
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591
--- Comment #6 from Andreas Beeker <ki...@apache.org> ---
When I've added the metroblob condition, I've searched for a difference between
a placeholder, which is visible as-is vs. just a placeholder.
Although there was no check for masterTitleText before, I'd expect it to be
contained in the result - especially as setMasterByDefault(true) is set.
When deciding between including or omitting a slidemaster fragment, I would go
for including in the uncertain case.
I'm ok with keeping it the way it is now, if Tika is also handling it like
that.
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
Re: [Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl
Posted by Tim Allison <ta...@apache.org>.
Will attach it once bugzilla is back up. Meanwhile, you can grab it
here: https://github.com/apache/tika/blob/master/tika-parsers/src/test/resources/test-documents/testPPT_masterText.ppt
On Tue, Aug 7, 2018 at 5:07 PM <bu...@apache.org> wrote:
>
> https://bz.apache.org/bugzilla/show_bug.cgi?id=62591
>
> Andreas Beeker <ki...@apache.org> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> Status|NEW |NEEDINFO
> OS| |All
>
> --- Comment #1 from Andreas Beeker <ki...@apache.org> ---
> Please add your testPPT_masterText.ppt
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
> For additional commands, e-mail: dev-help@poi.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591
Andreas Beeker <ki...@apache.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |NEEDINFO
OS| |All
--- Comment #1 from Andreas Beeker <ki...@apache.org> ---
Please add your testPPT_masterText.ppt
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591
--- Comment #3 from Tim Allison <ta...@mitre.org> ---
3.17:
@Override
public boolean isPlaceholder() {
OEPlaceholderAtom oep = getPlaceholderAtom();
if (oep != null) {
return true;
}
//special case for files saved in Office 2007
RoundTripHFPlaceholder12 hldr = getHFPlaceholderAtom();
if (hldr != null) {
return true;
}
return false;
}
4.0.0-SNAPSHOT:
@Override
public boolean isPlaceholder() {
return
((getPlaceholderAtom() != null) ||
//special case for files saved in Office 2007
(getHFPlaceholderAtom() != null)) &&
// check for metro shape of complex placeholder
(!new HSLFMetroShape<HSLFShape>(this).hasMetroBlob())
;
}
It looks like an extra condition was added in r1829453 (Bug 62092) that the
shape have a metroBlob.
Should we revert that condition or do we need to do something else?
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591
--- Comment #2 from Tim Allison <ta...@mitre.org> ---
Created attachment 36080
--> https://bz.apache.org/bugzilla/attachment.cgi?id=36080&action=edit
testPPT_masterText.ppt
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591
Tim Allison <ta...@mitre.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEEDINFO |RESOLVED
--- Comment #5 from Tim Allison <ta...@mitre.org> ---
Actually, even in the current WithMaster.ppt file, "Footer from the master
slide" is included 3 times, even though it should only appear once.
When I remove the new condition, the tests pass, and the footer only appears
once.
In r1837742, I reverted the behavior. Please reopen and fix if I've
misunderstood something about metroblobs.
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591
--- Comment #7 from Tim Allison <ta...@mitre.org> ---
Andi,
It hurt me to modify anything you've done! I'm ok with reverting to include
the metroblob condition, and we can run our own check within Tika for
placeholder/not placeholder to match the previous behavior.
If you look at ppt files in the content/content_diffs_ignore_exceptions.xlsx
in http://162.242.228.174/reports/poi-4.0.0_reports.tar.gz , there's quite a
bit of extra content with "click" "master "style" (columns:
TOP_10_UNIQUE_TOKEN_DIFFS_B and TOP_10_MORE_IN_B).
I defer to you on this.
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591
--- Comment #4 from Tim Allison <ta...@mitre.org> ---
Created attachment 36082
--> https://bz.apache.org/bugzilla/attachment.cgi?id=36082&action=edit
another example
This file shows that a placeholder can have a metroblob and still be a place
holder...I think...we shouldn't extract "Titelmasterformat durch Klicken
bearbeiten"...right?
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org