You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2018/08/01 13:38:04 UTC

[Bug 62591] New: Trivial regression in trunk in isPlaceHolder in newer sl

https://bz.apache.org/bugzilla/show_bug.cgi?id=62591

            Bug ID: 62591
           Summary: Trivial regression in trunk in isPlaceHolder in newer
                    sl
           Product: POI
           Version: 4.0-dev
          Hardware: PC
            Status: NEW
          Severity: trivial
          Priority: P2
         Component: HSLF
          Assignee: dev@poi.apache.org
          Reporter: tallison@mitre.org
  Target Milestone: ---

In TIKA-712 and TIKA-1171, we tried to ignore placeholder text from master
slides.  Three of our unit tests for this fail after upgrading to
4.0.0-SNAPSHOT.  slide.isPlaceHolder() is now returning false for some shapes
for which it used to return true.  In the few cases I've looked at, it appears
to only be a regression for date and slidenumber placeholders.

With 4.0.0 and the sl framework this code:

    @Test
    public void masterBoilerplateTest() throws Exception {
        try (InputStream is =
getResourceAsStream("/test-documents/testPPT_masterText.ppt")) {
            SlideShow slideShow = SlideShowFactory.create(is);
            for (Object masterSheetObj : slideShow.getSlideMasters()) {
                MasterSheet ms = (MasterSheet)masterSheetObj;
                for (Object shape : ms.getShapes()) {
                    if (shape instanceof TextShape) {
                        TextShape ts = (TextShape)shape;
                        System.out.println(ts.getShapeName() + " : "
+ts.isPlaceholder() + " : >"+ts.getText().replaceAll("[\r\n]", "\n")+"<");
                    }
                }
            }
        }
    }

prints out this:
Title Placeholder 1 : true : >Click to edit Master title style<
Text Placeholder 2 : true : >Click to edit Master text styles
Second level
Third level
Fourth level
Fifth level<
Date Placeholder 3 : false : >*<
Footer Placeholder 4 : false : ><
Slide Number Placeholder 5 : false : >*<
TextBox 6 : false : >Text that I added to the master slide<
Title Placeholder 1 : true : >Click to edit Master title style<
Text Placeholder 2 : true : >Click to edit Master text styles
Second level
Third level
Fourth level
Fifth level<
Date Placeholder 3 : false : >*<
Footer Placeholder 4 : false : ><
Slide Number Placeholder 5 : false : >*<


With out older HSLF-specific code, this:
    public void masterBoilerplateTest() throws Exception {
        try (InputStream is =
getResourceAsStream("/test-documents/testPPT_masterText.ppt")) {
            HSLFSlideShow slideShow = new HSLFSlideShow(is);
            for (HSLFMasterSheet masterSheet : slideShow.getSlideMasters()) {
                for (HSLFShape shape : masterSheet.getShapes()) {
                    if (shape instanceof TextShape) {
                        TextShape ts = (TextShape)shape;
                        System.out.println(shape.getShapeName()+ " : "
+HSLFMasterSheet.isPlaceholder(shape) + " :
>"+ts.getText().replaceAll("[\r\n]", "\n")+"<");
                    }
                }
            }
        }
    }

prints out this:
Rectangle : true : >Click to edit Master title style<
Rectangle : true : >Click to edit Master text styles
Second level
Third level
Fourth level
Fifth level<
Rectangle : true : >*<
Rectangle : true : ><
Rectangle : true : >*<
TextBox : false : >Text that I added to the master slide<
Rectangle : true : >Click to edit Master title style<
Rectangle : true : >Click to edit Master text styles
Second level
Third level
Fourth level
Fifth level<
Rectangle : true : >*<
Rectangle : true : ><
Rectangle : true : >*<

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591

--- Comment #6 from Andreas Beeker <ki...@apache.org> ---
When I've added the metroblob condition, I've searched for a difference between
a placeholder, which is visible as-is vs. just a placeholder.

Although there was no check for masterTitleText before, I'd expect it to be
contained in the result - especially as setMasterByDefault(true) is set.

When deciding between including or omitting a slidemaster fragment, I would go
for including in the uncertain case.

I'm ok with keeping it the way it is now, if Tika is also handling it like
that.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


Re: [Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl

Posted by Tim Allison <ta...@apache.org>.
Will attach it once bugzilla is back up.  Meanwhile, you can grab it
here: https://github.com/apache/tika/blob/master/tika-parsers/src/test/resources/test-documents/testPPT_masterText.ppt
On Tue, Aug 7, 2018 at 5:07 PM <bu...@apache.org> wrote:
>
> https://bz.apache.org/bugzilla/show_bug.cgi?id=62591
>
> Andreas Beeker <ki...@apache.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|NEW                         |NEEDINFO
>                  OS|                            |All
>
> --- Comment #1 from Andreas Beeker <ki...@apache.org> ---
> Please add your testPPT_masterText.ppt
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
> For additional commands, e-mail: dev-help@poi.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591

Andreas Beeker <ki...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 OS|                            |All

--- Comment #1 from Andreas Beeker <ki...@apache.org> ---
Please add your testPPT_masterText.ppt

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591

--- Comment #3 from Tim Allison <ta...@mitre.org> ---

3.17:
 @Override
            public boolean isPlaceholder() {
                OEPlaceholderAtom oep = getPlaceholderAtom();
                if (oep != null) {
                    return true;
                }

                //special case for files saved in Office 2007
                RoundTripHFPlaceholder12 hldr = getHFPlaceholderAtom();
                if (hldr != null) {
                    return true;
                }

                return false;
            }

4.0.0-SNAPSHOT:
    @Override
    public boolean isPlaceholder() {
        return
            ((getPlaceholderAtom() != null) ||
            //special case for files saved in Office 2007
            (getHFPlaceholderAtom() != null)) &&
            // check for metro shape of complex placeholder
            (!new HSLFMetroShape<HSLFShape>(this).hasMetroBlob())
        ;
    }

It looks like an extra condition was added in r1829453 (Bug 62092) that the
shape have a metroBlob.

Should we revert that condition or do we need to do something else?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591

--- Comment #2 from Tim Allison <ta...@mitre.org> ---
Created attachment 36080
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36080&action=edit
testPPT_masterText.ppt

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591

Tim Allison <ta...@mitre.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEEDINFO                    |RESOLVED

--- Comment #5 from Tim Allison <ta...@mitre.org> ---
Actually, even in the current WithMaster.ppt file, "Footer from the master
slide" is included 3 times, even though it should only appear once.

When I remove the new condition, the tests pass, and the footer only appears
once.

In r1837742, I reverted the behavior.  Please reopen and fix if I've
misunderstood something about metroblobs.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591

--- Comment #7 from Tim Allison <ta...@mitre.org> ---
Andi,

  It hurt me to modify anything you've done!  I'm ok with reverting to include 
the metroblob condition, and we can run our own check within Tika for
placeholder/not placeholder to match the previous behavior.

  If you look at ppt files in the content/content_diffs_ignore_exceptions.xlsx
in http://162.242.228.174/reports/poi-4.0.0_reports.tar.gz , there's quite a
bit of extra content with "click" "master "style" (columns:
TOP_10_UNIQUE_TOKEN_DIFFS_B and TOP_10_MORE_IN_B).

  I defer to you on this.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62591

--- Comment #4 from Tim Allison <ta...@mitre.org> ---
Created attachment 36082
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36082&action=edit
another example

This file shows that a placeholder can have a metroblob and still be a place
holder...I think...we shouldn't extract "Titelmasterformat durch Klicken
bearbeiten"...right?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org