Posted to dev@poi.apache.org by bu...@apache.org on 2021/12/03 19:33:46 UTC

[Bug 65721] New: Extracting embedded files from non-standard ppt

https://bz.apache.org/bugzilla/show_bug.cgi?id=65721

            Bug ID: 65721
           Summary: Extracting embedded files from non-standard ppt
           Product: POI
           Version: 5.0.x-dev
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HSLF
          Assignee: dev@poi.apache.org
          Reporter: tallison@apache.org
  Target Milestone: ---

Over on https://issues.apache.org/jira/browse/TIKA-3526, matcha007 shared a ppt
file created by WPS 表格 that handles embedded files slightly differently than
standard ppt.

I tried a few basic approaches with POI 5.1.0 and still had little luck.

The file is:
https://issues.apache.org/jira/secure/attachment/13032100/13032100_embedded+attachment.ppt

When I take the usual approach (iterating through the slides, then through
each slide's shapes looking for HSLFObjectShape), objectShape.getObjectData()
returns null. As matcha007 pointed out, this is because the _exEmbed is not
found in HSLFObjectShape's

private ExEmbed getExEmbed(boolean create) {...

matcha007 found that adding 3 to the objectId in getExEmbed seemed to work on
this file, but I know of no justification for that offset, and it looks like
it would break the lookup for every other file.
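
For reference, the shape-level iteration described above can be sketched
roughly as follows. This is a minimal sketch, not the code that was actually
run: the class name ShapeLevelOleScan and the helper countUnresolved are
hypothetical, the input path is taken from the command line, and only
top-level shapes are visited (shapes nested in groups are not descended into).

```java
import java.io.File;
import java.io.IOException;

import org.apache.poi.hslf.usermodel.HSLFObjectData;
import org.apache.poi.hslf.usermodel.HSLFObjectShape;
import org.apache.poi.hslf.usermodel.HSLFShape;
import org.apache.poi.hslf.usermodel.HSLFSlide;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper class, for illustration only.
public class ShapeLevelOleScan {

    /** Counts top-level OLE shapes for which getObjectData() returns null. */
    static int countUnresolved(HSLFSlideShow ss) {
        int unresolved = 0;
        for (HSLFSlide slide : ss.getSlides()) {
            for (HSLFShape shape : slide.getShapes()) {
                if (!(shape instanceof HSLFObjectShape)) {
                    continue; // not an embedded-object shape
                }
                HSLFObjectShape ole = (HSLFObjectShape) shape;
                HSLFObjectData data = ole.getObjectData();
                if (data == null) {
                    // The case reported here: no ExEmbed matched this shape's id
                    unresolved++;
                    System.out.println("objectID=" + ole.getObjectID() + ": no object data");
                } else {
                    System.out.println("objectID=" + ole.getObjectID() + ": resolved");
                }
            }
        }
        return unresolved;
    }

    public static void main(String[] args) throws IOException {
        POIFSFileSystem pfs = new POIFSFileSystem(new File(args[0]));
        try (HSLFSlideShow ss = new HSLFSlideShow(pfs.getRoot())) {
            System.out.println("unresolved OLE shapes: " + countUnresolved(ss));
        }
    }
}
```

Printing the objectID next to each miss makes it easier to compare the ids
stored in the shapes with whatever ids the file's ExEmbed records carry.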

I can extract the embedded files if I iterate through HSLFObjectData at the
slideshow level:
        POIFSFileSystem pfs = new POIFSFileSystem(p.toFile());
        try (HSLFSlideShow ss = new HSLFSlideShow(pfs.getRoot())) {
            // All embedded objects in the file, independent of any shape
            HSLFObjectData[] objectData = ss.getEmbeddedObjects();
        }

However, I can't then link those back to the ids in the shapes for this
particular file.

What can we do with this file?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 65721] Extracting embedded files not possible from non-standard ppt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65721

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
            Summary|Extracting embedded files   |Extracting embedded files
                   |from non-standard ppt       |not possible from
                   |                            |non-standard ppt
