Posted to user@poi.apache.org by sp0065 <sp...@gmail.com> on 2012/04/11 03:13:22 UTC

How to split input pptx file into a set of single slide files

Hello,

It seems like Apache POI is a great API, but I am a beginner, so I am looking
for some help.

I need to split input .pptx (and possibly .ppt) files into a set of slides, each
stored in a separate output .pptx (.ppt) file. When an input file is uploaded
to a directory, the program will be called from my main program (written in PHP)
and do the split.

Is it possible to do this with Apache POI? Code examples would help. Maybe I
need to delete all slides but one and then repeat for each selected slide?

Thank you!

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-split-input-pptx-file-into-a-set-of-single-slide-files-tp5631543p5631543.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: How to split input pptx file into a set of single slide files

Posted by sp0065 <sp...@gmail.com>.
Yegor,

It works as I wanted now. This task is done.

Thank you!
Serhiy

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-split-input-pptx-file-into-a-set-of-single-slide-files-tp5631543p5684212.html


Re: How to split input pptx file into a set of single slide files

Posted by Yegor Kozlov <ye...@dinom.ru>.
Here is the code that should work for you:

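import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.regex.Pattern;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.opc.PackageRelationship;
import org.apache.poi.xslf.usermodel.XMLSlideShow;

public class CompactPackage {

    // Usage: java CompactPackage <input.pptx> <output.pptx>
    public static void main(String[] args) throws Exception {
        XMLSlideShow pptx = new XMLSlideShow(new FileInputStream(args[0]));

        OPCPackage pkg = pptx.getPackage();
        for(PackagePart mediaPart : pkg.getPartsByName(Pattern.compile("/ppt/media/.*?"))){
            if(!isReferenced(mediaPart, pkg)) {
                System.out.println(mediaPart.getPartName() + " is not referenced. removing.... ");
                pkg.removePart(mediaPart);
            }
        }

        // Write the compacted presentation; without this step the removals are never saved
        FileOutputStream out = new FileOutputStream(args[1]);
        pptx.write(out);
        out.close();
    }

    /**
     * Check if a package part is referenced by any other part in the OPC package
     *
     * @param mediaPart     the part to check for references
     * @param pkg           the package this part belongs to
     * @return              whether mediaPart is referenced or not
     */
    public static boolean isReferenced(PackagePart mediaPart, OPCPackage pkg) throws Exception {
        for(PackagePart part : pkg.getParts()){
            // .rels parts only store relationships; their owning parts are checked instead
            if(part.isRelationshipPart()) continue;

            for(PackageRelationship rel : part.getRelationships()){
                // compare the media part's name with the part this relationship points to
                if(mediaPart.getPartName().getURI().equals(rel.getTargetURI())){
                    //System.out.println("mediaPart[" + mediaPart.getPartName() + "] is referenced by " + part.getPartName());
                    return true;
                }
            }
        }
        return false;
    }
}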

P.S. You may want to clean up "/ppt/embeddings/.*?" too.
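Here is a minimal sketch of the P.S., assuming Apache POI 4.x or later. A single pattern covers both /ppt/media/ and /ppt/embeddings/. All relationship targets are collected into a set once, so the package is scanned one time instead of once per media part as in isReferenced. The class name MediaCleanup, the constant MEDIA_AND_EMBEDDINGS and the method removeUnreferenced are hypothetical. The POI calls are the same ones CompactPackage uses.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.opc.PackageRelationship;

public class MediaCleanup {

    /** Hypothetical pattern matching part names under /ppt/media/ and /ppt/embeddings/. */
    public static final Pattern MEDIA_AND_EMBEDDINGS = Pattern.compile("/ppt/(media|embeddings)/.*");

    /**
     * Removes every part whose name matches {@code pattern} and that no relationship
     * in the package points to. Returns the number of parts removed.
     */
    public static int removeUnreferenced(OPCPackage pkg, Pattern pattern) throws Exception {
        // Collect every relationship target once, instead of rescanning per candidate part
        Set<URI> referenced = new HashSet<>();
        for (PackagePart part : pkg.getParts()) {
            if (part.isRelationshipPart()) continue; // .rels parts hold no relationships of their own
            for (PackageRelationship rel : part.getRelationships()) {
                referenced.add(rel.getTargetURI());
            }
        }

        int removed = 0;
        for (PackagePart part : pkg.getPartsByName(pattern)) {
            if (!referenced.contains(part.getPartName().getURI())) {
                pkg.removePart(part);
                removed++;
            }
        }
        return removed;
    }
}
```

Call it on pptx.getPackage() before pptx.write(...). The return value is the number of parts that were dropped.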

Cheers,
Yegor

On Thu, May 3, 2012 at 11:07 AM, sp0065 <sp...@gmail.com> wrote:



Re: How to split input pptx file into a set of single slide files

Posted by sp0065 <sp...@gmail.com>.
Yegor,

Thanks! Almost done. I generated the list of files in the /ppt/media/
as you suggested.

Now I am trying to generate a list of files referenced in my
one-slide presentation file. Then I am going to delete the files
that are not referenced. So far I have been able to generate the list of
relationships from the only slide I have. The relationships look like this:
<Relationship Id="rId2"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="../media/image1.jpeg"/>

=========
XSLFSlide slide0 = pptx.getSlides()[0];
List<POIXMLDocumentPart> rels = slide0.getRelations();

for(POIXMLDocumentPart r : rels){
  PackageRelationship r1 = r.getPackageRelationship();
  System.out.println(r1.toString());
}
=========

Result of this code is:
id=rId2 - container=org.apache.poi.openxml4j.opc.ZipPackage@4e79f1 -
relationshipType=http://schemas.openxmlformats.org/officeDocument/2006/relationships/image
- source=/ppt/slides/slide2.xml -
target=/ppt/media/image1.jpeg,targetMode=INTERNAL

I can match file names in /ppt/media/ against the file name in the
relationship using a regex, but it would be nicer to get only the target
part of the relationship.

Any advice on whether this is the right direction would help.
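The printed relationship above already contains the resolved name. Target="../media/image1.jpeg" on source /ppt/slides/slide2.xml shows up as target=/ppt/media/image1.jpeg. So comparing a relationship's getTargetURI() with a media part's getPartName().getURI(), as isReferenced in this thread does, should need no regex. The sketch below shows the underlying resolution using only java.net.URI. RelTargets and resolve are hypothetical names, not POI API.

```java
import java.net.URI;

public class RelTargets {

    /**
     * Resolves a relationship Target, as written in a .rels file, against the name of
     * the part that owns the relationship, giving an absolute part name.
     */
    public static String resolve(String sourcePartName, String target) {
        // URI.resolve drops the last segment of the base ("slide2.xml") and normalizes "../"
        return URI.create(sourcePartName).resolve(target).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("/ppt/slides/slide2.xml", "../media/image1.jpeg"));
    }
}
```

Running main prints /ppt/media/image1.jpeg, the same name a part under /ppt/media/ has in the package.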




On Wed, May 2, 2012 at 2:25 AM, Yegor Kozlov-4 [via Apache POI]
<ml...@n5.nabble.com> wrote:


--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-split-input-pptx-file-into-a-set-of-single-slide-files-tp5631543p5682458.html

Re: How to split input pptx file into a set of single slide files

Posted by Yegor Kozlov <ye...@dinom.ru>.
POI does not remove unused parts when removing slides. A media part can be
referenced by multiple slides, so such compaction should be done when
writing the slideshow, not when removing slides.


The code to compact pptx files and remove unreferenced media parts can
look as follows:

        List<PackagePart> mediaParts =
                pptx.getPackage().getPartsByName(Pattern.compile("/ppt/media/.*?"));
        for(PackagePart part : mediaParts){
            // TODO: check if this media part is referenced by other slides and remove it if it is not
            if(unused) {
                pptx.getPackage().removePart(part);
            }
        }


Yegor

On Tue, May 1, 2012 at 8:05 PM, sp0065 <sp...@gmail.com> wrote:



Re: How to split input pptx file into a set of single slide files

Posted by sp0065 <sp...@gmail.com>.
I am looking for a way to compact/shrink .pptx files that are generated as a
result of removing all slides but one with slide.removeSlide(X). As I
mentioned, the size of the resulting files is almost the same as the size of the
source presentation file, because the unnecessary images used in the removed
slides were not removed from the file (ppt\media).
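To check which images actually ended up in an output file, the .pptx can be read as a plain ZIP archive using only the JDK. Inside the archive, entry names use forward slashes and no leading slash (ppt/media/image1.jpeg), even though Windows shows the folder as ppt\media. This is a minimal sketch, and MediaInventory and listMedia are hypothetical names. Sizes are counted by reading each entry, because ZipInputStream can report -1 from getSize() before an entry's data has been read.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class MediaInventory {

    /** Maps each ppt/media/ entry of a .pptx (read as a ZIP) to its uncompressed size in bytes. */
    public static Map<String, Long> listMedia(InputStream pptx) throws IOException {
        Map<String, Long> media = new LinkedHashMap<>();
        byte[] buf = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(pptx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().startsWith("ppt/media/")) {
                    continue;
                }
                // Count bytes by reading the entry; getSize() can be -1 for streamed entries
                long size = 0;
                int n;
                while ((n = zip.read(buf)) != -1) {
                    size += n;
                }
                media.put(entry.getName(), size);
            }
        }
        return media;
    }

    // Usage: java MediaInventory slide-0.pptx
    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            listMedia(in).forEach((name, size) -> System.out.println(name + "\t" + size + " bytes"));
        }
    }
}
```

Comparing this listing for a one-slide output against the images that slide really shows makes the leftover media easy to spot.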

When I opened the one-slide file in PowerPoint and did Save As, the file was compacted
and the unnecessary images were removed. Can I do "Save As" programmatically
with Apache POI by adding some operation to my program?

Thank you.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-split-input-pptx-file-into-a-set-of-single-slide-files-tp5631543p5678348.html


Re: How to split input pptx file into a set of single slide files

Posted by sp0065 <sp...@gmail.com>.
I was able to split a .pptx presentation into slides but still have a few
problems.

The first method I tried was copying the content of each slide and writing it
into a separate file:

for(XSLFSlide srcSlide : src.getSlides()){
 XMLSlideShow ppt = new XMLSlideShow();
 ppt.createSlide().importContent(srcSlide);
...

It worked, but it did not copy the theme "decoration" if a theme was used. It
also did not copy the text in bulleted lists and some other text.

The second method I tried was:
slide.removeSlide(X)

So I removed all slides except one and wrote the result into a separate file.
This worked OK and handled any theme "decoration". However, the file for each
output slide was close in size to the source presentation file (12 MB in my
case). I looked inside the single-slide file archive and found that the
ppt\media directory includes images from all slides, not only from the current
slide, so the unnecessary images were not deleted. I can live with this, but
any thoughts on how to improve the program would help.
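The removal approach described in this post can be sketched as follows, assuming Apache POI 4.x or later (where getSlides() returns a List and XMLSlideShow is Closeable). Each output deck is re-read from the source bytes, and every slide except one is removed, highest index first. As observed above, this keeps themes and layouts but may still leave other slides' media in the package. It therefore pairs naturally with a cleanup of unreferenced /ppt/media/ parts before writing. SplitByRemoval and split are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xslf.usermodel.XMLSlideShow;

public class SplitByRemoval {

    /**
     * Produces one .pptx (as bytes) per slide of the source deck. Each output is made by
     * re-reading the full source and removing every other slide, so masters and layouts stay.
     */
    public static List<byte[]> split(byte[] source) throws IOException {
        int count;
        try (XMLSlideShow probe = new XMLSlideShow(new ByteArrayInputStream(source))) {
            count = probe.getSlides().size();
        }

        List<byte[]> decks = new ArrayList<>();
        for (int keep = 0; keep < count; keep++) {
            try (XMLSlideShow ppt = new XMLSlideShow(new ByteArrayInputStream(source))) {
                // Remove from the last index down so indices still to be visited do not shift
                for (int i = count - 1; i >= 0; i--) {
                    if (i != keep) {
                        ppt.removeSlide(i);
                    }
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                ppt.write(out);
                decks.add(out.toByteArray());
            }
        }
        return decks;
    }

    // Usage: java SplitByRemoval input.pptx  (writes slide-0.pptx, slide-1.pptx, ...)
    public static void main(String[] args) throws IOException {
        List<byte[]> decks = split(Files.readAllBytes(Paths.get(args[0])));
        for (int i = 0; i < decks.size(); i++) {
            Files.write(Paths.get("slide-" + i + ".pptx"), decks.get(i));
        }
    }
}
```

Re-reading the source for every output costs time on large decks. The upside is that no state leaks between outputs, because every copy starts from an untouched package.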

Thank you.



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-split-input-pptx-file-into-a-set-of-single-slide-files-tp5631543p5677256.html


Re: How to split input pptx file into a set of single slide files

Posted by sp0065 <sp...@gmail.com>.
Yegor,
Your answer is really helpful. I will try to implement the code tomorrow.
Thank you!



On Wed, Apr 11, 2012 at 1:41 AM, Yegor Kozlov-4 [via Apache POI]
<ml...@n5.nabble.com> wrote:


--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-split-input-pptx-file-into-a-set-of-single-slide-files-tp5631543p5637100.html

Re: How to split input pptx file into a set of single slide files

Posted by Yegor Kozlov <ye...@dinom.ru>.
Have a look at this example:
https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xslf/usermodel/MergePresentations.java

To split a .pptx presentation into single-slide files, just follow the pattern:

            // src is the XMLSlideShow opened from the input .pptx
            int num = 0;
            for(XSLFSlide srcSlide : src.getSlides()){
                XMLSlideShow ppt = new XMLSlideShow();
                ppt.createSlide().importContent(srcSlide);

                FileOutputStream out = new FileOutputStream("slide-" + num + ".pptx");
                ppt.write(out);
                out.close();

                num++;
            }

Yegor
On Wed, Apr 11, 2012 at 5:13 AM, sp0065 <sp...@gmail.com> wrote:
