Posted to user@poi.apache.org by Tales Paiva Nogueira <ta...@great.ufc.br> on 2006/06/30 22:01:59 UTC

PPT Pictures

Hi,

I'm having trouble extracting images from PPT files. When I have the same
image more than once, the API identifies only one image even if the size
differs from one to another.

I use the readPictures method in the HSLFSlideShow class, which puts the
image streams in a pictstream vector.

For instance, if there are 3 pictures, 2 of them being the same picture,
the returned vector length is 2.

What can I do to get the real number of images?

Thanks,
Tales

---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


Re[2]: PPT Pictures

Posted by Yegor Kozlov <ye...@dinom.ru>.
Hi

Attach the bad ppt and we will see what is wrong.
Also I would like to see the code which saves ppt images.

Yegor


Re: PPT Pictures

Posted by Nick Burch <ni...@torchbox.com>.
On Tue, 4 Jul 2006, mr.jonze wrote:
> I'd like to report an error that i found while using POI. I was 
> retrieving the pictures from a ppt file and saving each one in a file 
> and it was working ok, but i deleted one of the pictures of the 
> PowerPoint file and POI was still saving it in a file. Then I realized 
> that it happens when the source file of the picture is NOT a .PNG file. 
> In this case, it happened with a .BMP and a .JPG file.

Are you getting the pictures from HSLFSlideShow, or from individual 
slides?

Quite often, PowerPoint will leave behind old records when you do a save.
So, it's not impossible that when you "delete" some images, all that
happens is that the slide -> picture reference is deleted. The pictures could
well remain in the pictures stream until PowerPoint feels like doing a
full resave.


As Yegor said, if you could upload the problem PPT file and your code to
bugzilla, that'll help us look into it.

Nick
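
The situation described above (a picture still sitting in the pictures
stream after its slide reference is gone) can be pictured with a small,
self-contained sketch. It does not use the POI API: the class name, the
integer picture ids and the list-of-lists slide model are all hypothetical
stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not the POI API: the pictures stream is a list of
// picture ids, and each slide holds the ids its picture shapes refer to.
class OrphanPictureSketch {

    // Returns the ids that are stored in the stream but referenced by no slide.
    static List<Integer> findOrphans(List<Integer> storedIds,
                                     List<List<Integer>> slideRefs) {
        Set<Integer> used = new HashSet<Integer>();
        for (List<Integer> refs : slideRefs) {
            used.addAll(refs);
        }
        List<Integer> orphans = new ArrayList<Integer>();
        for (Integer id : storedIds) {
            if (!used.contains(id)) {
                orphans.add(id);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // Three pictures are stored; picture 2 was "deleted" from its slide,
        // but the file has not been fully resaved yet.
        List<Integer> stored = Arrays.asList(1, 2, 3);
        List<List<Integer>> slides = Arrays.asList(
                Arrays.asList(1), Arrays.asList(3, 1));
        System.out.println(findOrphans(stored, slides)); // prints [2]
    }
}
```

Reading pictures from the stream would report all three ids, while walking
the slides would only ever reach ids 1 and 3.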



Re: Re[2]: PPT Pictures

Posted by "mr.jonze" <mr...@gmail.com>.
Yes, it works the way it should. Thanks a lot.


Re[2]: PPT Pictures

Posted by Yegor Kozlov <ye...@dinom.ru>.
There are two ways to retrieve images from a ppt file:

1. Use HSLFSlideShow.getPictureData().
This method always returns the content of the image stream.
It doesn't know whether the images are used in the slides or not.
I would say it corresponds to the physical level, not the logical one, i.e.
it returns the images that are stored in the ppt file, not the ones that
are actually used.
As Nick says, if you edit your ppt with the 'incremental save' option
enabled, HSLFSlideShow.getPictureData() can return previously deleted
images, which will be gone on the next full resave.

2. Iterate over the shapes in each slide and collect the Picture shapes.
This approach guarantees that you get only those Pictures that are
actually used:

        // ppt is an already opened SlideShow
        Slide[] slide = ppt.getSlides();
        for (int i = 0; i < slide.length; i++) {
            Slide sl = slide[i];
            Shape[] sh = sl.getShapes();
            for (int j = 0; j < sh.length; j++) {
                Shape shape = sh[j];
                if (shape instanceof Picture){
                    Picture picture = (Picture)shape;

                    PictureData pict = picture.getPictureData();

                    byte[] data = pict.getData();
                    int type = pict.getType();

                    // use both the slide index (i) and the shape index (j) in the
                    // name, otherwise shape j on one slide overwrites shape j on another;
                    // picture types other than JPEG and PNG are skipped
                    if (type == Picture.JPEG){
                        FileOutputStream out = new FileOutputStream("slide"+i+"_"+j+".jpg");
                        out.write(data);
                        out.close();
                    } else if (type == Picture.PNG){
                        FileOutputStream out = new FileOutputStream("slide"+i+"_"+j+".png");
                        out.write(data);
                        out.close();
                    }
                }
            }
        }

Yegor
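
The file-saving step in the loop above can also be factored into a small
helper. What follows is only a sketch under stated assumptions: it does not
use the POI API, the JPEG and PNG codes are hypothetical placeholders (real
code would compare against the library's own constants), and the
slide/shape naming scheme is just one way to keep names unique. It relies on
java.nio.file.Files.write, which opens and closes the file itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PictureSaveSketch {
    // Hypothetical type codes for this sketch only; real code would use
    // the constants provided by the library.
    static final int JPEG = 1;
    static final int PNG = 2;

    // Maps a type code to a file extension, or null if unsupported.
    static String extensionFor(int type) {
        switch (type) {
            case JPEG: return ".jpg";
            case PNG:  return ".png";
            default:   return null; // unsupported type: caller skips it
        }
    }

    // Builds a name that is unique per (slide, shape) pair.
    static String fileNameFor(int slideIndex, int shapeIndex, String ext) {
        return "slide" + slideIndex + "_shape" + shapeIndex + ext;
    }

    // Writes data into dir and returns the written path,
    // or returns null without writing if the type is unsupported.
    static Path save(Path dir, int slideIndex, int shapeIndex,
                     int type, byte[] data) throws IOException {
        String ext = extensionFor(type);
        if (ext == null) {
            return null;
        }
        Path target = dir.resolve(fileNameFor(slideIndex, shapeIndex, ext));
        return Files.write(target, data);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ppt-pictures");
        Path saved = save(dir, 0, 2, PNG, new byte[] {1, 2, 3});
        System.out.println(saved.getFileName()); // prints slide0_shape2.png
    }
}
```

Because the name includes both indices, the same shape position on two
different slides can no longer write to the same file.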



Re: PPT Pictures

Posted by "mr.jonze" <mr...@gmail.com>.
Hi

I'd like to report an error that I found while using POI. I was retrieving
the pictures from a ppt file and saving each one to a file, and it was
working OK, but I deleted one of the pictures from the PowerPoint file and POI
was still saving it to a file. Then I realized that it happens when the
source file of the picture is NOT a .PNG file. In this case, it happened
with a .BMP and a .JPG file.

Any suggestions?

Thanks,
mr_jonze.


Re: PPT Pictures

Posted by Yegor Kozlov <ye...@dinom.ru>.
Hi

> For instance, if there are 3 pictures, being 2 of them the same picture,
> the returned vector length is 2.

It's how it is supposed to work. HSLFSlideShow.getPictures() returns the actual array of images contained in the presentation.
Each image is included only once regardless of how many times you have it in the slides.


> What can I do to get the real images number?

It looks like you need the number of image shapes, not the number of actual images contained in the ppt.

See the code:

        SlideShow ppt = new SlideShow(new HSLFSlideShow("images.ppt"));

        //images contained in this slide show
        PictureData[] pict = ppt.getPictureData();

        //get the number of image shapes
        int imageCount = 0;
        Slide[] slide = ppt.getSlides();
        for (int i = 0; i < slide.length; i++) {
            Shape[] sh = slide[i].getShapes();
            for (int j = 0; j < sh.length; j++) {
                if (sh[j] instanceof Picture) {
                    Picture p = (Picture)sh[j];
                    PictureData pdata = p.getPictureData();

                    imageCount++;
                }
            }
        }

Since the same image can be placed on several slides, imageCount may not be equal to pict.length.


Yegor
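
The difference between the two counts can be reproduced without a ppt file.
The sketch below is not the POI API: the class name and the representation
of a slide as a list of image names (one entry per picture shape) are
assumptions made purely for illustration. It mirrors the case from the
original question, with three picture shapes of which two show the same
image.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not the POI API: each slide is the list of image
// names that its picture shapes point at.
class PictureCountSketch {

    // Counts picture shapes: every placement counts, repeats included.
    static int countPictureShapes(List<List<String>> slides) {
        int count = 0;
        for (List<String> slide : slides) {
            count += slide.size();
        }
        return count;
    }

    // Counts distinct images: an image reused on several slides counts once.
    static int countDistinctImages(List<List<String>> slides) {
        Set<String> seen = new HashSet<String>();
        for (List<String> slide : slides) {
            seen.addAll(slide);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<List<String>> slides = Arrays.asList(
                Arrays.asList("logo", "chart"),
                Arrays.asList("logo"));
        System.out.println(countPictureShapes(slides));  // prints 3
        System.out.println(countDistinctImages(slides)); // prints 2
    }
}
```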




Re: PPT Pictures

Posted by Nick Burch <ni...@torchbox.com>.
On Fri, 30 Jun 2006, Tales Paiva Nogueira wrote:
> I'm having trouble extracting images from PPT files. When I have the
> same image more than once, the API identifies only one image even if the
> size differs from one to another.

There shouldn't be anything special about duplicate images. We just walk
through the binary blob of pictures, pulling them out until we run out of
data.

> For instance, if there are 3 pictures, being 2 of them the same picture,
> the returned vector length is 2.

That shouldn't be the case. Could you open a bug in bugzilla, and upload
the PPT file that this occurs on? We can then take a look, and see if we
can spot what's going wrong.

Nick
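
The "walk through the blob until we run out of data" idea can be sketched
as follows. The record layout here is a simplified assumption made for this
example (4 bytes that are skipped, a 4-byte little-endian payload length,
then the payload) and should not be taken as the exact layout of the PPT
pictures stream. The class and method names are hypothetical as well.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

class BlobWalkSketch {
    // Assumed header size: 4 skipped bytes plus a 4-byte length.
    static final int HEADER_SIZE = 8;

    // Splits the blob into record payloads, in the order they are stored.
    // Every record is returned, including ones with identical payloads.
    static List<byte[]> readRecords(byte[] blob) {
        List<byte[]> records = new ArrayList<byte[]>();
        ByteBuffer buf = ByteBuffer.wrap(blob).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= HEADER_SIZE) {
            buf.position(buf.position() + 4); // skip the 4 bytes we ignore
            int length = buf.getInt();
            if (length < 0 || length > buf.remaining()) {
                break; // truncated or corrupt record: stop rather than guess
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            records.add(payload);
        }
        return records;
    }

    public static void main(String[] args) {
        // Build a blob holding two records with payloads of 3 bytes and 1 byte.
        ByteBuffer b = ByteBuffer.allocate(8 + 3 + 8 + 1).order(ByteOrder.LITTLE_ENDIAN);
        b.putInt(0).putInt(3).put(new byte[] {10, 11, 12});
        b.putInt(0).putInt(1).put(new byte[] {20});
        System.out.println(readRecords(b.array()).size()); // prints 2
    }
}
```

A reader written this way has no notion of which slides use a record; it
simply reports whatever is stored.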
