You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by Peter Becker <pe...@peterbecker.de> on 2004/06/14 06:51:25 UTC

Re: PowerPoint to Text

Hello all,

did anyone try this? It does not work for me, I tried running it on a 
sample PPT found on Google 
(http://www.kcmetro.cc.mo.us/longview/ctac/powerpoint/ct.ppt) as well as 
some PPT I created myself in OpenOffice. The results are a number of 
empty text files and error messages like this (copy of stdout, with 
empty lines removed -- there were three before each line starting with a 
slash):

<snip>
\Current User
org.apache.poi.hpsf.NoPropertySetStreamException
\PowerPoint Document
org.apache.poi.hpsf.NoPropertySetStreamException
\PersistentStorage Directory
org.apache.poi.hpsf.NoPropertySetStreamException
\DocumentSummaryInformation
org.apache.poi.hpsf.NoPropertySetStreamException
\SummaryInformation
org.apache.poi.hpsf.NoPropertySetStreamException
\Text_Content
org.apache.poi.hpsf.NoPropertySetStreamException
\Object4CompObj
org.apache.poi.hpsf.NoPropertySetStreamException
\Object4Ole10Native
org.apache.poi.hpsf.NoPropertySetStreamException
\Object4Ole
org.apache.poi.hpsf.NoPropertySetStreamException
\Object6CompObj
org.apache.poi.hpsf.NoPropertySetStreamException
\Object6Ole10Native
org.apache.poi.hpsf.NoPropertySetStreamException
\Object6Ole
org.apache.poi.hpsf.NoPropertySetStreamException
\Object8CompObj
org.apache.poi.hpsf.NoPropertySetStreamException
\Object8Ole10Native
org.apache.poi.hpsf.NoPropertySetStreamException
\Object8Ole
org.apache.poi.hpsf.NoPropertySetStreamException
\Object9CompObj
org.apache.poi.hpsf.NoPropertySetStreamException
\Object9OlePres000
org.apache.poi.hpsf.NoPropertySetStreamException
\Object9Ole10Native
org.apache.poi.hpsf.NoPropertySetStreamException
\Object9Ole
org.apache.poi.hpsf.NoPropertySetStreamException
\Object10CompObj
org.apache.poi.hpsf.NoPropertySetStreamException
\Object10Ole10Native
org.apache.poi.hpsf.NoPropertySetStreamException
\Object10Ole
org.apache.poi.hpsf.NoPropertySetStreamException
\Object7CompObj
org.apache.poi.hpsf.NoPropertySetStreamException
\Object7Ole10Native
org.apache.poi.hpsf.NoPropertySetStreamException
\Object7Ole
org.apache.poi.hpsf.NoPropertySetStreamException
\Object5CompObj
org.apache.poi.hpsf.NoPropertySetStreamException
\Object5Ole10Native
org.apache.poi.hpsf.NoPropertySetStreamException
\Object5Ole
org.apache.poi.hpsf.NoPropertySetStreamException
\Object2CompObj
org.apache.poi.hpsf.NoPropertySetStreamException
\Object2OlePres000
org.apache.poi.hpsf.NoPropertySetStreamException
\Object2Ole10Native
org.apache.poi.hpsf.NoPropertySetStreamException
\Object2Ole
org.apache.poi.hpsf.NoPropertySetStreamException
\Object3CompObj
org.apache.poi.hpsf.NoPropertySetStreamException
\Object3Ole10Native
org.apache.poi.hpsf.NoPropertySetStreamException
\Object3Ole
org.apache.poi.hpsf.NoPropertySetStreamException
\Object1CompObj
org.apache.poi.hpsf.NoPropertySetStreamException
\Object1OlePres000
org.apache.poi.hpsf.NoPropertySetStreamException
\Object1Ole10Native
org.apache.poi.hpsf.NoPropertySetStreamException
\Object1Ole
org.apache.poi.hpsf.NoPropertySetStreamException
\Header
org.apache.poi.hpsf.NoPropertySetStreamException
<snap>

Any idea what is going wrong?

Thanks,
   Peter



Koundinya (Sudhakar Chavali) wrote:
> Hi All,
> 
> We have done initail ground work for extracting PowerPoint 2
> text. We would like to say thanks to POI group. Though the base
> work is rough, we are able to extract the text from PowerPoint.
> 
> Sorry for bad programming. But hope this wll be helpful to make
> the good program from this scrath by the efficient developers.  
> 
> 
> Here is the sample. When ever there are modifictaions, we will
> post the information.
> 
> 
> import java.io.*;
> import java.util.*;
> import org.apache.poi.hpsf.*;
> import org.apache.poi.poifs.eventfilesystem.*;
> import org.apache.poi.util.HexDump;
> import org.apache.poi.util.LittleEndian;
> 
> public class PPT2Text
> {
> 		public static void main(String[] args)
> 			throws IOException
> 		{
> 			final String filename = args[0];
> 			POIFSReader r = new POIFSReader();
> 
> 			/* Register a listener for *all* documents. */
> 			r.registerListener(new MyPOIFSReaderListener());
> 			r.read(new FileInputStream(filename));
> 		}
> 
> 
> 
> 		static class MyPOIFSReaderListener implements
> POIFSReaderListener
> 		{
> 
> 			static int filename=1;
> 
> 			public void processPOIFSReaderEvent(POIFSReaderEvent event) 
> 			{
> 				PropertySet ps = null;
> 
> 				
> 				try
> 				{
> 					
> 					org.apache.poi.poifs.filesystem.DocumentInputStream
> dis=null;
> 
> 					System.out.println("\n\n");
> 					System.out.println(event.getPath()+event.getName());
> 					dis=event.getStream();
> /*
> 					byte btoWrite[]= new byte[12];
> 
> 					dis.read(btoWrite);
> 
> 					System.out.println("Version
> :"+LittleEndian.getUnsignedByte(btoWrite,0));
> 					System.out.println("Instance
> :"+LittleEndian.getUShort(btoWrite,0));
> 					System.out.println("Type
> :"+LittleEndian.getUShort(btoWrite,2));
> 					System.out.println("Len
> :"+LittleEndian.getLong(btoWrite,4));
> 
> */					
> 
> 					FileOutputStream fos= new
> FileOutputStream(""+filename+".txt");
> 
> 					byte btoWrite[]= new byte[dis.available()];
> 					dis.read(btoWrite,0,dis.available());
> 					for(int i=0;i<btoWrite.length-20;i++)
> 					{
> 					//System.out.println("Version
> :"+LittleEndian.getUnsignedByte(btoWrite,i+0));
> 					//System.out.println("Instance
> :"+LittleEndian.getUShort(btoWrite,i+0));
> 					//System.out.println("Type
> :"+LittleEndian.getUShort(btoWrite,i+2));
> 					//System.out.println("Len
> :"+LittleEndian.getUInt(btoWrite,i+4));
> 
> 					long type=LittleEndian.getUShort(btoWrite,i+2);
> 					long size=LittleEndian.getUInt(btoWrite,i+4);
> 						if (type==4008)
> 						{
> 							fos.write(btoWrite,i+4+1,(int)size+3);
> 
> 						}
> 
> 					}
> 
> 					filename++;
> 					//System.out.println(event.getStream().toString());
> 					//ps = PropertySetFactory.create(event.getStream());
> 				}
> 				catch (Exception ex)
> 				{
> 					//System.out.println("No property set stream: \"" +
> event.getPath() +
> 					//	event.getName() + "\"");
> 					System.out.println(ex);
> 					return;
> 				}
> 			}
> 		}
> 
> 
> }
> 
> 
> 
> 
> 
> 
> thanks,
> Sudhakar
> 
> 
> 
> 
> =====
> "No one can earn a million dollars honestly."- William Jennings Bryan (1860-1925) 
> 
> "Make everything as simple as possible, but not simpler."- Albert Einstein (1879-1955)
> 
> "It is dangerous to be sincere unless you are also stupid."- George Bernard Shaw (1856-1950)
> 
> __________________________________
> Do you Yahoo!?
> Yahoo! Finance Tax Center - File online. File on time.
> http://taxes.yahoo.com/filing.html


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: poi-user-help@jakarta.apache.org