Posted to user@poi.apache.org by Suyash Jape <su...@tcs.com> on 2008/08/18 13:13:16 UTC

How to convert plain html to excel binary (.xls)

Hi,
        I have plain HTML files which I want to store in different 
sheets of a single workbook. How should I do that? 

An HTML file saved as .xls can be opened by MS Excel but CANNOT be opened 
by the POI API; it fails with 
 "java.io.IOException: Invalid header signature" (HSSF / POIFSFileSystem) 

An HTML file that is opened in MS Excel and then 'Saved As' an .xls file can 
then be opened by POI.
So the problem is how to convert plain HTML to binary .xls so that it 
can be opened by POI.
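For context on the error: HSSF reads .xls files through POIFSFileSystem, which expects an OLE2 compound document. A genuine binary .xls begins with the 8-byte OLE2 signature D0 CF 11 E0 A1 B1 1A E1, while an HTML file that has merely been renamed to .xls begins with text such as <html>, so POI rejects it even though Excel manages to recognise the HTML and open it. The following is a minimal sketch using only the JDK (the class and method names are hypothetical, not part of POI) that checks for the signature before a file is handed to HSSF:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class XlsSignatureCheck {

    // The 8-byte signature that starts every OLE2 compound document,
    // the container format used by binary .xls files.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    /** Returns true if the given bytes begin with the OLE2 signature. */
    static boolean hasOle2Signature(byte[] header) {
        if (header.length < OLE2_SIGNATURE.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, OLE2_SIGNATURE.length), OLE2_SIGNATURE);
    }

    /** Reads only the first 8 bytes of the file and checks them. */
    static boolean looksLikeBinaryXls(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasOle2Signature(in.readNBytes(OLE2_SIGNATURE.length));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path file = Path.of(arg);
            System.out.println(file + ": " + (looksLikeBinaryXls(file)
                    ? "OLE2 container (possibly a real .xls)"
                    : "not OLE2 (e.g. HTML renamed to .xls), HSSF will reject it"));
        }
    }
}
```

An OLE2 signature only says the file is a compound document (Word .doc files share it), so this check rules files out rather than proving they are spreadsheets.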

Thanks 
Suyash 
=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you



Re: How to convert plain html to excel binary (.xls)

Posted by Nick Burch <ni...@torchbox.com>.
On Mon, 18 Aug 2008, Anthony Andrews wrote:
> As far as I am aware, there is no single tool that will simply parse the 
> HTML and then create an Excel spreadsheet from that data.

With some work, you might be able to get Apache Cocoon to do it for you. 
It supports reading from XML, processing, and then outputting as .xls. 
Assuming your HTML is well-formed, you should be able to treat it as 
generic XML and work like that.

http://cocoon.apache.org/1363_1_1.html
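To illustrate the "treat it as generic XML" idea without Cocoon, here is a minimal sketch that swaps in the JDK's own JAXP DOM parser instead of a Cocoon pipeline. It assumes the HTML is well-formed, has no DOCTYPE, and uses only XML's predefined entities (so no &nbsp;); the class name XhtmlTableReader is hypothetical. Each inner list it returns could then be written out as one HSSF row:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XhtmlTableReader {

    /**
     * Parses well-formed (X)HTML as plain XML and returns the trimmed text of
     * each td/th cell, grouped by tr, in document order.
     */
    static List<List<String>> readRows(String xhtml) throws Exception {
        // The factory is not namespace-aware by default, so "tr" and "td"
        // are matched as plain element names.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));

        List<List<String>> rows = new ArrayList<>();
        NodeList trs = doc.getElementsByTagName("tr"); // every tr, including nested tables
        for (int i = 0; i < trs.getLength(); i++) {
            List<String> cells = new ArrayList<>();
            NodeList children = trs.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    String tag = ((Element) child).getTagName();
                    if (tag.equals("td") || tag.equals("th")) {
                        cells.add(child.getTextContent().trim());
                    }
                }
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><table>"
                + "<tr><th>Name</th><th>Qty</th></tr>"
                + "<tr><td>Apple</td><td> 3 </td></tr>"
                + "</table></body></html>";
        System.out.println(readRows(page)); // prints [[Name, Qty], [Apple, 3]]
    }
}
```

Any HTML that is not well-formed makes parse() throw, which is the limitation Nick's "assuming your html is well-formed" points at.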

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: How to convert plain html to excel binary (.xls)

Posted by Martin Brown <ma...@3bview.com>.
>         I have  plain html files, which i want to store in different 
> sheets on a single worksheet. How should i do that ? 
> 
> An HTML file saved as .xls can be opened by MS Excel but CANNOT be opened 
> by POI  API 
>  " java io ioexception invalid header signature hssf poifsfilesystem"

You could try automating OpenOffice.org. 

OpenOffice.org can insert HTML files into an existing spreadsheet and
then save as .xls. 

HTH

Martin




Re: How to convert plain html to excel binary (.xls)

Posted by Anthony Andrews <py...@yahoo.com>.
As far as I am aware, there is no single tool that will simply parse the HTML and then create an Excel spreadsheet from that data.

I think that you will have to look at the HTMLEditorKit that is part of the core Java SDK. This allows you to write event-driven code that will strip out the contents of the HTML file. As you recover the information, you can then use HSSF to write it into a spreadsheet.
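A minimal sketch of that event-driven approach: javax.swing.text.html.parser.ParserDelegator drives an HTMLEditorKit.ParserCallback that collects the text of each <td>/<th> into rows. The class name HtmlTableCallback is hypothetical, it only handles simple, non-nested tables, and the comment marks where the HSSF calls (sheet.createRow(), row.createCell()) would go. The Swing parser is fairly tolerant of malformed HTML, so this may suit input that an XML parser would reject:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlTableCallback extends HTMLEditorKit.ParserCallback {

    private final List<List<String>> rows = new ArrayList<>();
    private List<String> currentRow;    // non-null while inside <tr>
    private StringBuilder currentCell;  // non-null while inside <td>/<th>

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.TR) {
            currentRow = new ArrayList<>();
        } else if ((tag == HTML.Tag.TD || tag == HTML.Tag.TH) && currentRow != null) {
            currentCell = new StringBuilder();
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (currentCell != null) {
            currentCell.append(data);   // accumulate the text of the current cell
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if ((tag == HTML.Tag.TD || tag == HTML.Tag.TH) && currentCell != null) {
            currentRow.add(currentCell.toString().trim());
            currentCell = null;
        } else if (tag == HTML.Tag.TR && currentRow != null) {
            // HSSF step would go here: sheet.createRow(index), then one createCell per value.
            rows.add(currentRow);
            currentRow = null;
        }
    }

    /** Parses the HTML and returns the table cells grouped by row. */
    static List<List<String>> parse(Reader html) throws IOException {
        HtmlTableCallback callback = new HtmlTableCallback();
        new ParserDelegator().parse(html, callback, true); // true: ignore charset declarations
        return callback.rows;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><table>"
                + "<tr><td>Apple</td><td>3</td></tr>"
                + "<tr><td>Pear</td><td>5</td></tr>"
                + "</table></body></html>";
        System.out.println(parse(new StringReader(page)));
    }
}
```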


Re: How to convert plain html to excel binary (.xls)

Posted by df...@jmlafferty.com.
Hi Thorsten,

Can you control your html?

If not, look for an HTML-to-XML converter and follow Nick's suggestion.

HTH

Regards,
Dave



Re: How to convert plain html to excel binary (.xls)

Posted by Thorsten Bux <Th...@gmx.de>.
Hi,

thanks for the answer.

If I understand you correctly, you are executing VBA code from Java.
That is not possible for me, because I have to manipulate PPT files without an Office installation.

Thanks anyway
Thorsten





Re: How to convert plain html to excel binary (.xls)

Posted by Anthony Andrews <py...@yahoo.com>.
Well then, ages ago I wrote a piece of code that uses the Standard Widget Toolkit (part of the Eclipse IDE) to manipulate Word from Java code. In essence, the class shows how you can execute VBA code (almost) from within Java. Recently someone contacted me to ask for the demonstration code again, so I posted that archive on Rapidshare. At the same time, I wrote a class that shows how to do the same thing for Excel. To give you an idea, the code to save a file looks like this:

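// Method of Anthony's Excel OLE class (the rest of the class is not shown). It assumes
// the class imports org.eclipse.swt.SWTException, org.eclipse.swt.ole.win32.OleAutomation
// and org.eclipse.swt.ole.win32.Variant, and defines excelAuto and getAutomation().
/**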
     Save the open file using a different name/location.
     
     @param fileName A String object encapsulating the full path
                     to and name under which the file should be
                     saved. Note that this method makes no
                     attempt to check that a value has been passed
                     into the fileName parameter or whether the
                     value passed is reasonable. This is omitted
                     to focus on the OLE specific coding.
                     
     @throws SWTException if a problem is encountered obtaining
             the ids of properties or invoking methods.
     */
    public void saveFileAs(String fileName) throws SWTException{
    
        OleAutomation automation = null;
        
        try {
            
            // Get an automation for the ActiveWorkbook object
            automation = this.getAutomation(this.excelAuto,
                                            "ActiveWorkbook");
            
            // Get the id of the SaveAs property of the automation
            int[] propId = automation.getIDsOfNames(
                 new String[]{"SaveAs"});
            
            // If it was not possible to obtain the id, throw an
            // SWTException and terminate processing.
            if(propId == null) {
                throw new SWTException(
                    "In saveFileAs() and an id could not be "
                    + "obtained for the SaveAs method.");
            }
            
            // Build an array of type Variant to hold the
            // parameters for the SaveAs method; only the
            // first parameter - the path to and name
            // of the file - is required here. Note also the
            // technique used; to construct an array of
            // type Variant and initialise its members
            // with the parameter values. This is a standard
            // technique used throughout the class.
            Variant[] arguments = new Variant[1];
            arguments[0]        = new Variant(fileName);
            
            // Call the invoke() method on the automation object
            // and pass the id of the SaveAs property along
            // with the array of parameters to indicate both
            // the method that should be executed and what
            // values should be passed to it.
            Variant invokeResult = automation.invoke(
                 propId[0], arguments);
            
            // Check that the method completed successfully.
            // If not, throw an SWTException.
            if(invokeResult == null) {
                throw new SWTException(
                    "In saveFileAs() and a problem occurred invoking "
                    + "the SaveAs method.");
            }
            
        }
        finally {
            
            // Dispose of the automation to release IDispatch
            if(automation != null) {
                automation.dispose();
            }
            
        }
        
    }

As a technique, it has drawbacks: you can only use a single thread of control, it is not possible to run it in a client-server architecture, etc., but many of these problems are shared with JNI.

If you wish, I will send you the OLE class so that you can have a play. Just let me know.


--- On Wed, 8/20/08, Suyash Jape <su...@tcs.com> wrote:
From: Suyash Jape <su...@tcs.com>
Subject: Re: How to convert plain html to excel binary (.xls)
To: "POI Users List" <us...@poi.apache.org>
Date: Wednesday, August 20, 2008, 4:47 AM

Thanks Anthony.
Yes, that's a good idea, and it should work well. We could also call the VB 
script from Java using Runtime.exec(). 
 The only issue is that the rest of the project code is in Java, so I am 
looking for a way to do it completely in Java. 
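If the Runtime.exec() route is taken, ProcessBuilder is usually safer than Runtime.exec(String): the single-string form splits the command on whitespace, which breaks paths such as C:\Documents and Settings. A minimal sketch, assuming a hypothetical convert.vbs that performs the Workbooks.Open / ActiveWorkbook.SaveAs steps from Anthony's recorded macro on its two arguments and signals failure with a non-zero exit code (e.g. WScript.Quit 1); cscript is the Windows Script Host console runner and //NoLogo suppresses its banner:

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

public class ExcelSaveAsViaScript {

    /**
     * Builds the command line for a hypothetical convert.vbs that takes the
     * HTML input path and the .xls output path as its two arguments.
     * Each path is its own list element, so spaces in paths survive intact.
     */
    static List<String> buildCommand(File script, File htmlIn, File xlsOut) {
        return List.of("cscript", "//NoLogo",
                script.getAbsolutePath(),
                htmlIn.getAbsolutePath(),
                xlsOut.getAbsolutePath());
    }

    /** Runs the script and waits for it; only works on Windows with Excel installed. */
    static void convert(File script, File htmlIn, File xlsOut)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(script, htmlIn, xlsOut));
        pb.inheritIO();                        // show cscript's output in this console
        int exitCode = pb.start().waitFor();
        if (exitCode != 0) {
            throw new IOException("cscript exited with code " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: ExcelSaveAsViaScript convert.vbs input.html output.xls");
            return;
        }
        convert(new File(args[0]), new File(args[1]), new File(args[2]));
    }
}
```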
  A laborious solution is:
  1. Decide on a format for the HTML.
  2. Write an HTML parser that takes the appropriate actions in POI according
     to which tag is encountered, e.g.:
       if <table>
         if <tr>
           sheet.createRow()
       and so on.
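Because each HTML file is meant to end up as its own sheet, a parser like this will call HSSFWorkbook.createSheet(String) once per file, and the names passed in have to satisfy Excel's rules: at most 31 characters, none of : \ / ? * [ ], not blank, and unique within the workbook regardless of case. A small stdlib sketch of a hypothetical helper that derives such names from file names (the '_' replacement and the " (2)" suffix are arbitrary choices, not POI behaviour):

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class SheetNames {

    private static final int MAX_LENGTH = 31;           // Excel's sheet-name length limit
    private static final String FORBIDDEN = ":\\/?*[]";  // characters Excel rejects in sheet names

    // Names already handed out, lower-cased because Excel compares them case-insensitively.
    private final Set<String> used = new HashSet<>();

    /** Derives a unique, Excel-safe sheet name from a file name such as "report.html". */
    String nameFor(String fileName) {
        String base = fileName.replaceFirst("(?i)\\.html?$", ""); // drop .htm/.html
        StringBuilder sb = new StringBuilder();
        for (char c : base.toCharArray()) {
            sb.append(FORBIDDEN.indexOf(c) >= 0 ? '_' : c);
        }
        String name = sb.toString().trim();
        if (name.isEmpty()) {
            name = "Sheet";
        }
        if (name.length() > MAX_LENGTH) {
            name = name.substring(0, MAX_LENGTH);
        }

        // On a clash, shorten the base so that base + " (n)" still fits in 31 characters.
        String candidate = name;
        for (int n = 2; used.contains(candidate.toLowerCase(Locale.ROOT)); n++) {
            String suffix = " (" + n + ")";
            candidate = name.substring(0, Math.min(name.length(), MAX_LENGTH - suffix.length())) + suffix;
        }
        used.add(candidate.toLowerCase(Locale.ROOT));
        return candidate;
    }
}
```

For example, sales.html followed by SALES.htm would yield the sheet names sales and SALES (2).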

Regards 
Suyash 

Anthony Andrews <py...@yahoo.com> wrote on 20-08-2008 16:01:37:

> Rather than use native integration, have you considered programming 
> an Excel macro to do the work for you. I know that this will mean 
> using VBA but the code should be very straightforward IMO. You can 
> even use the macro recorder to get something like this;
> 
> Sub Macro1()
> '
> ' Macro1 Macro
> ' Macro recorded 20/08/2008 by AA
> '
> 
> '
>     ChDir "C:\Documents and Settings\win user\Desktop\Cards"
>     Workbooks.Open Filename:= _
>         "C:\Documents and Settings\win user\Desktop\Cards\Form-A-Lines free Christmas tree card pattern.htm"
>     ChDir "C:\Documents and Settings\win user\Desktop"
>     ActiveWorkbook.SaveAs Filename:= _
>         "C:\Documents and Settings\win user\Desktop\test.xls", _
>         FileFormat:=xlNormal, Password:="", WriteResPassword:="", _
>         ReadOnlyRecommended:=False, CreateBackup:=False
> End Sub
> 
> --- On Tue, 8/19/08, Suyash Jape <su...@tcs.com> wrote:
> From: Suyash Jape <su...@tcs.com>
> Subject: Re: How to convert plain html to excel binary (.xls)
> To: "POI Users List" <us...@poi.apache.org>
> Date: Tuesday, August 19, 2008, 9:01 PM
> 
> Hi all,
>  Thanks for your replies (Anthony, Nick, Martin).
>   The HTML may not be well-formed. So the ideal would be to pass the
> HTML content to MS Excel, let it convert it to binary, and then store it
> in a sheet through HSSF. But I guess Microsoft would not release the API
> for MS Excel :)
> 
>   So I am going to try some JNI calls to open MS Excel and make it
> 'Save As' the HTML to .xls. I will have to try this with MS Excel, as most
> of my client machines won't have OpenOffice.
> 
> Thanks
> Suyash
> 
> 
> 
> On Mon, 18 Aug 2008, Anthony Andrews wrote:
> > As far as I am aware, there is no single tool that will simply parse the
> > HTML and then create an Excel spreadsheet from that data.
> 
> With some work, you might be able to get Apache Cocoon to do it for you.
> It supports reading from XML, processing, then outputting as .xls. Assuming
> your HTML is well-formed, you should be able to treat it as generic XML
> and work like that.
> 
> http://cocoon.apache.org/1363_1_1.html
> 
> Nick
> 
> 
> You could try automating OpenOffice.org. 
> 
> OpenOffice.org can insert HTML files into an existing spreadsheet and
> then save as .xls. 
> 
> HTH
> 
> Martin
> 
> Suyash Jape <su...@tcs.com> wrote on 18-08-2008 16:43:16:
> 
> > Hi,
> >         I have plain HTML files, which I want to store in different
> > sheets of a single workbook. How should I do that?
> > 
> > An HTML file saved as .xls can be opened by MS Excel but CANNOT be opened
> > by the POI API:
> >  "java.io.IOException: Invalid header signature" (HSSF, POIFSFileSystem)
> > 
> > If an HTML file is opened in MS Excel and then 'Saved As' an .xls file,
> > it can then be opened by POI.
> > So the problem is how to convert plain HTML to binary .xls, so that it
> > can be opened by POI.
> > 
> > Thanks
> > Suyash

Re: How to convert plain html to excel binary (.xls)

Posted by Suyash Jape <su...@tcs.com>.
Thanks Anthony.
Yes, that's a good idea, and it will work well. We could also call the VB
script from Java using Runtime.exec().
The only issue is that, as the rest of the project code is in Java, I'm
looking for a way to do it completely in Java.
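Here is a rough sketch of the Runtime.exec() route. It assumes Windows with Excel installed and the stock cscript host. Rather than hard-coding paths in a recorded macro, Java writes a small VBScript that automates Excel through COM (CreateObject, Workbooks.Open, SaveAs) and runs it with ProcessBuilder, the newer equivalent of Runtime.exec(). The class and method names are hypothetical. VBScript does not know Excel's named constants, so the file format is spelled out as -4143, the value of the xlNormal constant used in the recorded macro. On Excel 2007 and later, 56 (xlExcel8) is the value usually used for the 97-2003 .xls format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: drives Excel itself (Windows only) to re-save an HTML file as .xls. */
public class ExcelHtmlToXls {

    /** Value of Excel's xlNormal constant, as used in the recorded macro. */
    public static final int XL_NORMAL = -4143;

    /** Quotes a Java string as a VBScript string literal (embedded quotes are doubled). */
    public static String vbString(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    /** Builds a VBScript that opens htmlPath in Excel and saves it as xlsPath. */
    public static String buildScript(String htmlPath, String xlsPath, int fileFormat) {
        return String.join("\r\n",
                "Set xl = CreateObject(\"Excel.Application\")",
                "xl.DisplayAlerts = False", // suppress overwrite/format prompts
                "Set wb = xl.Workbooks.Open(" + vbString(htmlPath) + ")",
                "wb.SaveAs " + vbString(xlsPath) + ", " + fileFormat,
                "wb.Close False",
                "xl.Quit",
                "");
    }

    /** Writes the script to a temp file and runs it with cscript; needs Windows + Excel. */
    public static void convert(String htmlPath, String xlsPath)
            throws IOException, InterruptedException {
        Path script = Files.createTempFile("html2xls", ".vbs");
        try {
            // Assumes Latin-1 paths: a .vbs file without a BOM is read as ANSI text.
            Files.write(script, buildScript(htmlPath, xlsPath, XL_NORMAL)
                    .getBytes(StandardCharsets.ISO_8859_1));
            Process p = new ProcessBuilder("cscript", "//NoLogo", script.toString())
                    .inheritIO()
                    .start();
            int exit = p.waitFor();
            if (exit != 0) { // non-zero exit => cscript reported a failure
                throw new IOException("cscript exited with code " + exit);
            }
        } finally {
            Files.deleteIfExists(script);
        }
    }
}
```

A call such as ExcelHtmlToXls.convert("C:\\in.htm", "C:\\out.xls") blocks until Excel has finished. Checking that the output file really exists afterwards is a cheap safety net.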
  A laborious solution is:
  Decide on a format for the HTML.
  Write an HTML parser that takes the appropriate action in POI according
  to which tag is encountered, e.g.:
        if <table>
            if <tr>
                sheet.createRow()
        and so on.

Regards 
Suyash 

Re: How to convert plain html to excel binary (.xls)

Posted by Anthony Andrews <py...@yahoo.com>.
Rather than use native integration, have you considered programming an Excel macro to do the work for you? I know this will mean using VBA, but the code should be very straightforward IMO. You can even use the macro recorder to get something like this:

Sub Macro1()
'
' Macro1 Macro
' Macro recorded 20/08/2008 by AA
'
    ChDir "C:\Documents and Settings\win user\Desktop\Cards"
    ' Open the HTML page; Excel parses it into a workbook
    Workbooks.Open Filename:= _
        "C:\Documents and Settings\win user\Desktop\Cards\Form-A-Lines free Christmas tree card pattern.htm"
    ChDir "C:\Documents and Settings\win user\Desktop"
    ' Save it in Excel's normal workbook format (binary .xls in Excel 97-2003)
    ActiveWorkbook.SaveAs Filename:= _
        "C:\Documents and Settings\win user\Desktop\test.xls", _
        FileFormat:=xlNormal, Password:="", WriteResPassword:="", _
        ReadOnlyRecommended:=False, CreateBackup:=False
End Sub

Re: How to convert plain html to excel binary (.xls)

Posted by Suyash Jape <su...@tcs.com>.
Hi all,
 Thanks for your replies (Anthony, Nick, Martin).
  The HTML may not be well formed, so what would be ideal is passing the
HTML content to MS Excel, letting it convert it to binary, and then storing
it in a sheet through HSSF. But I guess Microsoft would not release the API
for MS Excel :)

  So I am going to try some JNI calls to open MS Excel and make it
'Save As' the HTML to .xls. I will have to do this with MS Excel, as most
of my client machines won't have OpenOffice.
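POI's "Invalid header signature" error means the file does not start with the 8-byte OLE2 compound-document signature (D0 CF 11 E0 A1 B1 1A E1) that POIFSFileSystem expects. An HTML page saved under a .xls name starts with text such as <html> instead. Before handing a converted file to HSSF, a quick JDK-only check might look like the sketch below. The class and method names are hypothetical. The check only looks at the container, so a Word .doc would also pass, and a ZIP-based .xlsx would fail just as HTML does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: checks whether a file looks like an OLE2 (binary .xls) container. */
public class Ole2Check {

    /** Magic bytes at offset 0 of every OLE2 compound document. */
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    /** True if the given leading bytes start with the OLE2 signature. */
    public static boolean hasOle2Header(byte[] head) {
        return head.length >= OLE2_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(head, OLE2_SIGNATURE.length), OLE2_SIGNATURE);
    }

    /** Reads just the first 8 bytes of the file and checks them. */
    public static boolean hasOle2Header(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[OLE2_SIGNATURE.length];
            int n = in.readNBytes(head, 0, head.length); // Java 9+
            return n == head.length && hasOle2Header(head);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean ole2 = hasOle2Header(Paths.get(name));
            System.out.println(name + ": " + (ole2 ? "OLE2 container" : "not OLE2 (HTML saved as .xls?)"));
        }
    }
}
```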

Thanks 
Suyash 

