Posted to user@poi.apache.org by Suyash Jape <su...@tcs.com> on 2008/08/18 13:13:16 UTC
How to convert plain html to excel binary (.xls)
Hi,
I have plain HTML files that I want to store in different
sheets of a single workbook. How should I do that?
An HTML file saved with an .xls extension can be opened by MS Excel but CANNOT
be opened by the POI API; it fails with
" java io ioexception invalid header signature hssf poifsfilesystem"
An HTML file that is opened in MS Excel and then 'Saved As' an .xls file can
be opened by POI.
So the problem is how to convert plain HTML to binary .xls so that it
can be opened by POI.
Thanks
Suyash
=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you
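The "invalid header signature" error above is consistent with how binary .xls files are stored: they live inside an OLE2 compound document, which always begins with the eight bytes D0 CF 11 E0 A1 B1 1A E1, whereas an HTML file renamed to .xls begins with text. The sketch below is a minimal pre-check under that assumption. The class and method names (Ole2Sniffer, looksLikeOle2) are made up for illustration and are not part of POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class Ole2Sniffer {

    // First 8 bytes of every OLE2 compound document (the container of binary .xls).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns true if the stream starts with the OLE2 signature. */
    public static boolean looksLikeOle2(InputStream in) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int read = 0;
        // A single read() may return fewer bytes than requested, so loop.
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                return false; // file is shorter than the signature
            }
            read += n;
        }
        return Arrays.equals(head, OLE2_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java Ole2Sniffer <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(looksLikeOle2(in)
                    ? "OLE2 container"
                    : "not OLE2 (possibly HTML renamed to .xls)");
        }
    }
}
```

Running such a check before handing a file to POI makes it possible to route renamed HTML files to a separate conversion step instead of catching the IOException.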
Re: How to convert plain html to excel binary (.xls)
Posted by Nick Burch <ni...@torchbox.com>.
On Mon, 18 Aug 2008, Anthony Andrews wrote:
> As far as I am aware, there is no single tool that will simply parse the
> HTML and then create an Excel spreadsheet from that data.
With some work, you might be able to get Apache Cocoon to do it for you.
It supports reading from XML, processing it, and then outputting .xls. Assuming
your HTML is well-formed, you should be able to treat it as generic XML
and work with it that way.
http://cocoon.apache.org/1363_1_1.html
Nick
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org
Re: How to convert plain html to excel binary (.xls)
Posted by Martin Brown <ma...@3bview.com>.
> I have plain HTML files that I want to store in different
> sheets of a single workbook. How should I do that?
>
> An HTML file saved with an .xls extension can be opened by MS Excel but
> CANNOT be opened by the POI API
> " java io ioexception invalid header signature hssf poifsfilesystem"
You could try automating OpenOffice.org.
OpenOffice.org can insert HTML files into an existing spreadsheet and
then save as .xls.
HTH
Martin
Re: How to convert plain html to excel binary (.xls)
Posted by Anthony Andrews <py...@yahoo.com>.
As far as I am aware, there is no single tool that will simply parse the HTML and then create an Excel spreadsheet from that data.
I think that you will have to look at the HTMLEditorKit that is part of the core Java SDK. It lets you write event-driven code that strips out the contents of the HTML file. As you recover the information, you can then use HSSF to write it into a spreadsheet.
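The event-driven approach described above can be sketched roughly as follows. This uses the Swing HTML parser (HTMLEditorKit.ParserCallback driven by ParserDelegator) to collect the text of table cells into rows. The class name HtmlTableReader and the choice to look only at <tr>, <td> and <th> are assumptions for illustration. Writing the rows into a workbook with HSSF is left out because it depends on the POI version in use.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlTableReader {

    /** Parses the HTML and returns one list of cell strings per table row. */
    public static List<List<String>> readRows(Reader html) throws IOException {
        final List<List<String>> rows = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private List<String> currentRow;   // non-null while inside <tr>
            private StringBuilder currentCell; // non-null while inside <td>/<th>

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TR) {
                    currentRow = new ArrayList<>();
                } else if (tag == HTML.Tag.TD || tag == HTML.Tag.TH) {
                    currentCell = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Text outside a cell is ignored in this sketch.
                if (currentCell != null) {
                    currentCell.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if ((tag == HTML.Tag.TD || tag == HTML.Tag.TH) && currentCell != null) {
                    if (currentRow != null) {
                        currentRow.add(currentCell.toString().trim());
                    }
                    currentCell = null;
                } else if (tag == HTML.Tag.TR && currentRow != null) {
                    rows.add(currentRow);
                    currentRow = null;
                }
            }
        };

        // true = ignore any charset declared in the document; the Reader decides.
        new ParserDelegator().parse(html, callback, true);
        return rows;
    }
}
```

Each inner list would then correspond to one spreadsheet row, with one cell per entry, when the data is handed to HSSF.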
Re: How to convert plain html to excel binary (.xls)
Posted by df...@jmlafferty.com.
Hi Thorsten,
Can you control your HTML?
If not, then look for an HTML-to-XML converter and follow Nick's suggestion.
HTH
Regards,
Dave
Re: How to convert plain html to excel binary (.xls)
Posted by Thorsten Bux <Th...@gmx.de>.
Hi,
thanks for the answer.
If I got you right, you are executing VBA Code from Java.
That is actually not possible for me, because I have to manipulate PPT-Files without an Office installation.
Thanks anyway
Thorsten
Re: How to convert plain html to excel binary (.xls)
Posted by Anthony Andrews <py...@yahoo.com>.
Well then, ages ago I wrote a piece of code that uses the Standard Widget Toolkit (part of the Eclipse IDE) to manipulate Word from Java code. In essence, the class shows how you can execute VBA code (almost) from within Java. Recently, someone contacted me to ask for the demonstration code again, so I posted that archive on Rapidshare. At the same time, I wrote a class that shows how to do the same thing for Excel. To give you an idea, the code to save a file looks like this:

// Uses org.eclipse.swt.SWTException, plus OleAutomation and Variant
// from org.eclipse.swt.ole.win32.

/**
 * Save the open file using a different name/location.
 *
 * @param fileName A String object encapsulating the full path
 *                 to and name under which the file should be
 *                 saved. Note that this method makes no
 *                 attempt to check that a value has been passed
 *                 into the fileName parameter or whether the
 *                 value passed is reasonable. This is omitted
 *                 to focus on the OLE specific coding.
 *
 * @throws SWTException if a problem is encountered obtaining
 *                      the ids for properties or invoking methods.
 */
public void saveFileAs(String fileName) throws SWTException {
    OleAutomation automation = null;
    try {
        // Get an automation for the ActiveWorkbook object
        automation = this.getAutomation(this.excelAuto, "ActiveWorkbook");

        // Get the id of the SaveAs method of the automation
        int[] propId = automation.getIDsOfNames(new String[]{"SaveAs"});

        // If it was not possible to obtain the id, throw an
        // SWTException and terminate processing.
        if (propId == null) {
            throw new SWTException(
                "In saveFileAs() and an id could not be "
                + "obtained for the SaveAs method.");
        }

        // Build an array of type Variant to hold the
        // parameters for the SaveAs method; only the
        // first parameter - the path to and name
        // of the file - is required here. Note also the
        // technique used: construct an array of
        // type Variant and initialise its members
        // with the parameter values. This is a standard
        // technique used throughout the class.
        Variant[] arguments = new Variant[1];
        arguments[0] = new Variant(fileName);

        // Call the invoke() method on the automation object
        // and pass the id of the SaveAs method along
        // with the array of parameters to indicate both
        // the method that should be executed and what
        // values should be passed to it.
        Variant invokeResult = automation.invoke(propId[0], arguments);

        // Check that the method completed successfully.
        // If not, throw an SWTException.
        if (invokeResult == null) {
            throw new SWTException(
                "In saveFileAs() and a problem occurred invoking "
                + "the SaveAs method.");
        }
    }
    finally {
        // Dispose of the automation to release IDispatch
        if (automation != null) {
            automation.dispose();
        }
    }
}

As a technique, it has drawbacks: you can only use a single thread of control, it is not possible to run in a client-server architecture, and so on, but many of these problems are shared with JNI.
If you wish, I will send you the OLE class so that you can have a play. Just let me know.
Re: How to convert plain html to excel binary (.xls)
Posted by Suyash Jape <su...@tcs.com>.
Thanks Anthony.
Yes, that's a good idea, and it will work well. We could also call the VB
script from Java using Runtime.exec().
The only issue is that the rest of the project code is in Java, so I am
looking for a way to do this completely in Java.
A laborious solution is:
Decide on a format for the HTML.
Write an HTML parser that takes appropriate actions in POI according to
which tag is encountered.
e.g.:   if <table>
            if <tr>
                sheet.createRow()
and so on.
Regards
Suyash
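For the Runtime.exec() route mentioned above, a rough sketch might look like the following. It uses ProcessBuilder, the more convenient wrapper for launching the same kind of external process, and starts a script through the Windows Script Host (cscript). The class name VbsLauncher, the script name and the idea that the script takes the input and output paths as arguments are all assumptions for illustration. The script itself would contain something like the recorded macro quoted below.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class VbsLauncher {

    /**
     * Builds the command line for a hypothetical conversion script.
     * "//NoLogo" suppresses the Windows Script Host banner.
     */
    public static List<String> buildCommand(String script, String htmlIn, String xlsOut) {
        return Arrays.asList("cscript", "//NoLogo", script, htmlIn, xlsOut);
    }

    /** Starts the process, waits for it to finish and returns its exit code. */
    public static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // show the script's output on this console
                .start();
        return process.waitFor();
    }
}
```

Passing the arguments as a list rather than one concatenated string avoids quoting problems with paths that contain spaces, such as those under "C:\Documents and Settings". This only works on a Windows machine with Excel installed.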
Anthony Andrews <py...@yahoo.com> wrote on 20-08-2008 16:01:37:
> Rather than use native integration, have you considered programming
> an Excel macro to do the work for you? I know that this will mean
> using VBA, but the code should be very straightforward IMO. You can
> even use the macro recorder to get something like this:
>
> Sub Macro1()
> '
> ' Macro1 Macro
> ' Macro recorded 20/08/2008 by AA
> '
>
>     ChDir "C:\Documents and Settings\win user\Desktop\Cards"
>     Workbooks.Open Filename:= _
>         "C:\Documents and Settings\win user\Desktop\Cards\Form-A-Lines free Christmas tree card pattern.htm"
>     ChDir "C:\Documents and Settings\win user\Desktop"
>     ActiveWorkbook.SaveAs Filename:= _
>         "C:\Documents and Settings\win user\Desktop\test.xls", _
>         FileFormat:=xlNormal, Password:="", WriteResPassword:="", _
>         ReadOnlyRecommended:=False, CreateBackup:=False
> End Sub
Re: How to convert plain html to excel binary (.xls)
Posted by Anthony Andrews <py...@yahoo.com>.
Rather than use native integration, have you considered programming an Excel macro to do the work for you? I know that this will mean using VBA, but the code should be very straightforward IMO. You can even use the macro recorder to get something like this:
Sub Macro1()
'
' Macro1 Macro
' Macro recorded 20/08/2008 by AA
'
    ChDir "C:\Documents and Settings\win user\Desktop\Cards"
    Workbooks.Open Filename:= _
        "C:\Documents and Settings\win user\Desktop\Cards\Form-A-Lines free Christmas tree card pattern.htm"
    ChDir "C:\Documents and Settings\win user\Desktop"
    ActiveWorkbook.SaveAs Filename:= _
        "C:\Documents and Settings\win user\Desktop\test.xls", _
        FileFormat:=xlNormal, Password:="", WriteResPassword:="", _
        ReadOnlyRecommended:=False, CreateBackup:=False
End Sub
--- On Tue, 8/19/08, Suyash Jape <su...@tcs.com> wrote:
From: Suyash Jape <su...@tcs.com>
Subject: Re: How to convert plain html to excel binary (.xls)
To: "POI Users List" <us...@poi.apache.org>
Date: Tuesday, August 19, 2008, 9:01 PM
Hi all
Thanks for your replies ( Anthony , Nick , Martin )
The HTML may not be well formed. So the ideal would be to pass the
HTML content to MS Excel, let it convert it to binary, and then store it in
a sheet through HSSF. But I guess Microsoft would not release the API
for MS Excel :)
So I am going to try some JNI calls to open MS Excel and make it
'Save As' the HTML to .xls. I will have to try this with MS Excel, as most
of my client machines won't have OpenOffice.
Thanks
Suyash
Re: How to convert plain html to excel binary (.xls)
Posted by Suyash Jape <su...@tcs.com>.
Hi all,
Thanks for your replies (Anthony, Nick, Martin).
The HTML may not be well formed. So the ideal would be to pass the
HTML content to MS Excel, let it convert it to binary, and then store it in
a sheet through HSSF. But I guess Microsoft would not release the API
for MS Excel :)
So I am going to try some JNI calls to open MS Excel and make it
'Save As' the HTML to .xls. I will have to try this with MS Excel, as most
of my client machines won't have OpenOffice.
Thanks
Suyash
On Mon, 18 Aug 2008, Anthony Andrews wrote:
> As far as I am aware, there is no single tool that will simply parse the
> HTML for and then create an Excel spreadsheet from that data.
With some work, you might be able to get apache cocoon to do it for you.
It supports reading from xml, processing then outputting as .xls. Assuming
your html is well-formed, you should be able to treat it as generic xml
and work like that.
http://cocoon.apache.org/1363_1_1.html
Nick
You could try automating OpenOffice.org.
OpenOffice.org can insert HTML files into an existing spreadsheet and
then save as .xls.
HTH
Martin
Suyash Jape <su...@tcs.com> wrote on 18-08-2008 16:43:16:
> Hi ,
> I have plain html files, which i want to store in different
> sheets on a single worksheet. How should i do that ?
>
> An HTML file saved as .xls can be opened by MS Excel but CANNOT be opened
> by POI API
> " java io ioexception invalid header signature hssf poifsfilesystem"
>
> An HTML file if opened in MS EXCEL and then 'Saved As' .xls file , can
> then be opened by POI.
> So the problem is , how to convert plain HTML to binary .xls , so that it
> can be opened by POI.
>
> Thanks
> Suyash
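On the "invalid header signature" error quoted above: a binary .xls is an OLE2 compound document, and those files start with the eight bytes D0 CF 11 E0 A1 B1 1A E1. An HTML file renamed to .xls starts with text instead, which is why POIFS rejects it. A small sketch of a pre-check that could be run before handing a file to POI follows; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class Ole2Sniffer {

    // Signature at the start of every OLE2 compound document (binary .xls, .doc, ...).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** True if the given leading bytes start with the OLE2 signature. */
    static boolean hasOle2Header(byte[] head) {
        return head.length >= OLE2_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    /** Reads the first eight bytes of the file and checks them (needs Java 11+). */
    static boolean looksLikeBinaryXls(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasOle2Header(in.readNBytes(OLE2_MAGIC.length));
        }
    }
}
```

A file that fails this check still needs converting (by Excel, OpenOffice, or a parser of your own) before HSSF can open it. The check only tells the two cases apart.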