Posted to user@poi.apache.org by "Brule, Jon" <Jo...@xerox.com> on 2006/03/10 21:51:14 UTC

HSSF - Generating large spreadsheets in streaming manner?

Is it possible to generate a very large spreadsheet (e.g. several
thousand rows) in a low-memory, streaming manner? I am looking for a
counterpart to the event model used to parse large spreadsheets.

If not, I assume that the Cocoon serializer, which I understand uses
HSSF, would not operate in a streaming manner either...

Thank you.

Regards,
Jon
_________________
Jon R. Brule
Paramount Computing Associates

---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
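For context on what "low-memory, streaming" generation means in practice: later POI releases added SXSSF, a streaming writer for the XML-based .xlsx format. It keeps a fixed window of rows in memory and flushes older rows to temporary files. POI has no comparable streaming writer for HSSF/.xls. The sketch below shows only the general sliding-window idea in plain Java, with each row held as a single line of text. SpillingRowBuffer and its methods are hypothetical names, and this is not how any POI class is actually implemented.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch: keeps at most `windowSize` rows in memory and spills
 * older rows, one line each, to a temporary file. Rows are assumed to
 * contain no line breaks.
 */
final class SpillingRowBuffer implements Closeable {
    private final int windowSize;
    private final Deque<String> window = new ArrayDeque<>();
    private final Path spillFile;
    private final BufferedWriter spill;
    private int spilledRows = 0;

    SpillingRowBuffer(int windowSize) throws IOException {
        if (windowSize < 1) throw new IllegalArgumentException("windowSize must be >= 1");
        this.windowSize = windowSize;
        this.spillFile = Files.createTempFile("rows", ".tmp");
        this.spill = Files.newBufferedWriter(spillFile, StandardCharsets.UTF_8);
    }

    /** Adds a row; once the window is full, the oldest row moves to disk. */
    void addRow(String row) throws IOException {
        window.addLast(row);
        if (window.size() > windowSize) {
            spill.write(window.removeFirst());
            spill.newLine();
            spilledRows++;
        }
    }

    int rowsInMemory() { return window.size(); }

    int rowsOnDisk() { return spilledRows; }

    /** Writes every row, oldest first: spilled rows, then the in-memory window. Call before close(). */
    void writeTo(Writer out) throws IOException {
        spill.flush(); // make sure all spilled rows are on disk before reading them back
        try (BufferedReader in = Files.newBufferedReader(spillFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.write('\n');
            }
        }
        for (String row : window) {
            out.write(row);
            out.write('\n');
        }
        out.flush();
    }

    /** Closes the spill file and deletes it. */
    @Override
    public void close() throws IOException {
        spill.close();
        Files.deleteIfExists(spillFile);
    }
}
```

Memory use is bounded by the window size, not the total row count. The price is that rows already spilled cannot be edited cheaply. That restriction is also one reason a format with back-patched offsets, such as binary .xls, is awkward to write this way.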


Re: HSSF - Generating large spreadsheets in streaming manner?

Posted by Sanjiv Jivan <sa...@gmail.com>.
That seems to make sense. Thanks.

On 1/12/07, David Fisher <df...@jmlafferty.com> wrote:
>
> Accept the request and follow Michael's suggestion (which is how we do
> it with our dynamic PPT, XLS and PDF): build the file, then serve the
> file's bytes. While the content is being created we show a progress
> indicator with a "continue working" button. If that is clicked, a spot
> in the menu bar tracks the progress. When the build thread has
> completed, if the progress display is still up, the download proceeds
> automatically; if the user has continued working, the menu bar updates
> to let them know the document is ready, and they can select it for
> download at their leisure.
>
> Heck, according to ComputerWorld this approach defeats the recently
> discovered Adobe Reader web-browsing vulnerability.
>
> Other technical reasons:
>
> IE tends to think it knows what type of file it is being served by
> looking at the beginning, guessing and then re-requesting the file.
> Do you want to be building that 50MB file twice? Most of us don't.
> This is the standards flogging trick that Microsoft used to beat
> Netscape with their first IE.
>
> Alternatively, if you are doing nothing except serving 50MB of data w/o
> any style attributes, then why not just serve up a CSV file? If you set
> the mime-type correctly, that will stream out just fine.
>
> Regards,
> Dave
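Dave's CSV suggestion can be sketched as a row writer that escapes each field the way RFC 4180 describes: quote any field containing a comma, quote or line break, and double any embedded quotes. Each row is written straight to the output, so nothing beyond the current row is held in memory. CsvRowWriter is a hypothetical name. In a servlet you would wrap response.getWriter() and set the content type to text/csv (the MIME type RFC 4180 registers) before writing the first row.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

/** Hypothetical sketch of a minimal RFC 4180-style CSV row writer. */
final class CsvRowWriter {
    private final Writer out;

    CsvRowWriter(Writer out) { this.out = out; }

    /** Quotes a field if it contains a comma, quote, CR or LF, doubling embedded quotes. */
    static String escape(String field) {
        if (field == null) return "";
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\r') >= 0 || field.indexOf('\n') >= 0;
        if (!needsQuotes) return field;
        return '"' + field.replace("\"", "\"\"") + '"';
    }

    /** Writes one row terminated by CRLF, as RFC 4180 specifies. */
    void writeRow(List<String> fields) throws IOException {
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) out.write(',');
            out.write(escape(fields.get(i)));
        }
        out.write("\r\n");
    }
}
```

Because each row goes out as soon as it is produced, the client starts receiving data immediately. This is exactly what a binary .xls written in one piece cannot do.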

Re: HSSF - Generating large spreadsheets in streaming manner?

Posted by David Fisher <df...@jmlafferty.com>.
Accept the request and follow Michael's suggestion (which is how we do it
with our dynamic PPT, XLS and PDF): build the file, then serve the
file's bytes. While the content is being created we show a progress
indicator with a "continue working" button. If that is clicked, a spot
in the menu bar tracks the progress. When the build thread has
completed, if the progress display is still up, the download proceeds
automatically; if the user has continued working, the menu bar updates
to let them know the document is ready, and they can select it for
download at their leisure.
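The build-then-serve workflow Dave and Michael describe can be sketched with java.util.concurrent. The server accepts the request, starts the build on a worker thread, lets the page poll the job's progress, and serves the finished bytes once the build is done. Jobs are keyed by a request identifier, so a repeated request for the same key reuses the existing build instead of starting a second one. This covers the browser re-request Dave mentions elsewhere in the thread. ReportJobs, Job, the string key and the 0-100 progress scale are all hypothetical choices for this sketch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.IntConsumer;

/** Hypothetical sketch: build reports in the background and serve them when ready. */
final class ReportJobs {
    /** One background build with a 0-100 progress value. */
    static final class Job {
        private final AtomicInteger percent;
        final CompletableFuture<byte[]> result;

        Job(AtomicInteger percent, CompletableFuture<byte[]> result) {
            this.percent = percent;
            this.result = result;
        }

        /** Progress for a polling page or menu-bar indicator. */
        int percent() { return percent.get(); }

        /** True once the bytes can be served. */
        boolean isReady() { return result.isDone() && !result.isCompletedExceptionally(); }
    }

    private final ExecutorService pool;
    private final ConcurrentMap<String, Job> jobs = new ConcurrentHashMap<>();

    ReportJobs(ExecutorService pool) { this.pool = pool; }

    /**
     * Starts building the report for `key` unless a job already exists for it.
     * The builder receives a callback it should call with progress updates.
     */
    Job submit(String key, Function<IntConsumer, byte[]> builder) {
        return jobs.computeIfAbsent(key, k -> {
            AtomicInteger percent = new AtomicInteger();
            CompletableFuture<byte[]> result = CompletableFuture.supplyAsync(() -> {
                byte[] bytes = builder.apply(percent::set);
                percent.set(100); // build finished
                return bytes;
            }, pool);
            return new Job(percent, result);
        });
    }

    /** Looks up a job for polling; returns null if `key` was never submitted. */
    Job find(String key) { return jobs.get(key); }
}
```

A production version would also need to expire finished jobs and report build failures to the user. Both are left out here to keep the sketch short.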


RE: HSSF - Generating large spreadsheets in streaming manner?

Posted by "Donahue, Michael" <mi...@pearson.com>.
Since this question is drifting off topic for the POI User list, I think
we should take it offline.  If you like, I'll be happy to continue this
discussion in direct e-mails.

-----Original Message-----
From: Sanjiv Jivan [mailto:sanjiv.jivan@gmail.com] 
Sent: Friday, January 12, 2007 2:33 PM
To: POI Users List
Subject: Re: HSSF - Generating large spreadsheets in streaming manner?


Re: HSSF - Generating large spreadsheets in streaming manner?

Posted by Sanjiv Jivan <sa...@gmail.com>.
When you go to download a file from the web, you see a download dialog with a
progress bar. Would you prefer that, when you want to download, say, a 50 MB
zip file (which will take more than a second or two) off the internet, it
follow the workflow you describe?

The fact that the Excel spreadsheet is generated dynamically is an internal
detail. Again, we're talking about a download that takes a few minutes, not
hours. Why should the end user have to follow a different workflow to
download a file?

RE: HSSF - Generating large spreadsheets in streaming manner?

Posted by "Donahue, Michael" <mi...@pearson.com>.
Sanjiv -

Strictly speaking as a web developer, this is a very bad approach to
dealing with a task that may take more than a second or two to complete.
Typically, a web application should never make the user wait more than
a few seconds to completely load the next page.  I don't think I would
want to take your approach for a thick client either.

In the situation you described, it would be better to tell the user that
their request has been accepted and that, as soon as it is complete, they
will be notified through some other mechanism that they can download or
view the results.  This could be through a screen/window pop-up.

While there might still be a few places where a streaming API makes
sense, I'd prefer to see effort spent on tasks that have a broader
utilization curve, like the recently added comments support.

Lastly, "THANK YOU!!" to all of the POI Project developers for all of
their efforts to make POI better.

-----Original Message-----
From: Sanjiv Jivan [mailto:sanjiv.jivan@gmail.com] 
Sent: Friday, January 12, 2007 12:41 PM
To: POI Users List; acoliver@apache.org
Subject: Re: HSSF - Generating large spreadsheets in streaming manner?

Re: HSSF - Generating large spreadsheets in streaming manner?

Posted by Sanjiv Jivan <sa...@gmail.com>.
I think that having a streaming API would be very useful, and it's not because
of trying to generate a massive, non-human-readable spreadsheet. You have to
factor in the time it takes to build the data used for the spreadsheet too.

Consider a use case where a user is trying to download a spreadsheet with
500 - 1000 rows, but the logic involved in getting the data for the
spreadsheet takes around a minute. Without a streaming API, when a user
tries to download such a file, they click on the link and the browser
basically waits for 1 minute and only then pops up a save dialog, since the
contents of the spreadsheet can only be written out to the response stream
after the entire spreadsheet has been generated. Had there been a streaming
API, the contents could have been written to the response stream on the fly,
and the browser would have displayed a nice download dialog with a progress
bar.
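The on-the-fly approach Sanjiv describes can be sketched for a text format such as CSV. Rows are pulled from a lazy source and the output is flushed every few rows, so the first bytes (and the browser's save dialog) arrive while later rows are still being computed. This doesn't carry over directly to HSSF, which writes the whole workbook in one call to HSSFWorkbook.write(OutputStream). When the total size isn't known up front, the response carries no Content-Length, and a servlet container typically sends it with chunked transfer encoding. Browsers can then usually show how much has arrived, but not a percentage. StreamingExport, stream() and the flushEvery parameter are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/** Hypothetical sketch: write rows as they are produced, flushing every `flushEvery` rows. */
final class StreamingExport {
    private StreamingExport() {}

    /**
     * Writes each row as one UTF-8 line. `rows` is assumed to be lazy (for example,
     * backed by a database cursor), so early rows reach the client while later ones
     * are still being computed. Returns the number of rows written.
     */
    static int stream(Iterator<String> rows, OutputStream out, int flushEvery) throws IOException {
        if (flushEvery < 1) throw new IllegalArgumentException("flushEvery must be >= 1");
        int count = 0;
        while (rows.hasNext()) {
            out.write((rows.next() + "\n").getBytes(StandardCharsets.UTF_8));
            count++;
            if (count % flushEvery == 0) {
                out.flush(); // push this chunk toward the client now
            }
        }
        out.flush(); // send any remaining partial chunk
        return count;
    }
}
```

A small flushEvery gets data moving sooner but costs more network packets. A larger value batches rows more efficiently.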


On 3/10/06, Andrew C. Oliver <ac...@apache.org> wrote:
>
> Not yet.  Demand for the Cocoon serializer hasn't been very high, so it
> is mostly deprecated (unless there is some massive uptake of support for
> it).
>
> Okay, it's time for my yearly rant on this subject (not aimed at you...you
> just reminded me I hadn't done it this year):
>
> I'm always a little curious about this.  XLS is a HORRIBLE format (which
> is why I started POI; I wanted to do something difficult).  It is a
> HORRIBLY inefficient format and WAS NOT DESIGNED to stream.  Yet people
> generate massive sheets in it.  My pensiveness is that no human is
> likely to read such a large sheet or be able to do anything particularly
> useful with it.  So who are these sheets for?  Often it turns out they
> are some kind of data transfer, which is frankly BAFFLING.  Why?
> Because I could do the same transfer with like 1/10th of the storage,
> bandwidth, CPU, etc. in a more well-thought-out (or at least lightweight)
> format.  Yet I saw a spreadsheet today that was 100 MB.  The power of
> Excel is that it can style the data and use some formulas.  This is good
> for what is to me a summary report, not RAW 100 MB or gigs of data.
> Of course this comes from someone who knows how to hack the underlying
> binary structures but barely knows how to run the Excel GUI.   :-)
>
> We now return you to your previously scheduled mail list activity.
>
> -Andy
>
> PS.  I wish the open office GUI wasn't so crappy, sluggish and
> well...cruddy looking and printed nicely.  Their file formats make so
> much more sense (and with compression they're reasonably efficient) and
> the brilliance of text is that it works nicely with revision control and
> revision control tags.
>
> PPS.  I also wish the open office developers would either learn C++,
> convert all of their code to C and/or port open office to a language
> they know how to write better structured code in.
>
> Brule, Jon wrote:
> > Is it possible to generate a very large spreadsheet (e.g. several
> > thousand rows) in a low-memory, streaming manner? I am looking for a
> > corollary to the event model used to parse large spreadsheets.
> >
> > If not, I assume that the Cocoon serializer, which I understand uses
> > HSSF, would not operate in a streaming manner either...
> >
> > Thank you.
> >
> > Regards,
> > Jon
> > _________________
> > Jon R. Brule
> > Paramount Computing Associates
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
> > Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
> > The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
> >
> >
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
> Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
> The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
>
>

Re: HSSF - Generating large spreadsheets in streaming manner?

Posted by "Andrew C. Oliver" <ac...@apache.org>.
Not yet.  Demand for the Cocoon serializer hasn't been very high, so it 
is mostly deprecated (unless there is some massive uptake of support for 
it).

Okay, it's time for my yearly rant on this subject (not aimed at you; you 
just reminded me I hadn't done it this year):

I'm always a little curious about this.  XLS is a HORRIBLE format (which 
is why I started POI: I wanted to do something difficult).  It is a 
HORRIBLY inefficient format and WAS NOT DESIGNED to stream.  Yet people 
generate massive sheets in it.  My hunch is that no human is likely to 
read such a large sheet or be able to do anything particularly useful 
with it.  So who are these sheets for?  Often it turns out they are some 
kind of data transfer, which is frankly BAFFLING.  Why?  Because I could 
do the same transfer with about 1/10th of the storage, bandwidth, CPU, 
etc. in a better thought-out (or at least lightweight) format.  Yet I saw 
a spreadsheet today that was 100 MB.  The power of Excel is that it can 
style the data and use some formulas.  That is good for what I would call 
a summary report, not for 100 MB or gigabytes of RAW data.  Of course, 
this comes from someone who knows how to hack the underlying binary 
structures but barely knows how to run the Excel GUI.   :-)
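To make the "not designed to stream" point concrete: as I understand the
BIFF8 layout, the workbook-globals section comes first and contains a
BOUNDSHEET record per sheet holding the absolute stream position of that
sheet's BOF record, and the shared string table (SST) for all sheets also
lives in the globals.  A writer therefore cannot emit the start of the file
until it knows every sheet's size and every string.  The sketch below uses a
made-up toy format ("TOY1", not real BIFF) whose header stores absolute
offsets, to show the consequence: every sheet body has to be serialized and
buffered in memory before the first header byte can be written.  All names
here are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy container (NOT real XLS/BIFF) whose header lists the
 * absolute byte offset of every sheet body, similar in spirit to BIFF8
 * BOUNDSHEET records. Offsets come first, so writing needs two passes.
 */
public class OffsetHeaderWriter {

    static final byte[] MAGIC = "TOY1".getBytes(StandardCharsets.US_ASCII);

    /** Pass 1: serialize one sheet into a memory buffer (row count, then rows). */
    static byte[] serializeSheet(List<String> rows) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(buf);
        data.writeInt(rows.size());
        for (String row : rows) {
            data.writeUTF(row); // 2-byte length prefix + modified UTF-8 bytes
        }
        data.flush();
        return buf.toByteArray();
    }

    /** Pass 2: with all sizes known, write magic, count, offsets, then bodies. */
    static void write(List<List<String>> sheets, OutputStream out) throws IOException {
        List<byte[]> bodies = new ArrayList<>();
        for (List<String> sheet : sheets) {
            bodies.add(serializeSheet(sheet)); // whole sheet held in memory
        }
        int headerSize = MAGIC.length + 4 + 4 * bodies.size();
        DataOutputStream data = new DataOutputStream(out);
        data.write(MAGIC);
        data.writeInt(bodies.size());
        int offset = headerSize;
        for (byte[] body : bodies) {
            data.writeInt(offset); // absolute position of this sheet's body
            offset += body.length;
        }
        for (byte[] body : bodies) {
            data.write(body);
        }
        data.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        write(List.of(List.of("a", "b"), List.of("c")), file);
        System.out.println(file.size() + " bytes");
    }
}
```

A format meant for streaming usually does the opposite and puts its index
at the end; ZIP, for example, writes its central directory after all of the
entries, so each entry can be written as soon as it is ready.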

We now return you to your previously scheduled mail list activity.

-Andy

PS.  I wish the OpenOffice GUI weren't so crappy, sluggish and, 
well... cruddy looking, and that it printed nicely.  Their file formats 
make so much more sense (and with compression they're reasonably 
efficient), and the brilliance of text is that it works nicely with 
revision control and revision control tags.

PPS.  I also wish the OpenOffice developers would either learn C++, 
convert all of their code to C, and/or port OpenOffice to a language 
they know how to write better-structured code in.
