Posted to user@poi.apache.org by frankgrimes97 <fr...@gmail.com> on 2013/03/12 20:15:36 UTC

SXSSF Streaming without on-disk temporary files?

Hi All,

I have a web service and also a webapp frontend which currently expose
exported data as text/csv.
However, most of my users access data through the webapp and use Excel.

The problem is that it seems that there are many gotchas with Excel CSV
handling.
e.g.
http://stackoverflow.com/questions/6588068/which-encoding-opens-csv-files-correctly-with-excel-on-both-mac-and-windows
http://xbfish.com/2011/05/11/all-data-are-in-single-column-when-opening-csv-with-excel/
http://www.paessler.com/knowledgebase/en/topic/2293-i-have-trouble-opening-csv-files-with-microsoft-excel-is-there-a-quick-way-to-fix-this

Even in my limited testing on Excel for Mac 2011 I get very bizarre
handling of CSV files.
Specifically, a CSV file double clicked in Finder opens Excel but puts all
data in one column as reported above.
Opening Excel first and then selecting the CSV file from within Excel
brings up the Text Import Wizard, forcing users to explicitly select the
delimiter.

Also, I believe that behaviour in different browsers when dealing with
text/csv is not always consistent.
e.g.
http://code.google.com/p/chromium/issues/detail?id=152911

For these reasons, I would like to offer users an option to retrieve the
data in Excel format directly.
The amount of data retrieved can be quite large, but from my understanding
it should be possible to create/stream XLSX/OOXML directly to a
ServletOutputStream.
However, I've looked at the documentation and API for SXSSF in POI and I
don't see a way to avoid writing the content out to a temporary file.

Is it possible to generate the content without incurring the cost of disk
write/read I/O and then cleanup?
If not through the SXSSF interface, then perhaps some other POI APIs might
support it?

Thanks,

Frank Grimes

Re: SXSSF Streaming without on-disk temporary files?

Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 12 Mar 2013, frankgrimes97 wrote:
> For these reasons, I would like to offer users an option to retrieve the
> data in Excel format directly.
> The amount of data retrieved can be quite large, but from my understanding
> it should be possible to create/stream XLSX/OOXML directly to a
> ServletOutputStream.
> However, I've looked at the documentation and API for SXSSF in POI and I
> don't see a way to avoid writing the content out to a temporary file.
>
> Is it possible to generate the content without incurring the cost of disk
> write/read I/O and then cleanup?

Nope, sorry. The Excel file format has some back-and-forward references.
The .xlsx format doesn't have as many as .xls, but there are still some.

In theory, you should be able to open a streaming zip writer, push out the
parts that don't have back/forward refs, write some parts of the file, send
those, then send the last parts and wrap it all up. It would mean some
noticeable changes to the low-level bits of POI though, as some of them (esp.
the ooxml code) assume everything is to hand.
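
To make the streaming idea concrete, here is a minimal sketch that uses only
java.util.zip and does not touch POI at all. It writes the small fixed parts of
an .xlsx package first and then streams the sheet rows straight into the zip
entry, with no temporary file. Everything in it is an assumption made for
illustration: the class name StreamingXlsxSketch, the sheet name "Data", and
the choice of inline strings (t='inlineStr') so that cells do not need to
refer to a separate shared strings part. It handles text cells only, with no
styles, numbers or dates, and it is not how POI writes files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of POI: streams a one-sheet, text-only .xlsx.
class StreamingXlsxSketch {
    private static final String HEAD = "<?xml version='1.0' encoding='UTF-8'?>";
    private static final String PKG = "http://schemas.openxmlformats.org/package/2006/";
    private static final String DOC = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    private static final String MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    private static final String CT = "application/vnd.openxmlformats-officedocument.spreadsheetml.";

    private static void put(ZipOutputStream zip, String s) throws IOException {
        zip.write(s.getBytes(StandardCharsets.UTF_8));
    }

    // Writes one small, fixed package part as its own zip entry.
    private static void part(ZipOutputStream zip, String name, String xml) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        put(zip, HEAD + xml);
        zip.closeEntry();
    }

    // Escapes the characters that are not allowed in XML element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static void write(OutputStream out, Iterable<String[]> rows) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(out);
        // Fixed parts first: none of them depends on the row data.
        part(zip, "[Content_Types].xml",
            "<Types xmlns='" + PKG + "content-types'>"
            + "<Default Extension='rels' ContentType='application/vnd.openxmlformats-package.relationships+xml'/>"
            + "<Default Extension='xml' ContentType='application/xml'/>"
            + "<Override PartName='/xl/workbook.xml' ContentType='" + CT + "sheet.main+xml'/>"
            + "<Override PartName='/xl/worksheets/sheet1.xml' ContentType='" + CT + "worksheet+xml'/>"
            + "</Types>");
        part(zip, "_rels/.rels",
            "<Relationships xmlns='" + PKG + "relationships'>"
            + "<Relationship Id='rId1' Type='" + DOC + "/officeDocument' Target='xl/workbook.xml'/>"
            + "</Relationships>");
        part(zip, "xl/workbook.xml",
            "<workbook xmlns='" + MAIN + "' xmlns:r='" + DOC + "'>"
            + "<sheets><sheet name='Data' sheetId='1' r:id='rId1'/></sheets></workbook>");
        part(zip, "xl/_rels/workbook.xml.rels",
            "<Relationships xmlns='" + PKG + "relationships'>"
            + "<Relationship Id='rId1' Type='" + DOC + "/worksheet' Target='worksheets/sheet1.xml'/>"
            + "</Relationships>");
        // The sheet goes last, written row by row as the data arrives.
        zip.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
        put(zip, HEAD + "<worksheet xmlns='" + MAIN + "'><sheetData>");
        for (String[] row : rows) {
            StringBuilder sb = new StringBuilder("<row>");
            for (String cell : row) {
                sb.append("<c t='inlineStr'><is><t>").append(escape(cell)).append("</t></is></c>");
            }
            put(zip, sb.append("</row>").toString());
        }
        put(zip, "</sheetData></worksheet>");
        zip.closeEntry();
        zip.finish(); // writes the zip central directory; the caller closes 'out'
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(buf, Arrays.asList(new String[] {"id", "name"}, new String[] {"1", "A & B"}));
        System.out.println("wrote " + buf.size() + " bytes");
    }
}
```

In a servlet, the ByteArrayOutputStream would be replaced by the response's
output stream. Anything that has to be known before the rows are written
would still need to be decided up front or buffered, which is the limitation
described above.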

If you're worried about disk I/O, you'd be best off just using something
like tmpfs or a RAM disk to hold the parts.
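
A rough sketch of that approach on the Java side, using only the standard
library: pick a RAM-backed directory if one is available and fall back to the
JVM default otherwise. The class name RamTempDirSketch and the /dev/shm path
are assumptions (many Linux systems mount tmpfs there, but check yours). As
far as I know POI creates its temporary files under the java.io.tmpdir
location, so it is worth confirming that against the version in use.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of POI.
class RamTempDirSketch {

    // Returns the candidate if it is an existing, writable directory;
    // otherwise falls back to the JVM's default temp directory.
    static Path chooseTempDir(String candidate) {
        Path p = Paths.get(candidate);
        if (Files.isDirectory(p) && Files.isWritable(p)) {
            return p;
        }
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: /dev/shm is a tmpfs mount on this machine.
        Path dir = chooseTempDir("/dev/shm");
        Path tmp = Files.createTempFile(dir, "export-", ".xlsx");
        try {
            System.out.println("temp file created at " + tmp);
        } finally {
            Files.deleteIfExists(tmp); // clean up, as with any temp file
        }
    }
}
```

To move a whole JVM's temp files rather than just your own, it is usually
safer to set the location at startup, e.g. -Djava.io.tmpdir=/dev/shm/myapp
(a hypothetical path), than to change the property while the JVM is running.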

Nick
