You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2015/05/11 21:07:47 UTC

[Bug 57919] New: Support in-place write when opened from a File

https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

            Bug ID: 57919
           Summary: Support in-place write when opened from a File
           Product: POI
           Version: 3.12-FINAL
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: POI Overall
          Assignee: dev@poi.apache.org
          Reporter: apache@gagravarr.org

Because the older POIFSFileSytem only ever supported read or write, and only
read from an InputStream, all the POIDocument classes only supported writing
out to a new stream. When the OOXML support was added, this followed the same
pattern

With most things using NPOIFSFileSystem, which supports in-place write when
using a File, we can revisit this. It should be possible to have POIDocument /
POIXMLDocument classes write in-place if opened from a File, avoiding the need
to do the write-as-new + close + rename shuffle on Windows, amongst other
benefits

The existing write(OutputStream) would remain, both for compatibility and for
those needing to load from streams rather than files, but it should be possible
to add a write() method as well for file-based in-place write. Just needs some
work!

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

--- Comment #5 from Javen O'Neal <on...@apache.org> ---
Created attachment 34054
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34054&action=edit
timing saving an HSSFWorkbook on local/network drive and buffered/unbuffered

Here's the timing results:

> ncalls  tottime     cumtime    function
> 30      62.083488   62.083488  save_network_unbuffered
> 30      6.488765    6.488765   save_network_buffered
> 30      1.807957    1.807957   save_local_unbuffered
> 30      1.397861    1.397861   save_local_buffered

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

Javen O'Neal <on...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #34054|0                           |1
        is obsolete|                            |

--- Comment #7 from Javen O'Neal <on...@apache.org> ---
Created attachment 34055
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34055&action=edit
read from NPOIFSFileSystem and write in place or buffered/unbuffered
outputstream

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

--- Comment #11 from Javen O'Neal <on...@apache.org> ---
I've been trying to get meaningful data that excludes the effect of JVM JIT
(which I've noticed in the form of "modify" running much slower on the first
test than subsequent tests and disk caching.

So far it looks like in-place writing is faster than writing to a different
location, and in-place writing to network is only slightly slower than in-place
writing to local.

I think the utility and simplicity of writing in-place is worth it regardless
of the timing results, so I'd say go ahead and continue adding in-placing
writing. Is it easy to make in-place writing work for
WorkbookFactory.create(File) and HSSFWorkbook(File)?

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

--- Comment #14 from Javen O'Neal <on...@apache.org> ---
r1799308 might come in handy for handling nested try...finally statements with
closeable resources. It should also make it easier to write unit tests that
don't leak resources without indenting the code with try...finally statements.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

--- Comment #10 from Javen O'Neal <on...@apache.org> ---
I think the data from comment 8 is inconclusive because the same open_workbook
function was called for each test but has results varying by a factor of two.
The order that the tests ran affected the results. What we're seeing here is
disk cache performance. I'm going to run fewer trials spaced over more time
during peak business hours when the network drive is busiest. I expected to see
the order of magnitude difference between local ad network unbuffered.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

Javen O'Neal <on...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |59252

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

--- Comment #1 from Nick Burch <ap...@gagravarr.org> ---
Initial HSSFWorkbook in-place write support added in r1753103. If we're happy
with how this looks and works, should be quite quick to generalise for the
other OLE2 (POIDocument) classes which support write

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

--- Comment #6 from Javen O'Neal <on...@apache.org> ---
I used
https://svn.apache.org/viewvc/poi/trunk/test-data/spreadsheet/53446.xls?view=log
 from r1614697, which was one of the larger spreadsheets.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

--- Comment #13 from Nick Burch <ap...@gagravarr.org> ---
In-place-write and write-to-new-File now done for HSSF, HSLF and HPSF

We could probably use some feedback on it before making more major changes to
HWPF to add the support there, and before delving into OPC / X??F (for which
we've at least one other bug around close vs revert vs write!)

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

Javen O'Neal <on...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #34056|application/vnd.ms-excel    |text/csv
          mime type|                            |

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

--- Comment #9 from Nick Burch <ap...@gagravarr.org> ---
Looks like the in-place write stuff is faster than buffered streams over the
network, and locally. Am I reading that right?

If so, I can look to add a write(File) method to HSSFWorkbook which would
create a new file and write into that. Oh, and do write() + write(File) on the
other write-supporting H??F classes.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

Javen O'Neal <on...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|3.12-FINAL                  |3.16-dev

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

Nick Burch <ap...@gagravarr.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #4 from Nick Burch <ap...@gagravarr.org> ---
Currently (N)POIFS doesn't support opening an empty file, so you'd (for now)
need to create a new empty POIFS, save it to a file (would be ~2kb), open that,
then save the document into it. I guess we might want a static create method on
POIFSFileSystem that hides all that, which could later be replaced with logic
to create the new file from scratch

As for if it would help... Any chance you could, on your somewhat shonky
filesystem, open a HSSFWorkbook from a POIFS loaded r/w from a file, make some
changes, do an in-place write, and see how quick it saves back vs OutputStream
vs BufferedOutputStream?

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

--- Comment #2 from Javen O'Neal <on...@apache.org> ---
Nick said in r1753103
> This will fail (with an {@link IllegalStateException} if the
> Workbook was opened read-only, opened from an {@link InputStream}
> instead of a File, or if this is not the root document. For those cases, 
> you must use {@link #write(OutputStream)} to write to a brand new stream.

We urge users to read spreadsheets from a File rather than an InputStream to
reduce memory consumption. Is there any we we could also reduce memory
consumption by writing a File instead of an OutputStream?

I noticed a significant performance improvement when writing to a
BufferedOutputStream(FileOutputStream) on a network drive (Windows share). We
should be cognizant of that, either in the code or as a disclaimer in the
JavaDocs.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

Nick Burch <ap...@gagravarr.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Hardware|PC                          |All
           Severity|normal                      |enhancement
                 OS|Linux                       |All

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

--- Comment #3 from Javen O'Neal <on...@apache.org> ---
Added performance note about using BufferedOutputStreams in r1753112, since it
wasn't mentioned there yet.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

--- Comment #12 from Nick Burch <ap...@gagravarr.org> ---
(In reply to Javen O'Neal from comment #11)
> Is it easy to make in-place writing work for
> WorkbookFactory.create(File) and HSSFWorkbook(File)?

in-place write works for both of those as long as you set read-only to false
when opening

> I think the utility and simplicity of writing in-place is worth it
> regardless of the timing results, so I'd say go ahead and continue adding
> in-placing writing.

OK, I've had a go at adding HSSFWorkbook.write(File). Will need to fix a POIFS
bug later before it works... Does HSSFWorkbook now look how you'd expect / like
all the document classes to look for write methods?

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57919] Support in-place write when opened from a File

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57919

--- Comment #8 from Javen O'Neal <on...@apache.org> ---
Created attachment 34056
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34056&action=edit
timing spreadsheet

Averaged over 40 runs.

> Method               Open     Modify   Write    Close
> unbuffered local     0.214    0.004    0.137    0.035
> unbuffered network   0.321    0.005    0.219    0.035
> buffered local       0.189    0.002    0.141    0.038
> buffered network     0.326    0.002    0.215    0.035
> inplace local        0.178    0.002    0.097    0.040
> inplace network      0.278    0.003    0.131    0.069

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org