You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2012/07/01 17:19:55 UTC

[Bug 53493] New: delete xml's temporary files when write workbook

https://issues.apache.org/bugzilla/show_bug.cgi?id=53493

          Priority: P2
            Bug ID: 53493
          Assignee: dev@poi.apache.org
           Summary: delete xml's temporary files when write workbook
          Severity: normal
    Classification: Unclassified
                OS: All
          Reporter: submax@tiscalinet.it
          Hardware: All
            Status: NEW
           Version: 3.8
         Component: SXSSF
           Product: POI

Created attachment 29018
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=29018&action=edit
patch for delete xml's temporary files when write workbook

the xml's temporary files are not deleted when call write method of
SXSSFWorkbook class.

Example:
/var/tmp/poi-sxssf-sheet-xml127328075585779261.gz
or when compress is active:
/var/tmp/poi-sxssf-sheet127328075585779261

In the SXSSF code there is not a part for this purpose.

There is only _fd.deleteOnExit() in SheetDataWriter class, but is not enough if
the program that use POI not exit from jvm and these files remain on file
system.

For confirm this, in the method write in SXSSFWorkbook class, there is already
tmplFile.delete() for another tmp file.

In conclusion in write method missing the delete on xml's temporary files.

I believe the correct solution is to have SXSSF tidy up when it's done with
files, rather than requiring users to do that via explicit temp directory
controls.

I attach a patch for fix this issue. This is very rudimental patch (I don't
know the structure of SXSSF classes), this is an example,  but it works very
well.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 53493] delete xml's temporary files when write workbook

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=53493

--- Comment #3 from Alex Geller <ag...@4js.com> ---
Yes, you are right but that is another issue (bug). There is nothing in the
documentation that says that you can call (S)XSSFSheet.write() only once. 
To fix it, the statements flushRows(0) and  _writer.close() in
SXSSFSheet.getWorksheetXMLInputStream() should be done only if the sheet has
not already been flushed to the disk so that the function should look something
like this:
public InputStream getWorksheetXMLInputStream() throws IOException
{
// flush all remaining data and close the temp file writer if not already done
    if(isModifiable())
    {
        flushRows(0);
        _writer.close();
    }
    return _writer.getWorksheetXMLInputStream();
} 
/** Is this sheet modifiable. Sheets created before a workbook is saved become
immutable after saving*/
public boolean isModifiable()
{
    return !_writer.isClosed();
}
The method isModifiable() also documents another detail in the constraints in
terms of "order of calls" that the streaming nature of SXSSF imposes over XSSF.
Perhaps we should consider introducing an unchecked
"StreamingConstraintException" when one violates the calling rules (e.g. if you
try to add data to a closed sheet).
However, I think that the general rule should be to avoid constraints when we
have the choice between a constrained and a non constrained implementation of a
public API method.
In the case of SXSSFWorkbook.write() I see no reason why the number of calls
should be restricted to calling it only once. I even suspect that that saving a
workbook, adding sheets and saving again will work with the suggested fix. 
On the other hand, if you run into more difficult issues then it might not be
worth it and we should document that you may call the function
SXSSFWorkbook.write() only once.
Regarding your correct observation that the current code already contains code
to delete the temporary data file of a sheet, my question is, why that isn't
sufficient for your problem? If you make sure that your application does not
reference the SXSSFWorkbook after saving it then the temporary files should get
deleted some tome after that when the garbage collector collects the sheets. If
this doesn't happen than that is a bug. Apparently the general advice is not to
use finalize() for temp file deletion but the arguments I have heard so far are
not convincing. I will read up the relevant chapter in Effective Java and if it
turns out to be an issue then I agree that we will have to remove the files
explicitly.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 53493] delete xml's temporary files when write workbook

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=53493

Yegor Kozlov <ye...@dinom.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Depends on|                            |53780
         Resolution|---                         |FIXED

--- Comment #8 from Yegor Kozlov <ye...@dinom.ru> ---
As of r1384784, POI provides a method to explicitly dispose of the temp files
associated with the workbook.

Call SXSSFWorkbook.dispose() in the end and you will be good. 

Yegor

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 53493] delete xml's temporary files when write workbook

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=53493

David Pletcher <dp...@google.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dpletcher@google.com

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 53493] delete xml's temporary files when write workbook

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=53493

--- Comment #1 from Alex Geller <ag...@4js.com> ---
Reading the patch I conclude that you can save a SXSSF workbook only once. That
would not be correct. The earliest point in time that the a data file can be
deleted it when it is no longer referenced. Hence I am OK with that part of the
patch that deletes the files in Sheet.finalize(). I think that the part that
explicitly calls Sheet.finalize() from the Workbook.write() should be removed.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 53493] delete xml's temporary files when write workbook

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=53493

--- Comment #4 from Alex Geller <ag...@4js.com> ---
Created attachment 29032
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=29032&action=edit
Command line application to test SXSSF temp file removal in finalize()

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 53493] delete xml's temporary files when write workbook

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=53493

--- Comment #7 from Alex Geller <ag...@4js.com> ---
(In reply to comment #6)
> 1) What is bug that don't permit multiple writes of the same workbook object?
I posted a new bug 53515 "SXSSFWorkbook.write() fails when called more than
once" (https://issues.apache.org/bugzilla/show_bug.cgi?id=53515) 
> 2) This bug occurs only with SXSSF ?
Yes, the test application provided in bug 53515 shows that it works perfectly
for XSSFWorkbook. Besides, if that weren't the case then it should be
documented.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 53493] delete xml's temporary files when write workbook

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=53493

--- Comment #5 from Alex Geller <ag...@4js.com> ---
I hope it is OK to post this much prose for a bug but I think I am done now.
The main problem with temp file removal in finalize() is that you can't predict
when it is done. I thought that that wouldn't be an issue since we don't really
care when they are removed as long as they are removed timely (e.g. the next
few minutes). However it is possible that this is not the case. If you comment
the line that runs gc explicitly from the attached test program then the
workbook does not get garbage collected on my JVM (garbage collection is
implementation dependent so it may not reproduce on other JVMs). However, for
long running programs that allocate memory regularly one can probably assume
that the scheme works.
I suggest to leave the code as is but to introduce an additional function
SXSSFWorkbook.dispose() that allows to free the resources explicitly.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 53493] delete xml's temporary files when write workbook

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=53493

--- Comment #2 from Massimo Cavalleri <su...@tiscalinet.it> ---
(In reply to comment #1)
> Reading the patch I conclude that you can save a SXSSF workbook only once.
> That would not be correct. The earliest point in time that the a data file
> can be deleted it when it is no longer referenced. Hence I am OK with that
> part of the patch that deletes the files in Sheet.finalize(). I think that
> the part that explicitly calls Sheet.finalize() from the Workbook.write()
> should be removed.

I try to write a same workbook two time but this do not permit action and
return Exception:

Exception in thread "main" java.io.IOException: Stream closed
    at java.io.BufferedWriter.ensureOpen(BufferedWriter.java:98)
    at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:108)
    at java.io.BufferedWriter.flush(BufferedWriter.java:235)
    at
org.apache.poi.xssf.streaming.SheetDataWriter.close(SheetDataWriter.java:78)
    at
org.apache.poi.xssf.streaming.SXSSFSheet.getWorksheetXMLInputStream(SXSSFSheet.java:71)
    at
org.apache.poi.xssf.streaming.SXSSFWorkbook.injectData(SXSSFWorkbook.java:295)
    at
org.apache.poi.xssf.streaming.SXSSFWorkbook.write(SXSSFWorkbook.java:767)
    at TestXlsx.main(TestXlsx.java:37)

For this reason I have insert the delete in write method. 
I understand that object's workbook can to be writed only one time.
Infact finalize() of SheetDataWriter already exist and this is private.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 53493] delete xml's temporary files when write workbook

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=53493

--- Comment #6 from Massimo Cavalleri <su...@tiscalinet.it> ---

1) What is bug that don't permit multiple writes of the same workbook object?
2) This bug occurs only with SXSSF ?

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org