You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2015/06/20 16:12:05 UTC

[Bug 58061] New: Switch to NPOIFS causes simple XLS created by POI to be corrupt

https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

            Bug ID: 58061
           Summary: Switch to NPOIFS causes simple XLS created by POI to
                    be corrupt
           Product: POI
           Version: 3.13-dev
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: regression
          Priority: P2
         Component: HSSF
          Assignee: dev@poi.apache.org
          Reporter: dominik.stadler@gmx.at

.xls files produced by POI on trunk are not valid right now, it seems a recent
change broke HSSF completely.

When running "git bissect", the following is found as culprit:

----
9d4d015a344dbe95918dcb7b2daa5af2f0677da0 is the first bad commit
commit 9d4d015a344dbe95918dcb7b2daa5af2f0677da0
Author: nick <ni...@13f79535-47bb-0310-9956-ffa450edef68>
Date:   Mon May 11 18:04:30 2015 +0000

    #56791 More updates from OPOIFS to NPOIFS

    git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1678783
13f79535-47bb-0310-9956-ffa450edef68

:040000 040000 277fa7aea6ea9ed9bfaa53cf45c4d390437ac1e3
26deba68dd564b5b790679dab7eeb99abbe550bf M      src
'bisect run' erfolgreich ausgeführt
----

I used the following command in the "git bissect" to decide if it still works
or not, i.e. a simple XLS that is produced by POI will not open in LibreOffice
anymore. It's the same for Excel:

----
java -cp build/classes:build/examples-classes
org.apache.poi.hssf.usermodel.examples.CreateCells && \
test -f workbook.xls && \
unoconv -vvv --timeout=10 --doctype=spreadsheet --output=workbook.pdf
workbook.xls
----

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

--- Comment #11 from Nick Burch <ap...@gagravarr.org> ---
I've noticed that most of the other tools pop their Properties Table into block
0, which NPOIFS wasn't. I've changed that in r1688038. More tweaks may be
required though

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

--- Comment #15 from Nick Burch <ap...@gagravarr.org> ---
With the help of Morten Welinder from libgsf, I think we might have this
cracked now as of r1689505! Would someone be able to confirm?

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEEDINFO                    |RESOLVED

--- Comment #16 from Dominik Stadler <do...@gmx.at> ---
Thanks a lot, it seems to work fine now also with very small files!

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |56791

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

--- Comment #6 from Dominik Stadler <do...@gmx.at> ---
Created attachment 32845
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32845&action=edit
Log when parsing the corrupted file

bffvalidator c:\temp\58061corrupt.xls
BFFValidator: "c:\temp\58061corrupt.xls" NOT RECOGNIZED (The Microsoft Office
Binary File Format Validator encountered an error reading the file you
specified, OR The Microsof
t Office Binary File Format Validator supports Word, Excel, and PowerPoint
binary file formats only. The file you specified is an unsupported file type.)
at 06/22/15 16:42:14
Log at: c:\temp\58061corrupt.xls.bffvalidator.06-22-15_16-42-14.xml

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

--- Comment #10 from Nick Burch <ap...@gagravarr.org> ---
Created attachment 32852
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32852&action=edit
libgsf-1 python program to check the NPOIFS and OPOIFS files

I've had a bit of a play with libgsf-1, in python, listing script attached. It
reports no problems reading the OPOIFS generated file, but on the NPOIFS one it
gives the error 

libgsf:msole-WARNING **: failure reading block 0 for 'Workbook'

That, coupled with the BFF validator error, does make me think we must be doing
something wrong with how we're writing the file. Odd that our code reads it
fine, but the other libraries don't... May need to step through the libgsf code
to see what it isn't liking, then re-check the matching NPOIFS write code
against the spec for that section

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

--- Comment #8 from Dominik Stadler <do...@gmx.at> ---
Created attachment 32847
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32847&action=edit
File created with NPOIFS

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

Nick Burch <ap...@gagravarr.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #1 from Nick Burch <ap...@gagravarr.org> ---
The files won't be identical at the byte level, as the code for assigning
blocks is different in NPOIFS. Streams should be the same though

It's odd that things are breaking, as I believe people have been using NPOIFS
for ages with HSSF without problems.

Does it happen with all files written, or only certain ones? What's the
smallest file we can generate with OPOIFS and NPOIFS that shows it? (It's much
easier to compare at the byte/raw level something small!)

If you take a problematic file, generate it with OPOIFS and with NPOIFS, then
run POIFSDump on the two, are the various streams the same?

Can NPOIFS open and read the problematic file OK? Can OPOIFS read it? Can HSSF
read it?

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

--- Comment #3 from Dominik Stadler <do...@gmx.at> ---
Created attachment 32842
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32842&action=edit
results from POIFSDump in two directories

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

--- Comment #13 from Nick Burch <ap...@gagravarr.org> ---
I've spent quite a bit more time on this, but I'm still stumped as to what
we're doing wrong. OpenMcdf is able to read the files without error or issue,
but libgsf objects. If you open the file in OpenMcdf, save the Workbook stream,
then create a new file with OpenMcdf, Excel opens that file. (However, that
file is wrong in other ways, bugs reported to the OpenMcdf project!) If you
save the Workbook stream out using POIFSDump, then use python + ctypes + libgsf
to write it, then Excel is happy with the file, but in some situations libgsf
gives the same error about its file that it does for NPOIFS generated ones! Bug
reported to libgsf as well. And I still can't work out what NPOIFS is doing
that upsets Excel / OpenOffice...

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

--- Comment #7 from Dominik Stadler <do...@gmx.at> ---
Created attachment 32846
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32846&action=edit
Log when parsing the good file

bffvalidator c:\temp\58061good.xls
BFFValidator: "c:\temp\58061good.xls" FAILED at 06/22/15 16:42:09
Log at: c:\temp\58061good.xls.bffvalidator.06-22-15_16-42-09.xml
See:
http://msdn.microsoft.com/en-us/library/A6FFF2B4-470A-463D-A6E9-9DAD9676CD44
for more information

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

--- Comment #4 from Dominik Stadler <do...@gmx.at> ---
BiffViewer can open both without error and the only difference is in the
username that is written to the file. The corrupt one uses the login-name, the
other the first- and last-name of the current user.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

--- Comment #12 from Nick Burch <ap...@gagravarr.org> ---
Larger files seem to open OK with OpenOffice and libgsf, so it looks like
remaining issues are around the ministream. If someone with handy access to a
copy of windows could try re-creating the small test file on trunk, and seeing
if the BFFV gives any more helpful messages about the file or not, that'd help!

(If not, I've got a c program working with libgsf-1, so next step otherwise is
to use gdb to step into it to see why it doesn't like the file)

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #2 from Dominik Stadler <do...@gmx.at> ---
The following produces an invalid xls for me both on Linux and Windows:

    @Test
    public void test58061() throws IOException {
        Workbook wb = new HSSFWorkbook();
        wb.createSheet("testsheet");

        OutputStream out = new FileOutputStream("C:\\temp\\58061.xls");
        try {
            wb.write(out);
        } finally {
            out.close();
        }
    }

I will attach the result of POIFSDump for the a good file (using
OPOIFSFileSystem) and a bad one (using NPOIFSFileSystem).

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

--- Comment #9 from Dominik Stadler <do...@gmx.at> ---
Created attachment 32848
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32848&action=edit
File created with OPOIFS

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

Nick Burch <ap...@gagravarr.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #5 from Nick Burch <ap...@gagravarr.org> ---
Any chance you could try running the latest Microsoft Binary File Format
Validator on the two files, and see if either gets reported as having problems?
If not, we'll have to dive into the hex dumps, which won't be fun or quick...

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58061] Switch to NPOIFS causes simple XLS created by POI to be corrupt

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58061

--- Comment #14 from Nick Burch <ap...@gagravarr.org> ---
No response on my libgsf or OpenMcdf bug reports

I had a play with the Node.JS OLE2 library js-cfb
<https://github.com/SheetJS/js-cfb> - this is able to read the NPOIFS
ministream without error, just like OpenMcdf can do. I've also tried the Perl
module Spreadsheet::Read, and it reads the NPOIFS file fine too

Microsoft released a new version of the [MS-CFB] file format spec last week,
but sadly that didn't seem to have any changes in it that cover this case

At this point, it looks like a subtle issue where one reading of the format
spec differs slightly from the one Excel + OO has. Unless someone else can spot
something, it's probably going to need setting up a Windows C development
environment, writing a simple lister with the same OLE2 functions that Excel
uses, and stepping into that with a debugger to see why it doesn't like our
ministream.... Any takers? :)

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org